Architecture
The diagram above is taken from http://www.cnblogs.com/delgyd/p/elk.html#3656833
Architecture overview (reading the diagram from left to right, there are five layers in total; this deployment merges everything from the third layer down and does not use an Elasticsearch cluster):
Layer 1: data collection
On the far left is the business server cluster. Filebeat is installed on each server to collect logs and ship them to the two Logstash services.
Layer 2: data processing and caching
The Logstash services parse and format the logs they receive, then hand them off to the local Kafka broker + ZooKeeper cluster.
Layer 3: data forwarding
A dedicated Logstash node continuously pulls data from the Kafka broker cluster and forwards it to the ES DataNodes.
Layer 4: data persistence
The ES DataNodes write the received data to disk and build the indices.
Layer 5: data retrieval and presentation
The ES Master coordinates the ES cluster, while Kibana handles search requests and data visualization.
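Summarized as a data flow (a rough sketch of the layout above):

Filebeat (app servers)
  -> Logstash (parse/format)
  -> Kafka broker + ZooKeeper (buffer)
  -> Logstash (forward)
  -> ES DataNode (store + index)
  -> ES Master / Kibana (search + display)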
JDK 1.8 or later is required.
Filebeat
Version
filebeat-5.5.2-1.x86_64
Configuration
#vim filebeat.yml
filebeat.modules:
filebeat.prospectors:
- input_type: log
  paths: # log paths to read; one path per project here, multiple entries or * globs are allowed
    - /usr/local/nginx1.6/logs/sso.so.duia.com.log
  include_lines: [ ]
  multiline: # merge multiline entries: a line that does not start with [ is appended to the previous line
    pattern: '^\['
    negate: true
    match: after
  document_type: sso-so # sets the event type, referenced by Logstash and ultimately used in the Elasticsearch index name
  tail_files: true
output.kafka: # ship the output to Kafka
  enabled: true
  hosts: ["172.16.101.76:9092"]
  topic: nginx # target topic; if several Logstash instances consume it, define multiple partitions
  compression: snappy
  max_message_bytes: 1000000
Startup
nohup /usr/local/filebeat/filebeat -e -c /usr/local/filebeat/logs.yml -d "publish" &>> /data/logs/filebeat.log &
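Before backgrounding Filebeat, the configuration file can be checked first (a quick sanity check; the 5.x series ships a -configtest flag, and the path matches the startup line above):

/usr/local/filebeat/filebeat -configtest -c /usr/local/filebeat/logs.yml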
Zookeeper
Version
zookeeper-3.4.9.tar.gz
Configuration
#vim zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper # data directory
clientPort=2181
server.1=172.16.101.76:12888:13888
server.2=172.16.101.175:12888:13888
server.3=172.16.101.172:12888:13888
cat /data/zookeeper/myid
1
The other ZooKeeper nodes use this same configuration file; only the myid value differs.
Startup
/usr/local/elk/zookeeper/bin/zkServer.sh start
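After starting all three nodes, the ensemble state can be verified (a quick check using the stock zkServer.sh script and the ruok four-letter command, both available in 3.4.x):

/usr/local/elk/zookeeper/bin/zkServer.sh status
echo ruok | nc 172.16.101.76 2181 # a healthy node answers "imok"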
Kafka
Version
kafka_2.12-0.10.2.0.tgz
Configuration
# vim server.properties
broker.id=1
port = 9092
host.name = 172.16.101.76 # listen address
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/data/logs/kafka # where Kafka stores its log segments (the message data)
num.partitions=1
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=127.0.0.1:2181 # ZooKeeper connection address
zookeeper.connection.timeout.ms=6000
Startup
bin/kafka-server-start.sh config/server.properties &
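The topics referenced below (nginx from Filebeat, plus tomcat) can be created up front. A sketch assuming one broker and two partitions, so that two Logstash consumers can share a topic, followed by a console-consumer check that Filebeat events are actually arriving:

bin/kafka-topics.sh --create --zookeeper 127.0.0.1:2181 --replication-factor 1 --partitions 2 --topic nginx
bin/kafka-topics.sh --create --zookeeper 127.0.0.1:2181 --replication-factor 1 --partitions 2 --topic tomcat
bin/kafka-console-consumer.sh --bootstrap-server 172.16.101.76:9092 --topic nginx --from-beginning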
Logstash
Version
logstash-5.2.2.tar.gz
Configuration
input {
  kafka {
    bootstrap_servers => "172.16.101.76:9092"
    topics => ["nginx"]
    codec => "json"
    decorate_events => true
  }
}
input {
  kafka {
    bootstrap_servers => "172.16.101.76:9092"
    topics => ["tomcat"]
    codec => "json"
    decorate_events => true
  }
}
filter {
  # nginx
  if [type] == "nginx-access.log" {
    grok {
      match => {
        "message" => "\[%{HTTPDATE:timestamp}\] %{IPV4:client_ip} \"%{USER:forward}\" %{USER:user} %{IPORHOST:host} \"%{WORD:method} %{URIPATHPARAM:valume} %{URIPROTO:http}/%{NUMBER:http_version}\" %{QS:request_body} %{NUMBER:status:int} \"(?:%{IPORHOST:urlname} %{POSINT:urlport})\" %{NUMBER:request_time} %{IPV4:upstream_host}:%{NUMBER:upstream_port} %{NUMBER:reponse_time} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent}"
      }
      remove_field => ["message"]
    }
    geoip {
      source => "client_ip"
      target => "geoip"
      database => "/data/GeoIP/GeoLite2-City.mmdb"
      add_field => ["location" , "%{[geoip][latitude]}, %{[geoip][longitude]}"]
    }
    date {
      match => ["timestamp" , "dd/MMM/YYYY:HH:mm:ss Z"]
      target => "@timestamp"
      remove_field => ["timestamp"]
    }
  }
  if [type] == "catalina.out" {
    grok {
      match => {
        "message" => "%{COMMONAPACHELOG}"
      }
      remove_field => ["message"]
    }
  }
}
output {
  if "_grokparsefailure" in [tags] {
    file {
      path => "/data/logs/grokparsefailure-%{[type]}-%{+YYYY.MM}.log"
    }
  }
  elasticsearch {
    hosts => ["172.16.101.76:9200"]
    index => "%{type}-%{+YYYY.MM.dd}"
    template_overwrite => true
  }
}
Startup
/usr/local/elk/logstash/bin/logstash -f /usr/local/elk/logstash/config/logs.yml &
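The pipeline file can be validated before backgrounding the process (Logstash 5.x supports --config.test_and_exit, also available as -t):

/usr/local/elk/logstash/bin/logstash -f /usr/local/elk/logstash/config/logs.yml --config.test_and_exit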
Elasticsearch
Version
elasticsearch-5.2.2.tar.gz
Configuration
[root@host76 config]# grep -vE "^$|^#" elasticsearch.yml
cluster.name: Mo
node.name: node01
node.attr.rack: r1
path.data: /data/elasticsearch
path.logs: /data/logs/elasticsearch
bootstrap.memory_lock: false
network.host: 172.16.101.76
http.port: 9200
discovery.zen.ping.unicast.hosts: ["172.16.101.76","172.16.101.172"]
discovery.zen.minimum_master_nodes: 1
gateway.recover_after_nodes: 1
action.destructive_requires_name: true
bootstrap.system_call_filter: false
thread_pool.index.queue_size: 500
thread_pool.bulk.queue_size: 1000
indices.recovery.max_bytes_per_sec: 100mb
http.cors.enabled: true
http.cors.allow-origin: "*"
[root@host76 config]# grep -vE "^$|^#" jvm.options
-Xms6g
-Xmx6g
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+DisableExplicitGC
-XX:+AlwaysPreTouch
-server
-Xss1m
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djna.nosys=true
-Djdk.io.permissionsUseCanonicalPath=true
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true
-XX:+HeapDumpOnOutOfMemoryError
For changes to these configuration settings, refer to the official documentation:
https://elasticsearch.cn/book/elasticsearch_definitive_guide_2.x/dont-touch-these-settings.html
Startup
bin/elasticsearch -d
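A quick smoke test once the daemon is up (the root endpoint returns the node and cluster info):

curl http://172.16.101.76:9200/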
Kibana
Version
kibana-5.2.2-linux-x86_64.tar.gz
Configuration
[root@host76 config]# grep -vE "^$|^#" kibana.yml
server.port: 5601
server.host: "172.16.101.76"
elasticsearch.url: "http://172.16.101.76:9200"
elasticsearch.pingTimeout: 1500
elasticsearch.requestTimeout: 30000
elasticsearch.requestHeadersWhitelist: [ authorization ]
pid.file: /usr/local/kibana/kibana.pid
logging.dest: /data/logs/kibana/kibana.log
Startup
bin/kibana &
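Kibana's health can then be checked over HTTP (the 5.x series exposes a status endpoint):

curl http://172.16.101.76:5601/api/status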
Nginx
Configuration
upstream kibana {
    server 172.16.101.76:5601 max_fails=3 fail_timeout=30s;
}
server {
    listen 8080;
    server_name localhost;
    location / {
        proxy_pass http://kibana/;
        index index.html index.htm;
        #auth
        #auth_basic "kibana Private";
        #auth_basic_user_file /etc/nginx/.htpasswd;
    }
}
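To enable the commented-out basic auth above, the password file has to exist first. A sketch using htpasswd from the httpd-tools package; kibana_user is only an example username:

htpasswd -c /etc/nginx/.htpasswd kibana_user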
Notes
Logstash reads the data from Kafka, formats it with the regular expressions in the grok filter, and writes it to Elasticsearch.
Problems encountered:
1. grok not taking effect; debugging Logstash
output {
  stdout {
    codec => rubydebug
  }
}
In debug mode the output was not the JSON string formatted by grok; after some experimenting, adding
codec => "json" to the input made the output come through as grok-formatted JSON.
2. Log entries produced by the same log_format definition failed to match
Grok does not have to match the string format exactly; information you do not need can simply be left unmatched.
3. Debugging grok
http://grokdebug.herokuapp.com/?#
This site debugs grok patterns online; a proxy may be needed to reach it at first.
4. In the Logstash output, the index can be defined dynamically or fixed to a constant name (see the sketch below).
5. Logstash if conditionals can select both the data source in input and the destination and index in output.
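A minimal sketch of points 4 and 5 (the type value and the catchall index name are only examples):

output {
  if [type] == "nginx-access.log" {
    elasticsearch {
      hosts => ["172.16.101.76:9200"]
      index => "%{type}-%{+YYYY.MM.dd}" # dynamic index: one per type and day
    }
  } else {
    elasticsearch {
      hosts => ["172.16.101.76:9200"]
      index => "catchall" # fixed index name
    }
  }
}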
Elasticsearch
List nodes
curl '172.16.101.76:9200/_cat/nodes?v'
Check cluster health
curl '172.16.101.76:9200/_cat/health?v'
Clear caches
curl -XPOST http://127.0.0.1:9200/logstash-*/_cache/clear
List indices
curl -s 'http://172.16.101.76:9200/_cat/indices?v'
Check Elasticsearch thread pools
curl -XGET http://xxxx:9200/_nodes/stats/thread_pool?pretty
Delete indices (the wildcard here matches all of them)
curl -XDELETE 'http://172.16.101.76:9200/*'
Batch-delete the indices for a given date
#curl -s 'http://172.16.101.76:9200/_cat/indices?v' | sort | awk '{print $3}' > del_index.txt
#for i in `grep 2017.12.22 del_index.txt` ;do curl -XDELETE "http://172.16.101.76:9200/${i}" && sleep 10 ;done
List all Elasticsearch templates
curl -XGET localhost:9200/_template | python -m json.tool
View index mappings
curl -XGET http://127.0.0.1:9200/*/_mapping/
Delete Elasticsearch index templates
curl -XDELETE localhost:9200/_template/*
Add a custom template
curl -XPUT localhost:9200/_template/nginx -d@template.json
template.json
{
  "aliases": {},
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": true,
        "norms": false
      },
      "dynamic_templates": [
        {
          "message_field": {
            "mapping": {
              "norms": false,
              "type": "text"
            },
            "match_mapping_type": "string",
            "path_match": "message"
          }
        },
        {
          "string_fields": {
            "mapping": {
              "fields": {
                "keyword": {
                  "type": "keyword"
                }
              },
              "norms": false,
              "type": "text"
            },
            "match": "*",
            "match_mapping_type": "string"
          }
        }
      ],
      "properties": {
        "@timestamp": {
          "include_in_all": false,
          "type": "date"
        },
        "@version": {
          "include_in_all": false,
          "type": "keyword"
        },
        "geoip": {
          "dynamic": true,
          "properties": {
            "ip": {
              "type": "ip"
            },
            "latitude": {
              "type": "half_float"
            },
            "location": {
              "type": "geo_point"
            },
            "longitude": {
              "type": "half_float"
            }
          }
        },
        "request_body": {
          "ignore_above": 32766,
          "index": false,
          "type": "keyword"
        }
      }
    }
  },
  "order": 0,
  "settings": {
    "index": {
      "refresh_interval": "5s"
    }
  },
  "template": "nginx-*",
  "version": 50001
}
# Because the nginx logs were being ingested but the map showed no regions, we found that in the Elasticsearch template the geoip location field was not typed as geo_point: the default template was not being applied. Renaming the template so it matches the nginx index pattern, then clearing the old indices and history and rebuilding the index, made the map display correctly.
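To confirm the template took effect, check that geoip.location on a freshly created index is mapped as geo_point (reusing the mapping command from above):

curl -XGET 'http://172.16.101.76:9200/nginx-*/_mapping/' | python -m json.tool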
References:
http://www.cnblogs.com/delgyd/p/elk.html#3656833
https://elasticsearch.cn/book/elasticsearch_definitive_guide_2.x/dont-touch-these-settings.html
http://blog.csdn.net/zhaoyangjian724/article/details/52337402