1. Problem Description
One Flume agent tails a log file and forwards the events to the Avro port of a second Flume agent. When a large volume of data (around 10 million records) was pushed through, the log-tailing agent started reporting errors, and the Flume monitoring UI showed that consumption had suddenly stopped. The agent listening on the Avro port logged the following error:
Avro source avro_source: Unable to process event batch. Exception follows.
org.apache.flume.ChannelFullException: Space for commit to queue couldn't be acquired. Sinks are likely not keeping up with sources, or the buffer size is too tight
at org.apache.flume.channel.MemoryChannel$MemoryTransaction.doCommit(MemoryChannel.java:128)
at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:194)
at org.apache.flume.source.AvroSource.appendBatch(AvroSource.java:402)
at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.avro.ipc.specific.SpecificResponder.respond(SpecificResponder.java:91)
at org.apache.avro.ipc.Responder.respond(Responder.java:151)
at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:173)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2. Cause
The log message makes the cause clear: the channel ran out of capacity:
org.apache.flume.ChannelFullException: Space for commit to queue couldn't be acquired. Sinks are likely not keeping up with sources, or the buffer size is too tight
at org.apache.flume.channel.MemoryChannel$MemoryTransaction.doCommit(MemoryChannel.java:128)
The reason the channel fills up lies in the internal structure of the MemoryChannel.
The source side ("the source puts events into putList, then on commit moves them into the queue") and the sink side ("the sink takes events from the queue into takeList, and on commit releases that space back to the queue") run independently and compete freely for the channel lock. If the source side wins the lock several times in a row while the sink side does not, and the queue is too small to absorb several consecutive puts, this exception is triggered.
The most direct fix is to enlarge the queue: increase capacity so that the gap between capacity and transactionCapacity is large enough for the queue to absorb several batches of puts in a row.
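The commit-time behavior described above can be sketched with a toy model. This is a simplified, hypothetical simulation (not Flume's actual Java code): the channel is modeled as a counter of used queue slots, a put transaction fails when the batch would not fit, and a concurrently running sink is modeled by a `sink_drained` parameter that frees slots before the commit.

```python
class ChannelFullException(Exception):
    """Raised when a put transaction cannot acquire queue space."""


class MemoryChannelSketch:
    """Simplified model of Flume's MemoryChannel commit path (hypothetical)."""

    def __init__(self, capacity, transaction_capacity, keep_alive):
        self.capacity = capacity                        # max events in the queue
        self.transaction_capacity = transaction_capacity  # max events per commit
        self.keep_alive = keep_alive                    # seconds a real channel would wait
        self.queue_used = 0                             # events currently in the queue

    def commit_put(self, batch_size, sink_drained=0):
        # A sink committing concurrently frees slots back to the queue.
        self.queue_used = max(0, self.queue_used - sink_drained)
        if batch_size > self.transaction_capacity:
            raise ValueError("batch exceeds transactionCapacity")
        # In real Flume, doCommit() waits up to keep-alive seconds on a
        # semaphore for free slots; here we simply fail if space is missing.
        if self.queue_used + batch_size > self.capacity:
            raise ChannelFullException(
                "Space for commit to queue couldn't be acquired")
        self.queue_used += batch_size


# With capacity == transactionCapacity, a second commit fails unless the
# sink drained the queue in between -- the situation from the error above.
small = MemoryChannelSketch(capacity=100, transaction_capacity=100, keep_alive=3)
small.commit_put(100)            # fills the queue completely
# small.commit_put(100)          # would raise ChannelFullException

# With a large capacity/transactionCapacity gap, many consecutive source
# commits succeed even while the sink lags behind.
big = MemoryChannelSketch(capacity=1_000_000, transaction_capacity=1000, keep_alive=60)
for _ in range(5):
    big.commit_put(1000, sink_drained=0)
```

The design point is the ratio: a queue many times larger than one transaction lets the source win the lock repeatedly without filling the channel, buying the sink time to catch up.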
3. Solution
Increase the agent's channel capacity. The original configuration:
avro_memory_kafka.channels.memory_channel.type = memory
should be changed to:
avro_memory_kafka.channels.memory_channel.type = memory
avro_memory_kafka.channels.memory_channel.keep-alive = 60
avro_memory_kafka.channels.memory_channel.transactionCapacity = 1000
avro_memory_kafka.channels.memory_channel.capacity = 1000000
where:

| Property | Default | Description |
| - | - | - |
| type | - | The component type name; must be `memory` |
| capacity | 100 | The maximum number of events stored in the channel |
| transactionCapacity | 100 | The maximum number of events taken from a source, or given to a sink, per transaction |
| keep-alive | 3 | Timeout in seconds for adding or removing an event |
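For context, the channel settings above fit into a full agent definition roughly like the following. This is only a sketch: the agent name `avro_memory_kafka` and channel name `memory_channel` come from the snippet above, but the source/sink names, bind address, port, Kafka broker list, and topic are assumptions.

```
# Component names (assumed, except the agent and channel names used above)
avro_memory_kafka.sources = avro_source
avro_memory_kafka.channels = memory_channel
avro_memory_kafka.sinks = kafka_sink

# Avro source listening for events from the upstream agent (port assumed)
avro_memory_kafka.sources.avro_source.type = avro
avro_memory_kafka.sources.avro_source.bind = 0.0.0.0
avro_memory_kafka.sources.avro_source.port = 4141
avro_memory_kafka.sources.avro_source.channels = memory_channel

# Enlarged memory channel (values from the fix above)
avro_memory_kafka.channels.memory_channel.type = memory
avro_memory_kafka.channels.memory_channel.keep-alive = 60
avro_memory_kafka.channels.memory_channel.transactionCapacity = 1000
avro_memory_kafka.channels.memory_channel.capacity = 1000000

# Kafka sink (broker list and topic assumed)
avro_memory_kafka.sinks.kafka_sink.type = org.apache.flume.sink.kafka.KafkaSink
avro_memory_kafka.sinks.kafka_sink.kafka.bootstrap.servers = localhost:9092
avro_memory_kafka.sinks.kafka_sink.kafka.topic = flume_topic
avro_memory_kafka.sinks.kafka_sink.channel = memory_channel
```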
Note also that when the channel is full, Flume blocks the source's put for up to keep-alive seconds before failing the commit. During that window the sink can drain events from the channel, so by the time the source puts data again there may be enough free space.
4. References
1.https://blog.csdn.net/gaopu12345/article/details/77922924
2.https://www.cnblogs.com/justinyang/p/8675414.html