Heartbeat and session timeout mechanisms in older and newer Kafka versions

Kafka 0.10.0.0 & Kafka 0.10.1.0

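Heartbeats and timeouts in 0.10.0.0:
Heartbeats are coupled to poll(): the consumer exposes only session.timeout.ms, and there is no separate parameter for controlling the interval between polls.
If the consumer needs 1 minute to process messages, session.timeout.ms must be set to more than 1 minute; otherwise the consumer's session times out.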

session.timeout.ms
The timeout used to detect failures when using Kafka's group management facilities. 
When a consumer's heartbeat is not received within the session timeout, the broker will mark the consumer as failed and rebalance the group. 
Since heartbeats are sent only when poll() is invoked, a higher session timeout allows more time for message processing in the consumer's poll loop at the cost of a longer time to detect hard failures. 
See also max.poll.records for another option to control the processing time in the poll loop. 
Note that the value must be in the allowable range as configured in the broker configuration by group.min.session.timeout.ms and group.max.session.timeout.ms.
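As the official description notes, max.poll.records offers another way to bound the time spent in each poll loop iteration, by limiting how many records a single poll() returns.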
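
To make the coupling concrete, here is a rough Python sketch (a back-of-the-envelope model, not Kafka client code). It estimates the worst-case duration of one poll loop iteration from max.poll.records and a per-record processing time, then compares it with session.timeout.ms. The function names and the per-record cost are hypothetical and chosen only for illustration.

```python
def worst_case_loop_ms(max_poll_records: int, per_record_ms: int) -> int:
    """Upper bound on the time between two poll() calls, assuming records are
    processed one at a time and each takes at most per_record_ms."""
    return max_poll_records * per_record_ms


def session_survives(session_timeout_ms: int, max_poll_records: int,
                     per_record_ms: int) -> bool:
    """In 0.10.0.0 heartbeats are sent only from poll(), so the whole loop
    iteration has to finish before the session timeout expires."""
    return worst_case_loop_ms(max_poll_records, per_record_ms) < session_timeout_ms


# One record per poll, one minute per record: a 30 s session timeout is too short.
print(session_survives(30_000, 1, 60_000))   # False
# Option 1: raise session.timeout.ms above the processing time.
print(session_survives(90_000, 1, 60_000))   # True
# Option 2: keep the timeout and return fewer records per poll().
print(session_survives(30_000, 100, 500))    # False (up to 50 s per iteration)
print(session_survives(30_000, 50, 500))     # True  (up to 25 s per iteration)
```

The trade-off is the one the documentation describes: a larger session.timeout.ms buys processing time but delays the detection of hard failures, while a smaller max.poll.records shortens each iteration instead.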

heartbeat.interval.ms
The expected time between heartbeats to the consumer coordinator when using Kafka's group management facilities. 
Heartbeats are used to ensure that the consumer's session stays active and to facilitate rebalancing when new consumers join or leave the group. 
The value must be set lower than session.timeout.ms, but typically should be set no higher than 1/3 of that value. 
It can be adjusted even lower to control the expected time for normal rebalances.
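
The two rules quoted above (lower than session.timeout.ms, and typically no higher than one third of it) can be written as a small check. This is a sketch only: check_heartbeat_interval is a hypothetical helper, not part of any Kafka client.

```python
def check_heartbeat_interval(heartbeat_interval_ms: int,
                             session_timeout_ms: int) -> list:
    """Return a list of problems found; an empty list means both rules hold."""
    problems = []
    if heartbeat_interval_ms >= session_timeout_ms:
        # Hard rule from the documentation.
        problems.append(
            "heartbeat.interval.ms must be lower than session.timeout.ms")
    elif heartbeat_interval_ms * 3 > session_timeout_ms:
        # Rule of thumb from the documentation, so only a recommendation.
        problems.append(
            "heartbeat.interval.ms should typically be no higher than "
            "1/3 of session.timeout.ms")
    return problems


print(check_heartbeat_interval(3_000, 30_000))    # []
print(check_heartbeat_interval(15_000, 30_000))   # one recommendation
print(check_heartbeat_interval(30_000, 30_000))   # one hard-rule violation
```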


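Heartbeats in 0.10.1.0:
Starting with this version, heartbeats are decoupled from poll(): each consumer sends heartbeats from its own background thread.
This version also adds a separate max.poll.interval.ms parameter, which sets the maximum interval between two poll() calls independently of the heartbeat settings. Message processing time can therefore be configured on its own and is allowed to exceed the session timeout (session.timeout.ms).
session.timeout.ms applies to the heartbeat thread and max.poll.interval.ms applies to the processing thread; in this version the two run as separate threads.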

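Suppose session.timeout.ms = 30000 (30 seconds): the consumer's heartbeat thread must get a heartbeat to the broker before this timeout expires.
Separately, if processing a single message takes 1 minute, max.poll.interval.ms can be set to more than 1 minute to give the processing thread enough time.
Otherwise, with max.poll.interval.ms below 1 minute, the gap between two polls exceeds max.poll.interval.ms by the time the message is processed and the next poll() is made, and the consumer is treated as failed even though its session has not timed out.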

If the processing (poll) thread dies or stalls, this is detected through max.poll.interval.ms.
If the entire consumer dies, it can only be detected through session.timeout.ms.
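
The division of labour between the two timeouts can be sketched as a tiny two-clock model in Python. This is a simplification for illustration only: detect_failure and its arguments are hypothetical, and the model does not reproduce how the real client and broker share the work.

```python
from typing import Optional


def detect_failure(now_ms: int, last_heartbeat_ms: int, last_poll_ms: int,
                   session_timeout_ms: int,
                   max_poll_interval_ms: int) -> Optional[str]:
    """Return the name of the timeout that has expired, or None if the
    consumer still counts as alive in this simplified model."""
    if now_ms - last_heartbeat_ms > session_timeout_ms:
        return "session.timeout.ms"
    if now_ms - last_poll_ms > max_poll_interval_ms:
        return "max.poll.interval.ms"
    return None


SESSION_MS, POLL_MS = 30_000, 90_000

# Processing takes 60 s while the heartbeat thread keeps running: still alive.
print(detect_failure(60_000, 58_000, 0, SESSION_MS, POLL_MS))    # None
# Processing thread is stuck for 120 s, heartbeats still flow.
print(detect_failure(120_000, 118_000, 0, SESSION_MS, POLL_MS))  # max.poll.interval.ms
# The whole consumer died at t=0: no heartbeats and no polls.
print(detect_failure(31_000, 0, 0, SESSION_MS, POLL_MS))         # session.timeout.ms
```

In the last case session.timeout.ms fires first simply because it is the shorter of the two deadlines, which is why a crashed consumer is noticed within the session timeout rather than the poll interval.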


Notable changes in 0.10.1.0:
The new Java Consumer now supports heartbeating from a background thread. 
There is a new configuration max.poll.interval.ms which controls the maximum time between poll invocations before the consumer will proactively leave the group (5 minutes by default). 
The value of the configuration request.timeout.ms must always be larger than max.poll.interval.ms because this is the maximum time that a JoinGroup request can block on the server while the consumer is rebalancing, so we have changed its default value to just above 5 minutes. 
Finally, the default value of session.timeout.ms has been adjusted down to 10 seconds, and the default value of max.poll.records has been changed to 500.
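
The defaults and the constraint from this note can be collected into a plain Python dict keyed by the property names. This is a sketch, not client code: request_timeout_ok is a hypothetical helper, and 305000 is an assumption about what "just above 5 minutes" means, since the note does not give the exact number.

```python
DEFAULTS_0_10_1_0 = {
    "max.poll.interval.ms": 300_000,  # 5 minutes, per the upgrade note
    "request.timeout.ms": 305_000,    # ASSUMPTION: "just above 5 minutes"
    "session.timeout.ms": 10_000,     # adjusted down to 10 seconds
    "max.poll.records": 500,
}


def request_timeout_ok(config: dict) -> bool:
    """request.timeout.ms must be larger than max.poll.interval.ms, because a
    JoinGroup request can block on the server for that long during a rebalance."""
    return config["request.timeout.ms"] > config["max.poll.interval.ms"]


print(request_timeout_ok(DEFAULTS_0_10_1_0))  # True

# Raising max.poll.interval.ms alone breaks the constraint.
overridden = dict(DEFAULTS_0_10_1_0, **{"max.poll.interval.ms": 600_000})
print(request_timeout_ok(overridden))         # False
```

In other words, anyone who raises max.poll.interval.ms to allow longer processing would also need to raise request.timeout.ms above it.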


Official description for 0.10.1.0 (http://kafka.apache.org/0101/documentation.html):
max.poll.interval.ms
The maximum delay between invocations of poll() when using consumer group management. This places an upper bound on the amount of time that the consumer can be idle before fetching more records. If poll() is not called before expiration of this timeout, then the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member.

session.timeout.ms
The timeout used to detect consumer failures when using Kafka's group management facility. The consumer sends periodic heartbeats to indicate its liveness to the broker. If no heartbeats are received by the broker before the expiration of this session timeout, then the broker will remove this consumer from the group and initiate a rebalance. Note that the value must be in the allowable range as configured in the broker configuration by group.min.session.timeout.ms and group.max.session.timeout.ms.

request.timeout.ms
The configuration controls the maximum amount of time the client will wait for the response of a request. 
If the response is not received before the timeout elapses the client will resend the request if necessary or fail the request if retries are exhausted.
 

Also note:

In 0.10.2.1, the default value of max.poll.interval.ms for the internal Kafka Streams consumer was changed to Integer.MAX_VALUE.

http://kafka.apache.org/0102/documentation.html

Notable changes in 0.10.2.1
The default values for two configurations of the StreamsConfig class were changed to improve the resiliency of Kafka Streams applications. 
The internal Kafka Streams producer retries default value was changed from 0 to 10. 
The internal Kafka Streams consumer max.poll.interval.ms default value was changed from 300000 to Integer.MAX_VALUE.
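
For a sense of scale, a quick Python calculation converts the new default into days. The helper name ms_to_days is hypothetical; Integer.MAX_VALUE is Java's 2^31 - 1.

```python
INT_MAX = 2**31 - 1  # Java's Integer.MAX_VALUE = 2147483647


def ms_to_days(ms: int) -> float:
    """Convert a duration in milliseconds to days."""
    return ms / (1000 * 60 * 60 * 24)


print(round(ms_to_days(300_000) * 24 * 60))  # 5  (old default, in minutes)
print(round(ms_to_days(INT_MAX), 2))         # 24.86  (new default, in days)
```

At roughly 25 days between polls, the poll-interval check practically never fires for a Streams application using this default, so in practice only session.timeout.ms remains as a failure detector.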

 
