Trident state


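Trident has first-class abstractions for reading from and writing to stateful sources. The state can be internal to the topology (for example, kept in memory and backed by HDFS) or stored externally in a database such as Memcached or Cassandra. The Trident API is the same in either case.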

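Trident manages state in a fault-tolerant way, so that state updates are idempotent in the face of failures and retries. This is what lets you reason about Trident topologies as if each message were processed exactly once.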





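To achieve this, Trident provides the following semantics:

1 Tuples are processed as small batches.

2 Each batch of tuples is given a unique "transaction id" (txid). If the batch is replayed, it is given exactly the same txid.

3 State updates are ordered among batches. That is, the state updates for batch 3 are not applied until the state updates for batch 2 have succeeded.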


Transactional spouts

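Remember, Trident processes tuples as small batches, each of which is given a unique transaction id. Different kinds of spout provide different guarantees about what is in each batch. A transactional spout has the following properties:

1 The batch for a given txid is always the same. A replay of a txid emits exactly the same set of tuples as the first time the batch was emitted for that txid.

2 There is no overlap between batches of tuples. A tuple is in one batch only, never in several.

3 Every tuple is in a batch. No tuples are skipped.

This is an easy type of spout to understand: the stream is divided into fixed batches that never change. trident-kafka has a transactional spout implementation for Kafka.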
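
The three properties above can be illustrated with a small in-memory model. This is only a sketch, not the trident-kafka implementation: the class name TransactionalSpoutSketch, its fields, and the list-based source are hypothetical and chosen for illustration.

```python
class TransactionalSpoutSketch:
    """Toy model of a transactional spout over an in-memory list (hypothetical)."""

    def __init__(self, source, batch_size):
        self.source = list(source)
        self.batch_size = batch_size
        # txid -> (start, end) offsets, recorded the first time a txid is emitted
        self.emitted = {}

    def emit_batch(self, txid):
        if txid not in self.emitted:
            # A new txid starts where the furthest previous batch ended,
            # so batches never overlap and no tuple is skipped.
            start = max((end for _, end in self.emitted.values()), default=0)
            end = min(start + self.batch_size, len(self.source))
            self.emitted[txid] = (start, end)
        # A replayed txid reuses the recorded offsets, so the batch is identical.
        start, end = self.emitted[txid]
        return self.source[start:end]


spout = TransactionalSpoutSketch(["man", "man", "dog", "apple"], batch_size=2)
first_attempt = spout.emit_batch(1)
spout.emit_batch(2)
replay = spout.emit_batch(1)
assert replay == first_attempt  # same txid, same tuples
```

The sketch can always replay a batch because its source is always readable. A real source may not be, which is the limitation discussed next.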


Transactional spouts are not necessarily very fault-tolerant, however. The way TransactionalTridentKafkaSpout works is that the batch for a txid contains tuples from all the Kafka partitions for a topic. Once a batch has been emitted, any future re-emit of that batch must contain exactly the same set of tuples to meet the semantics of transactional spouts. Now suppose a batch is emitted from TransactionalTridentKafkaSpout, the batch fails to process, and at the same time one of the Kafka nodes goes down. You are now unable to replay the same batch as before (since the node is down and some partitions for the topic are unavailable), and processing will halt.



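Before we get to opaque transactional spouts, let's look at how you would design a State implementation that has exactly-once semantics for transactional spouts. This kind of state is called a "transactional state", and it takes advantage of the fact that any given txid is always associated with exactly the same set of tuples.

Suppose you are doing a word count and want to store the results in a key/value database. The key is the word and the value is the number of times it has occurred. Storing the count alone is not enough, because you cannot tell whether a given batch has already been applied. Instead, store the count together with the transaction id in the database as a single atomic value. Then, when updating the count, compare the txid in the database with the txid of the current batch. If they are the same, you skip the update: because of the strong ordering, you know the stored value already includes the current batch. If they are different, you increment the count. This logic works because the batch for a txid never changes, and Trident guarantees that state updates are applied strictly in batch order.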

Consider this example of why it works. Suppose you are processing txid 3, which consists of the following batch of tuples:

["man"]
["man"]
["dog"]

Suppose the database currently holds the following key/value pairs:

man => [count=3, txid=1]
dog => [count=4, txid=3]
apple => [count=10, txid=2]

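The txid associated with "man" is txid 1. Since the current txid is 3, you know for sure that this batch is not yet represented in that count, so you can go ahead and increment the count for "man" by 2 and update the txid to 3. Now look at the other key/value pair: the txid for "dog" is 3, the same as the current txid. You know for sure that this batch has already been applied to "dog", so you can skip the update. After the updates are complete, the database looks like this: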

man => [count=5, txid=3]
dog => [count=4, txid=3]
apple => [count=10, txid=2]
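
The update rule for a transactional state can be sketched in a few lines. This is a minimal in-memory model, not Trident's State API: the function apply_batch and the dict layout are hypothetical, and the batch contents (two "man" tuples and one "dog" tuple) are chosen to match the counts in the example above.

```python
def apply_batch(db, txid, batch):
    """Apply one batch of words to db, a dict of word -> {"count", "txid"}."""
    # Partial counts for this batch only.
    partial = {}
    for word in batch:
        partial[word] = partial.get(word, 0) + 1

    for word, increment in partial.items():
        entry = db.get(word)
        if entry is not None and entry["txid"] == txid:
            # Same txid: this batch is already reflected in the count, skip it.
            continue
        previous = entry["count"] if entry is not None else 0
        # Count and txid are written together, standing in for one atomic value.
        db[word] = {"count": previous + increment, "txid": txid}
    return db


db = {
    "man": {"count": 3, "txid": 1},
    "dog": {"count": 4, "txid": 3},
    "apple": {"count": 10, "txid": 2},
}
apply_batch(db, 3, ["man", "man", "dog"])
assert db["man"] == {"count": 5, "txid": 3}
assert db["dog"] == {"count": 4, "txid": 3}

# Replaying txid 3 changes nothing, because every touched key now carries txid 3.
apply_batch(db, 3, ["man", "man", "dog"])
assert db["man"] == {"count": 5, "txid": 3}
```

The skip is only safe because a transactional spout replays exactly the same tuples for a txid. If the replayed batch could differ, the stored count might include tuples that are no longer in the batch.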

Opaque transactional spouts

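As described before, an opaque transactional spout cannot guarantee that the batch of tuples for a txid remains constant. An opaque transactional spout has the following property:

1 Every tuple is successfully processed in exactly one batch. However, a tuple may fail to process in one batch and then succeed in a later batch.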


OpaqueTridentKafkaSpout is a spout that has this property and is fault-tolerant to losing Kafka nodes. Whenever it's time for OpaqueTridentKafkaSpout to emit a batch, it emits tuples starting from where the last batch finished emitting. This ensures that no tuple is ever skipped or successfully processed by multiple batches.
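
The emit rule described above can be modelled as follows. This is a simplified sketch, not the OpaqueTridentKafkaSpout code: the class OpaqueSpoutSketch, the explicit ack step, and the "available partitions" argument are assumptions made for illustration.

```python
class OpaqueSpoutSketch:
    """Toy model: resume each partition from where the last successful batch ended."""

    def __init__(self, partitions):
        self.partitions = partitions  # partition name -> list of tuples
        # Offsets reached by the last successfully processed batch.
        self.committed = {name: 0 for name in partitions}

    def emit_batch(self, available, max_per_partition):
        """Build a batch from whichever partitions are reachable right now."""
        batch = []
        next_offsets = dict(self.committed)
        for name in sorted(available):
            start = self.committed[name]
            chunk = self.partitions[name][start:start + max_per_partition]
            batch.extend(chunk)
            next_offsets[name] = start + len(chunk)
        return batch, next_offsets

    def ack(self, next_offsets):
        # Offsets advance only once the batch has been processed successfully.
        self.committed = next_offsets


spout = OpaqueSpoutSketch({"p0": ["a", "b"], "p1": ["c", "d"]})
first, _ = spout.emit_batch({"p0", "p1"}, 1)  # this attempt fails, nothing is acked
retry, offsets = spout.emit_batch({"p0"}, 1)  # p1 is down, so the retry is smaller
spout.ack(offsets)
following, _ = spout.emit_batch({"p0", "p1"}, 1)
assert first != retry              # the retried batch differs from the first attempt
assert following == ["b", "c"]     # "c" was not skipped, "a" is not emitted again
```

In this model the retried batch differs from the first attempt, yet each tuple ends up in exactly one successfully processed batch.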


With opaque transactional spouts, you can no longer skip a state update just because the txid in the database matches the txid of the current batch, since the batch may have changed between attempts. What you can do instead is store more state in the database: rather than a value and a transaction id, you store a value, a transaction id, and the previous value. Let's again use the example of storing a count. Suppose the partial count for your batch is "2" and it's time to apply a state update. Suppose the value in the database looks like this:

{ value = 4,
  prevValue = 1,
  txid = 2
}

Suppose your current txid is 3, different from what's in the database. In this case, you set "prevValue" equal to "value", increment "value" by your partial count, and update the txid. The new database value will look like this:

{ value = 6,
  prevValue = 4,
  txid = 3
}

Now suppose your current txid is 2, equal to what's in the database. Now you know that the "value" in the database contains an update from a previous batch for your current txid, but that batch may have been different so you have to ignore it. What you do in this case is increment "prevValue" by your partial count to compute the new "value". You then set the value in the database to this:

{ value = 3,
  prevValue = 1,
  txid = 2
}

This works because of the strong ordering of batches provided by Trident. Once Trident moves on to a new batch for state updates, it never goes back to a previous batch. And since opaque transactional spouts guarantee no overlap between batches, meaning each tuple is successfully processed by exactly one batch, you can safely update based on the previous value.
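
The two cases above can be written as a single update function. This is a minimal sketch over plain dicts, not Trident's opaque state implementation: the name opaque_update is hypothetical, and the field names simply mirror the example.

```python
def opaque_update(stored, txid, partial_count):
    """Return the new record for one key under the opaque update rule."""
    if stored["txid"] == txid:
        # Same txid: "value" came from an earlier attempt whose batch may have
        # been different, so ignore it and rebuild from "prevValue".
        return {
            "value": stored["prevValue"] + partial_count,
            "prevValue": stored["prevValue"],
            "txid": txid,
        }
    # New txid: the current "value" is settled and becomes the new "prevValue".
    return {
        "value": stored["value"] + partial_count,
        "prevValue": stored["value"],
        "txid": txid,
    }


stored = {"value": 4, "prevValue": 1, "txid": 2}
assert opaque_update(stored, 3, 2) == {"value": 6, "prevValue": 4, "txid": 3}
assert opaque_update(stored, 2, 2) == {"value": 3, "prevValue": 1, "txid": 2}
```

Returning a fresh record keeps the three fields together, which matches the requirement that they be written to the database as one value.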

