Notes from reading the CRF++ source code

Original

2018-08-24 23:24

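I read through the CRF++ source code; the main points are summarized below.

1. It implements the linear-chain structure.

2. The way samples are represented feels less flexible than in maxent; the implementation in suit may be worth a look.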

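3. TaggerImpl stores one training sample: x holds the output (observation) sequence, answer holds the corresponding gold state sequence, and result holds the state sequence computed by the model. To support concurrent processing by multiple threads, it also stores thread_id_, the ID of the thread that handles this TaggerImpl. Every token in the output sequence corresponds to a feature set, so the whole output sequence corresponds to a sequence of feature sets. The system stores the feature sets of all training samples consecutively in a single feature_cache, so each TaggerImpl keeps feature_id_, the offset of its own feature sequence within feature_cache. The feature_cache itself lives in the FeatureIndex object, and every TaggerImpl in the system shares one FeatureIndex object. To make the dynamic programming convenient, each TaggerImpl also contains a two-dimensional array of Node: the horizontal axis corresponds to the tokens of the output sequence, and the vertical axis to the states in the system's state set.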

4. Node stores one cell of the dynamic program, including alpha, beta, the Viterbi path predecessor, and so on.

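5. Unlike maxent, at buildfeature time the system creates a feature for every <observation, state> pair, whether or not that pair ever appeared in the training data. Here "state" covers both unigram and bigram features.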

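6. All observations are extracted from the training data and stored in the dict of feature_cache. The structure of this dict is observation -> pair<observationId, count>, and the final maximum observationId is the total number of parameters in the model.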
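
The storage layout in point 3 can be sketched roughly as follows. This is a simplified illustration and not the actual CRF++ classes: `Tagger`, `Node`, `append`, `build_lattice`, `features_at`, `gold_labels` and `predicted_labels` are hypothetical names chosen for the sketch, and features are reduced to plain integer IDs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one cell of the DP lattice.
struct Node {
  double alpha = 0.0;  // forward score
  double beta = 0.0;   // backward score
  int prev = -1;       // Viterbi predecessor state, -1 if none
};

// One object shared by every tagger; it owns the feature cache.
struct FeatureIndex {
  std::size_t num_states = 0;
  // One entry per token, for all training samples, stored back to back.
  std::vector<std::vector<int>> feature_cache;

  // Appends the per-token feature sets of one sample and returns the
  // offset at which that sample's feature sequence starts.
  std::size_t append(const std::vector<std::vector<int>>& sets) {
    const std::size_t offset = feature_cache.size();
    feature_cache.insert(feature_cache.end(), sets.begin(), sets.end());
    return offset;
  }
};

// Simplified stand-in for one training sample.
struct Tagger {
  std::vector<std::string> x;         // observed tokens
  std::vector<int> gold_labels;       // state sequence from the training data
  std::vector<int> predicted_labels;  // state sequence computed by the model
  std::size_t feature_id = 0;         // offset into the shared feature_cache
  int thread_id = 0;                  // worker thread that processes this sample
  std::vector<std::vector<Node>> node;  // node[token][state]
  const FeatureIndex* index = nullptr;  // shared, not owned

  // Allocates the lattice: one row per token, one column per state.
  void build_lattice() {
    node.assign(x.size(), std::vector<Node>(index->num_states));
  }

  // Feature set of the token at position pos within this sample.
  const std::vector<int>& features_at(std::size_t pos) const {
    return index->feature_cache[feature_id + pos];
  }
};
```

Because each sample only keeps an offset, the feature sets themselves exist once, inside the shared index, no matter how many threads iterate over the samples.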
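
To illustrate why each lattice cell in point 4 keeps a Viterbi predecessor, here is a minimal Viterbi pass over a token-by-state lattice. It is a sketch under assumptions, not the CRF++ routine: `Cell`, `cost`, `best`, the `trans` matrix and the `viterbi` function are hypothetical names, and scores are simply added and maximized.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lattice cell for this sketch.
struct Cell {
  double cost = 0.0;  // score of this state at this token (unigram part)
  double best = 0.0;  // best total score of any path ending in this cell
  int prev = -1;      // state at the previous token on that best path
};

// lattice[t][j]: token t, state j. trans[i][j]: score of moving from state i
// to state j (bigram part). Returns the highest-scoring state sequence.
std::vector<int> viterbi(std::vector<std::vector<Cell>>& lattice,
                         const std::vector<std::vector<double>>& trans) {
  std::vector<int> path;
  if (lattice.empty() || lattice[0].empty()) return path;
  const std::size_t T = lattice.size();
  const std::size_t S = lattice[0].size();

  // First token: no predecessor.
  for (std::size_t j = 0; j < S; ++j) {
    lattice[0][j].best = lattice[0][j].cost;
    lattice[0][j].prev = -1;
  }
  // Remaining tokens: pick the best predecessor for every state.
  for (std::size_t t = 1; t < T; ++t) {
    for (std::size_t j = 0; j < S; ++j) {
      double best = lattice[t - 1][0].best + trans[0][j];
      int arg = 0;
      for (std::size_t i = 1; i < S; ++i) {
        const double s = lattice[t - 1][i].best + trans[i][j];
        if (s > best) {
          best = s;
          arg = static_cast<int>(i);
        }
      }
      lattice[t][j].best = best + lattice[t][j].cost;
      lattice[t][j].prev = arg;
    }
  }
  // Best final state, then follow the predecessor links backwards.
  int cur = 0;
  for (std::size_t j = 1; j < S; ++j) {
    if (lattice[T - 1][j].best > lattice[T - 1][cur].best) {
      cur = static_cast<int>(j);
    }
  }
  path.assign(T, 0);
  for (std::size_t t = T; t-- > 0;) {
    path[t] = cur;
    cur = lattice[t][cur].prev;
  }
  return path;
}
```

The alpha and beta fields mentioned in point 4 would be filled by the analogous forward and backward passes, with the maximum replaced by a log-sum.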
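
Points 5 and 6 fit together if each new observation reserves a whole block of IDs, one per state (or per state pair for bigram features), so that the next free ID always equals the number of parameters. The sketch below shows that idea only; `FeatureDict`, `get_id`, `count` and `size` are hypothetical names, and treating a leading `B` as the bigram marker is an assumption borrowed from the CRF++ template convention.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical dictionary: observation -> pair<observationId, count>.
class FeatureDict {
 public:
  explicit FeatureDict(std::size_t num_states) : num_states_(num_states) {}

  // Returns the base ID of an observation string. On first sight it reserves
  // one ID per state (unigram) or per state pair (bigram), whether or not the
  // combination occurs in the data; on later sightings it bumps the count.
  int get_id(const std::string& observation) {
    auto it = dict_.find(observation);
    if (it != dict_.end()) {
      ++it->second.second;
      return it->second.first;
    }
    const int id = next_id_;
    // Assumption: observation strings that start with 'B' are bigram features.
    const bool bigram = !observation.empty() && observation[0] == 'B';
    const std::size_t block = bigram ? num_states_ * num_states_ : num_states_;
    next_id_ += static_cast<int>(block);
    dict_.emplace(observation, std::make_pair(id, 1u));
    return id;
  }

  // How many times an observation has been seen (0 if never).
  unsigned count(const std::string& observation) const {
    auto it = dict_.find(observation);
    return it == dict_.end() ? 0u : it->second.second;
  }

  // Next free ID, which is also the total number of model parameters.
  int size() const { return next_id_; }

 private:
  std::size_t num_states_;
  int next_id_ = 0;
  std::map<std::string, std::pair<int, unsigned>> dict_;
};
```

With three states, two distinct unigram observations and one bigram observation reserve 3 + 3 + 9 = 15 IDs. Under this layout the weight for a unigram observation paired with state y would sit at base + y.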
