Ten Steps to Ensure Your Data Monitoring Solution Is Effective

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In this article, we walk through ten actionable steps to reduce false positive and false negative alerts and to lessen the impact of the false alerts that still occur."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Whether or not a data alert fires, there are only four possible outcomes."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/b4\/b4f3a6e01b629a134d359394896520ea.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ideally, every alert you receive concerns a real data quality issue that you care about (a true positive), and no alert is sent when there is no issue you care about (a true negative)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the real world, however, most data quality monitoring solutions are far from this ideal. They send some invalid alerts (false positives). These alerts distract the data team and erode confidence in the monitoring solution."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Or the monitoring tool misses real data quality issues (false negatives). These missed issues can damage your business decisions and data products and undermine trust in the data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In this article, we walk through ten actionable steps to reduce false positive and false negative alerts and to lessen the impact of the false alerts that still occur."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1 
Use a dynamic data testing strategy"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Most data testing strategies start with simple rules, for example:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Column x is never NULL"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Table y has between 1,000,000 and 2,000,000 rows"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These rules work well when you know exactly what your data should look like."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"They also have several drawbacks:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Any violation of a rule, however small, triggers an alert."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data subject matter experts must spend time setting up the rules."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"They may need frequent maintenance over time as your data changes."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragra
ph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By using a "},{"type":"link","attrs":{"href":"https:\/\/medium.com\/anomalo-hq\/dynamic-data-testing-f831435dba90","title":"xxx","type":null},"content":[{"type":"text","text":"dynamic data testing strategy"}]},{"type":"text","text":", you can reduce both false positives and false negatives."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/4c\/4c1e1e8f8d8e4c7b57d1638d24379498.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"A predicted-range check that uses a time series model to identify a spike in the percentage of NULL values, with no manual configuration or maintenance."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dynamic checks use time series models (or other machine learning techniques) to adapt to your data and alert only when there is a sudden, meaningful change. Such checks take less effort to set up, which makes it easier to increase test coverage, and they reduce false positives caused by misconfiguration or by drift over time."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2 
Check only the most recent data by default"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By default, your platform should check only the most recent data in a table."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/47\/47ac8cd4345411f74990e49a5ee7de74.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Users should be able to turn off the default of checking only the most recent data easily."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Checking only the most recent data saves data warehouse costs and reduces false positives that come from historical data, which usually no longer needs fixing. For tables that are not append-only, users should be able to disable this behavior easily. Checks can also track their own run history and send notifications only when a new issue appears in the table."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3 Support no-code configuration changes"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data quality rules will inevitably produce some false positive alerts. When they do, users should be able to adjust their checks easily. If users have to edit code or change complex YAML 
configuration files, they will be reluctant to do so."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Users commonly make the following kinds of changes:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Widening the expected range for a data outcome"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Narrowing the scope of a rule with a SQL WHERE clause"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Waiting for updated data to arrive before applying a rule"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Changing the threshold of a machine learning alert"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/33\/3304103f44c05ff3127937df68071488.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Advanced options for tuning a key metric or data validation rule, which lower the risk of false positive and false negative alerts."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The UI for making changes should make it possible to avoid an alert in one click. It should be easy to understand and well documented. Finally, there should be an audit trail of changes so that they are easy to trace back when needed."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"4 
Prioritize your data quality rules"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Not all data quality rules are equally important. In some cases, users are just trying out the platform and do not want to receive alerts. In other cases, a rule is so important that any deviation from expected behavior should raise a loud alert."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/42\/42585e0b7601515da5659031d0e79e08.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Besides changing alerting behavior, priority levels can also change how alerts or tables appear in dashboards, according to the severity of the failing alerts."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/a0\/a0cbe9fa059546d2bc75842a29b38391.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first table has two failing alerts, one of which is high priority. The second table has one failing alert. The third and fourth tables have low-priority alerts, and the fifth table has no issues."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"5 In your pipelines, use the API 
to run high-priority rules"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If you are highly confident that any issue found by certain data validations is real and would have serious adverse consequences, it is worth running those checks inside your pipeline."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/dd\/dd77014308775b6889acda7d031a0312.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Example: running data quality checks in a pipeline to quarantine bad data and avoid publishing it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, in Apache Airflow you can use the API to run data quality checks on transformed data, poll for the check results, and publish the data if nothing failed."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If a check fails, you can run an automated task to repair the bad data, abort the rest of the DAG (sometimes no data is better than bad data), or quarantine the bad records using SQL generated by the API so that good and bad data can be queried separately."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"6 
Cluster similar issues into a single alert"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data quality issues often affect several columns or segments of data at once. If these issues affect the same rows, they should be correlated into a single alert."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/9a\/9ab4323aad542da9dba06a8c0f2c8b8d.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Three columns show an increase in NULL values in the same set of records, so they are clustered together in this alert."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the (redacted) alert above, a total of 88 columns actually showed an anomalous increase in NULL values. Clustering them reduces the number of alerts the team has to review and helps identify the underlying problem."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"7 Scan samples of raw rows for any unexpected changes"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When you have many important source tables, each with hundreds of columns, manually specifying and managing data quality rules for every table and column is unrealistic. Instead, you can use "},{"type":"link","attrs":{"href":"https:\/\/medium.com\/anomalo-hq\/unsupervised-data-monitoring-36cb2304c61e","title":"xxx","type":null},"content":[{"type":"text","text":"unsupervised data monitoring"}]},{"type":"text","text":" to scan random samples of rows from the source tables for significant anomalies."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/94\/945b48dae579c763517cafca778f7056.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"The figure above is 
a time series view of anomalies in a table from a BigQuery public COVID dataset. The vertical axis lists the table's columns, and the horizontal axis is time. The size of each circle represents the strength of the anomaly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Summaries like the one above can be reviewed regularly to quickly identify unexpected and relevant changes that should be explicitly handled and monitored in the future."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"8 Route notifications to the teams with ownership and responsibility"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Many companies start by sending all data quality alerts to a single channel in Slack or Microsoft Teams. Users of that channel then have to ignore many alerts that they are probably not interested in. A single channel can also weaken accountability for individual alerts, because they are easily lost in the noise. The best practice is instead to set up a separate channel for each team."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/6d\/6d6a86e8f5149cda6bdcffafcdaa117e.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In each team channel, you can add the users who depend on or maintain the tables covered by that channel. When an alert arrives, they can use emoji to show how they are responding to it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/9a\/9af7ebc6e859717f66a1fb865cde7619.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Example: in Slack 
or Microsoft Teams, emoji used to express common reactions to alerts."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Common reactions include:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"✅ Issue resolved"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"🔥 Important alert"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"🛠️ Fix in progress"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"🆗 Expected behavior, no action needed"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"👀 Under review"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Users can also @-mention colleagues to help diagnose and resolve the underlying issue."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"9 Provide useful context for issues so they can be triaged quickly"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When an alert fires, it is frustrating to receive a message like this:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"objectivec"},"content":[{"type":"text","text":"column user_id 
in table fact_table has NULL values\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The alert should help the user answer the following questions:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Why does this alert matter?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"What is affected by user_id, and to what extent?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"How often has this alert failed recently?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Who configured this alert, and why?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Which dashboards or ML models depend on fact_table?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For the user_id values that feed into fact_table, 
what are the original data sources?"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Notifications should include this information directly or link to the relevant data catalog platform. Notifications should also include samples of raw data that highlight what distinguishes good values from bad ones:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/4d\/4d1224e1df89b44b0972c207c435e725.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Comparing good rows with bad rows (where the timestamp value is NULL)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Advanced statistical methods can analyze the underlying data and produce a root cause analysis that pinpoints where the problem occurs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/f3\/f3c731b19febe2a8f2e24d0374e80310.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"The figure above is an example root cause analysis that identifies a data segment (in this case, venuestate = 'NY') and shows clearly where the underlying data quality issue occurs."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"10 
Collect user feedback and learn from it"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"No matter what you do, your data quality solution will inevitably send some unhelpful alerts. When that happens, it is important to collect feedback."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/a2\/a2acbe10fb602c407d031861bfa78387.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"An example of a button for giving feedback on an alert."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Over time, machine learning can be used to tune the data quality monitoring solution so that it suppresses alerts users find unhelpful. To monitor data effectively, your system should produce alerts that are comprehensive, targeted, and accurate."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Summary"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, minimize false positive alerts. Convert static tests into smarter dynamic tests that adapt to your data. Make sure users can adjust alert priorities and subscribe to the notifications they care about. Check only the most recent data by default, and make rules easy to modify."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, reduce the burden that false positives place on users. Cluster similar issues together and provide accurate alerts. Use API integrations to keep bad data from flowing further down the pipeline. Then make sure your system can adapt based on user feedback."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, make your testing strategy as comprehensive as possible so that you do not miss real data quality issues 
(false negatives). Use dynamic testing and a user-friendly interface so that users can configure alerts easily. Use row-level unsupervised monitoring to scan for issues that other alerts miss."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Taken together, these measures keep alert quality high and users productive and engaged, and over time the quality of the data you depend on will keep improving."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original article:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/blog.anomalo.com\/effective-data-monitoring-8bce3ddf87b4","title":"","type":null},"content":[{"type":"text","text":"https:\/\/blog.anomalo.com\/effective-data-monitoring-8bce3ddf87b4"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the translator:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"冬雨 is a self-described tech geek who works on R&D process improvement and quality improvement, follows programming, software engineering, agile, DevOps, and cloud computing, and enjoys translating and sharing fresh IT news and in-depth technical articles from abroad. Published translations include 《深入敏捷測試》 and 《持續交付實戰》."}]}]}
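As a rough illustration of step 1, the sketch below contrasts a static rule with a dynamic check on the daily percentage of NULL values in a column. It is not the time series model the article refers to: a simple mean and standard deviation band over recent history stands in for it here. The names `static_check`, `dynamic_check`, `k`, and `min_band`, and the sample history values, are hypothetical and chosen only for this example.

```python
from statistics import mean, stdev


def static_check(null_pct: float, max_allowed: float = 0.0) -> bool:
    """Static rule: pass only if the NULL percentage is within a fixed limit."""
    return null_pct <= max_allowed


def dynamic_check(history: list[float], latest: float,
                  k: float = 3.0, min_band: float = 0.5) -> bool:
    """Dynamic rule (simplified stand-in for a time series model).

    Pass if `latest` lies within a band around the historical mean.
    The band is k standard deviations wide, but never narrower than
    `min_band`, so a very stable history does not alert on tiny wobbles.
    """
    if len(history) < 2:
        return True  # assumption: too little history to judge, so do not alert
    center = mean(history)
    band = max(k * stdev(history), min_band)
    return abs(latest - center) <= band


# Hypothetical daily NULL percentages for one column over the past week.
history = [1.9, 2.1, 2.0, 2.2, 1.8, 2.0, 2.1]

small_drift_ok = dynamic_check(history, 2.3)   # within the band around ~2.0
spike_ok = dynamic_check(history, 9.0)         # far outside the band
static_ok = static_check(2.3)                  # fixed rule fails on any NULLs
```

With these sample values the static rule fails on the ordinary day, while the dynamic check passes it and fails only on the spike. That matches the drawback the article lists for static rules, where any violation, however small, produces an alert.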