400,000 Experiments a Year: How Was ByteDance's A/B Testing Platform Forged?

Interviewee | Wang Ke
Editor | Tina

In 2012, the newly founded ByteDance began its A/B testing journey. As Toutiao, Douyin, Xigua Video and the rest of its product lines adopted the practice, A/B testing came to inform decisions on product naming, interaction design, recommendation algorithms, user growth, ad optimization, marketing campaigns and more. According to figures disclosed in April this year at the Tech Open Day of Volcano Engine, a ByteDance subsidiary, ByteDance runs upwards of ten thousand A/B tests concurrently every day, adds more than 1,500 new experiments per day, and covers more than 400 business lines. These numbers keep growing as the company expands: in the past year alone it ran 400,000 A/B tests.

So how was the technical platform behind these tests built inside ByteDance? We interviewed Wang Ke, technical lead of ByteDance's A/B experimentation platform, and his answers give a first look. Wang Ke will give a talk titled "The Evolution and Practice of ByteDance's A/B Testing Platform" (https://qcon.infoq.cn/2021/shanghai/presentation/3861) at QCon Shanghai (https://qcon.infoq.cn/2021/shanghai/) on October 21-23, 2021; the talk will cover more detail.

About the guest: Wang Ke is the technical lead of ByteDance's A/B experimentation platform. He currently works on ByteDance's data platform, where he is responsible for developing the internal A/B experimentation platform and supporting the A/B experiment needs of the company's business lines, and he has a deep understanding of the A/B experimentation field. He previously worked on the inventory health optimization team at Amazon China and has many years of experience in big data and service architecture.

InfoQ: It is said that A/B testing in its current form first appeared in the 1990s. Do you think its core principles have changed over the years?

Wang Ke: The core principles have hardly changed. It still relies on random sampling to obtain two sample groups, applies a different strategy to each, and then collects, compares and analyzes the results. If anything has changed, it is in the details: to cope with real-world A/B testing scenarios, there has been more exploration and evolution in testing methods, metric design and significance analysis.

InfoQ: Could you use a simple example to explain how A/B testing works?

Wang Ke: Put simply, the life cycle of an A/B test has roughly three parts: test design (confirming the test content, expected gain, test duration, test traffic, metrics to observe, and so on), test execution (using the platform's capabilities for traffic splitting, configuration delivery, data collection, and so on), and result analysis (producing metric data, significance analysis, multi-dimensional drill-down, and finally an analysis report).

InfoQ: What scenarios is A/B testing suited to? What are the prerequisites for running A/B tests? What scenarios does ByteDance's A/B testing platform mainly serve?

Wang Ke: Generally speaking, if you can randomly divide the samples into two mutually independent groups, apply different strategies to them at the same time, and provide quantitative metrics to measure the effect of each strategy, you can run an A/B test. A typical counterexample is a policy adjustment: you cannot randomly split the public into two parts, with one part under the new policy and the other under the old one, so it is not suited to A/B testing in the traditional sense. Instead, the new policy is usually piloted in one city, and its effect is assessed with analysis methods such as DID (difference-in-differences) or SCM (synthetic control method).

ByteDance's A/B testing platform exists to serve and support the A/B testing needs of the major business lines within ByteDance. It is currently used mainly for algorithm iteration, product evolution and intelligent operations, and will cover more scenarios as the company grows.

InfoQ: What are the main components of ByteDance's A/B testing platform, and what kind of iteration has the platform as a whole gone through?

Wang Ke: After years of iteration, the platform has grown from initially serving recommendation algorithm iteration to comprising four major parts today: A/B testing, configuration release, automatic parameter tuning, and an exploration lab. Together they cover the entire A/B testing life cycle.

InfoQ: For data collection, platforms generally use event tracking or log data. How does ByteDance's platform do it?

Wang Ke: Our platform also produces test data essentially from event tracking and log data. At ByteDance there are systematic solutions for aggregating event-tracking and log data, which makes it relatively easy for our A/B testing platform to produce test results.

InfoQ: Are some tests rather complex? How does ByteDance reduce complexity so that business staff can understand and use the platform easily?

Wang Ke: We do encounter some fairly complex scenarios, and the platform tries to refine the product to lower the barrier to use. A typical case is hyperparameter selection on the algorithm side. Machine learning models often have hyperparameters whose values algorithm engineers must adjust based on experience and A/B test results. In the traditional approach, an algorithm engineer spends several months repeatedly running and comparing A/B tests to select suitable values. To reduce the complexity of this scenario, ByteDance's A/B testing platform uses statistical methods to automatically run A/B tests in a loop, analyze the results, and predict the optimal values. This helps algorithm engineers find suitable hyperparameters and cuts tuning time from several months to a few weeks. This is our automatic parameter tuning system.

InfoQ: What common mistakes or pitfalls do people run into when doing A/B testing?

Wang Ke: A common one is "continuous observation". To give a slightly exaggerated example: an A/B test is launched, and the user comes by every day to check whether the metric has turned significantly positive. Then one day the metric suddenly is positive, and the user happily shuts down the test and writes the launch report. This practice of observing continuously and deciding the moment significance appears greatly increases the risk of reaching a wrong conclusion, and should be avoided. That is why, at ByteDance, regularly educating people on how to use A/B testing is one of the important things we do.

InfoQ: With A/B testing, different audiences behave differently, so what works for one company may not work for another. How does ByteDance's A/B testing platform achieve general applicability?

Wang Ke: On the one hand, ByteDance's A/B testing has always taken a platform approach, abstracting A/B testing sensibly and offering testing capabilities to different business scenarios. Given the company's fairly complex product matrix, the platform has stayed grounded in the most basic abstraction of A/B testing throughout its iterations, in order to ensure general applicability.

On the other hand, for assets closely tied to business scenarios, such as metric systems, we have to consider that they may not be universal and need to vary by business. We also have to consider that similar business lines may end up building similar metric systems redundantly. So being able to "make a copy of the experience" is also something our platform has to think about frequently.

InfoQ: How long does an A/B test take on average? How do you reduce "latency" and get results faster? What experience can you share with our readers?

Wang Ke: The time an A/B test needs varies considerably with the scenario and the purpose. For example, a small-traffic test of a search algorithm is meant to quickly explore whether an algorithm iteration is feasible, and it can yield valuable results within a few days or a few hours. A product interface change, by contrast, may run for several weeks or even months in order to avoid the so-called novelty effect, that is, the short-term rise in metrics caused by some users' curiosity. A typical A/B test usually lasts one to three weeks.

Generally, how long an A/B test takes depends on what is being tested and how the test is designed. Rather than shortening the duration of a single A/B test, improving the concurrency of A/B testing, so that the system can accommodate more tests at the same time, does more for overall product iteration efficiency.

InfoQ: What are the future plans for ByteDance's A/B testing platform?

Wang Ke: There are three important directions that our A/B testing platform is focusing on now and will invest in more heavily in the future.

The first is stronger support for A/B test analysis. Starting an A/B test at ByteDance is relatively easy, but analyzing it and mining more valuable information from it is not. There was once a test whose metrics showed no significant improvement overall, yet along one particular dimension we found a significant effect. After further analysis and reasoning we adjusted the strategy and in the end obtained a substantial gain. This shows how important analytical capability is for a testing platform.

The second is scenario-specific support. ByteDance's product matrix is becoming more and more complex, and different business domains have different demands on A/B testing. Compared with a powerful behemoth built by piling up capabilities, a clean, simple system that fits the nature of the business does more to improve iteration efficiency. This is a necessary step in the evolution of our architecture.

The last, and the most important, is exploring and building up methodology: broadening the boundaries of A/B testing to handle the new business problems we face today and will face tomorrow, for example how to better deal with network effects in the social domain.

Event recommendation:

At the QCon Global Software Development Conference on October 21-23 this year, Wang Ke will give a talk titled "The Evolution and Practice of ByteDance's A/B Testing Platform" in the "Big Data Analytics Technology Selection" track. Over the three days of the conference, front-line experts from different technical teams will also share 100+ technical practice cases across 25 tracks, including cloud native, AIOps in practice, big data analytics technology selection, the new mobile ecosystem, and observability. The conference schedule is now online: https://qcon.infoq.cn/2021/shanghai/schedule?utm_source=wechat&utm_medium=infoq&utm_campaign=9&utm_term=0927&utm_content=yueduyuanwen
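
Editor's sketch: the "continuous observation" pitfall Wang Ke describes in the interview can be illustrated with a small simulation. The following Python sketch is not ByteDance's tooling. The two-proportion z-test, the 10% conversion rate, the number of looks and every function name are assumptions chosen only for illustration. Both groups receive the same conversion rate, so any "significant" result is a false positive, and the sketch compares deciding at the first significant look against deciding only at the planned end of the test.

```python
import math
import random


def z_test_significant(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Two-sided two-proportion z-test at roughly the 5% level."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False
    z = (conv_b / n_b - conv_a / n_a) / se
    return abs(z) > z_crit


def simulate(num_experiments=500, looks=10, users_per_look=100, rate=0.1, seed=0):
    """Run A/A experiments (no true difference) and return two false-positive rates:
    (stop at the first significant look, decide only at the final look)."""
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(num_experiments):
        conv_a = conv_b = n = 0
        significant_at_any_look = False
        significant_at_end = False
        for _ in range(looks):
            # Each look adds one batch of users to both groups.
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            n += users_per_look
            sig = z_test_significant(conv_a, n, conv_b, n)
            significant_at_any_look = significant_at_any_look or sig
            significant_at_end = sig  # after the loop this holds the final look
        peeking_hits += significant_at_any_look
        fixed_hits += significant_at_end
    return peeking_hits / num_experiments, fixed_hits / num_experiments


peeking_rate, fixed_rate = simulate()
print(f"false positives when stopping at first significant look: {peeking_rate:.3f}")
print(f"false positives when deciding only at the planned end:   {fixed_rate:.3f}")
```

Because the final look is also one of the looks, the peeking rate can never be lower than the fixed-horizon rate in this setup, and in practice it tends to be noticeably higher. Fixing the test duration in advance, or using a sequential testing method designed for repeated looks, are the usual remedies.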
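
Editor's sketch: the interview mentions that the execution phase of an A/B test relies on the platform for traffic splitting. One common way to split traffic, shown here as a minimal Python sketch, is deterministic hashing of the user ID together with an experiment ID. This is a generic technique and not a description of ByteDance's implementation. The function name `assign_variant`, the 10,000-bucket granularity and the parameter names are all hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment_id, traffic_percent=100,
                   variants=("control", "treatment")):
    """Return the variant for a user, or None if the user is outside the
    experiment's traffic. The same inputs always give the same answer."""
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    # Map the hash to one of 10,000 buckets (hypothetical granularity).
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 10000
    if bucket >= traffic_percent * 100:
        return None  # user is not enrolled in this experiment
    return variants[bucket % len(variants)]


# Example: enroll 20% of users in a hypothetical experiment.
counts = {"control": 0, "treatment": 0, None: 0}
for i in range(10000):
    counts[assign_variant(f"user-{i}", "new-feed-ranking", traffic_percent=20)] += 1
print(counts)
```

Including the experiment ID in the hash key means a user's bucket in one experiment is effectively unrelated to their bucket in another, which is one simple way to let many experiments run at the same time on the same user base.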