A detailed look at MEB, Microsoft's large-scale sparse model: 135 billion parameters that significantly improve search relevance

Transformer-based deep learning models like GPT-3 have recently received a great deal of attention in the machine learning community. These models excel at understanding semantic relationships; they have [helped Microsoft Bing deliver a dramatic improvement in its search experience](https://azure.microsoft.com/en-us/blog/bing-delivers-its-largest-improvement-in-search-experience-using-azure-gpus/) and have [surpassed human performance](https://www.microsoft.com/en-us/research/blog/microsoft-deberta-surpasses-human-performance-on-the-superglue-benchmark/) on the SuperGLUE academic benchmark. However, these models can fail to capture more nuanced relationships between query and document terms that go beyond pure semantics.

![](https://static001.geekbang.org/resource/image/45/50/45d1d06eebfb4174bb7e3b7d1bf64550.jpg)

In this post we introduce "Make Every feature Binary" (MEB), a large-scale sparse model that complements our production Transformer models to improve search relevance for Microsoft customers using [AI at Scale](https://www.microsoft.com/en-US/ai/ai-at-scale). To make search more accurate and dynamic, MEB better harnesses the power of large data, allowing an input feature space of more than 200 billion binary features that reflect the subtle relationships between search queries and documents.

## Why does "Make Every feature Binary" improve search results?

One reason MEB can effectively improve the search relevance of Transformer-based deep learning models is that it maps individual facts to individual features, which gives it a much finer-grained understanding of those facts. For example, many deep neural network (DNN) language models tend to overgeneralize when filling in the blank in the sentence "(blank) can fly." Because most DNN training cases resolve to "birds can fly," a DNN language model may only ever fill in the blank with the word "birds."

MEB avoids this by assigning each fact to its own feature, so it can assign weights that distinguish the flying ability of, say, penguins and puffins. It can do this for every feature that distinguishes one bird — or any entity or object — from another. Paired with Transformer models, MEB takes classification to a whole new level of granularity. Its answer to the question above would be: "birds can fly, except ostriches, penguins, and these other birds."

As scale increases, there is another way to use data more efficiently. Ranking web results in Bing is a machine learning problem that improves by learning from vast amounts of user data. A traditional way to exploit click data is to extract thousands of handcrafted numeric features for each impressed query/document pair and train a gradient boosted decision tree (GBDT) model.

However, because of limited feature representation and model capacity, even [LightGBM](https://github.com/Microsoft/LightGBM), a state-of-the-art GBDT trainer, converges after a few hundred million rows of data. Moreover, these handcrafted numeric features are often very coarse by nature. For example, they may capture how many times a term at a given position in the query appears in the document, but information about what that term actually means is lost in this representation. Nor do features in this approach always accurately account for things like word order within the search query.

To unlock the power of massive data, and to find feature representations that better reflect the relationships between queries and documents, MEB was trained on more than 500 billion query/document pairs accumulated over three years of Bing searches. The input feature space contains more than 200 billion binary features. Trained with [FTRL](http://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf), the latest version is a sparse neural network model with 9 billion features and more than 135 billion parameters.

## Discovering hidden intent with Microsoft's largest universal model

MEB is already running in production, serving 100% of Bing searches across all regions and languages. It is the largest universal model we serve at Microsoft. It has demonstrated a remarkable ability to memorize the facts represented by its binary features while reliably learning from large volumes of data in a continuous fashion.

We found empirically that training on data at such a massive scale is a unique capability of large sparse neural networks. When we fed the same Bing logs into a LightGBM model trained on traditional numeric features (such as BM25 and other kinds of query/document matching features), model quality stopped improving beyond about one month of data, indicating that the model lacked the capacity to benefit from more data. In contrast, MEB was trained on three years of accumulated data, and we found that it kept learning as more data was added, suggesting that its capacity grows with the data.

Compared with Transformer-based deep learning models, the MEB model also shows an intriguing ability to learn beyond semantic relationships. Inspecting the top features MEB learned, we found that it can learn hidden intent between queries and documents.

![](https://static001.geekbang.org/resource/image/0a/47/0a55e1a26b53e83db8dd3663e470ba47.png)

Table 1: Examples of what the MEB model learns

For example, MEB learned that "Hotmail" is strongly correlated with "Microsoft Outlook," even though the two terms are not semantically close. MEB discovered the subtle relationship between them: Hotmail was a free web-based email service offered by Microsoft that was later rebranded as Microsoft Outlook. Similarly, it learned a strong connection between "Fox31" and "KDVR": KDVR is the call sign of a TV channel in Denver, Colorado that operates under the brand Fox31. Again, there is no obvious semantic connection between these two phrases.

More interestingly, MEB can identify negative relationships between words or phrases, revealing what users do *not* want to see for a query. For example, users searching for "baseball" usually do not click pages about "hockey," even though both are popular sports. Similarly, users searching for "yoga" do not click documents about "song and dance." Understanding these negative relationships helps the search engine omit irrelevant results.

The relationships MEB learns complement those learned by Transformer-based DNN models, and the resulting gains in search relevance translate into clear improvements in user experience. Introducing MEB on top of our production Transformer models brought the following gains:

1. **Click-through rate (CTR) on the top search results increased by almost 2%.** Users can find relevant results without scrolling down the page.
2. **Manual query reformulation dropped by more than 1%.** Having to manually reformulate a query means users did not like the results they found with their original query.
3. **Clicks on pagination dropped by more than 1.5%.** Having to click the "next page" button means users did not find what they were looking for on the first page.
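As a toy illustration of the idea discussed earlier — that giving every fact its own binary feature lets a model learn exceptions such as "penguins cannot fly" — here is a minimal sketch. The feature names and weights below are entirely invented for illustration; this is not MEB's actual feature set or code:

```python
import math

# Hand-set weights for invented binary features (illustration only).
# A generic "is_bird" feature carries the broad rule "birds can fly",
# while entity-specific features encode exceptions to it.
WEIGHTS = {
    "is_bird": 2.0,          # birds generally fly
    "entity:penguin": -4.0,  # ...but penguins do not
    "entity:ostrich": -4.0,  # ...and neither do ostriches
    "entity:puffin": 0.5,    # puffins fly (a mild boost)
}

def can_fly_probability(features):
    """Sum the weights of the active binary features, then squash with a sigmoid."""
    score = sum(WEIGHTS.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-score))

print(can_fly_probability({"is_bird", "entity:puffin"}))   # high probability
print(can_fly_probability({"is_bird", "entity:penguin"}))  # low probability
```

Because each entity has its own feature, the penguin-specific weight can override the generic bird rule without disturbing predictions for other birds — the memorization behavior described above, in miniature.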
## How MEB trains on data and serves features at scale

### Model structure

As shown in Figure 1, the MEB model consists of a binary feature input layer, a feature embedding layer, a pooling layer, and two dense layers. The input layer contains 9 billion features, generated from 49 feature groups, and each binary feature is encoded as a 15-dimensional embedding vector. After sum-pooling within each group and concatenation, the resulting vector passes through two dense layers to produce a click probability estimate.

![](https://static001.geekbang.org/resource/image/32/f6/32c0fd2149f6261a528f1c2d6f20bdf6.png)

Figure 1: MEB is a sparse neural network model composed of an input layer that accepts binary features, a feature embedding layer that transforms each binary feature into a 15-dimensional vector, a sum-pooling layer applied to each of the 49 feature groups whose outputs are concatenated into a 735-dimensional vector, and two dense layers that produce a click probability. The features shown here are generated from the example query "Microsoft Windows" and the document https://www.microsoft.com/en-us/windows, as described in Figure 2.

### Training data, and unifying features as binary

MEB uses three years of Bing search logs as training data. For each Bing search impression, we use heuristics to determine whether the user was satisfied with a document they clicked. We label these "satisfied" documents as positive samples; other documents in the same impression are labeled as negative samples. For each query/document pair, we extract binary features from the query text, the document URL, the title, and the body text. These features are fed into a sparse neural network model trained to minimize the cross-entropy loss between the model's predicted click probability and the actual click label.

Feature design and large-scale training are the keys to MEB's success. MEB features are defined on very specific term-level or N-gram-level relationships between the query and the document — information that traditional numeric features cannot capture, because they only account for match counts between the query and the document. (An N-gram is a sequence of N terms.) To fully exploit the power of this large-scale training platform, all features are designed as binary features, which can easily cover both handcrafted numeric features and features extracted directly from raw text in a consistent way. This lets MEB optimize end to end along a single path. The current production model uses three main types of features, described below.

### Query and document N-gram pair features

N-gram pair features are generated from combinations of N-grams of the query and document fields in Bing's search logs. As shown in Figure 2, N-grams from the query text are combined with N-grams from the document URL, title, and body text to form N-gram pair features. Longer N-grams (higher values of N) capture richer and more nuanced concepts, but the cost of processing them grows exponentially with N. In our production model, N is set to 1 and 2 (unigrams and bigrams, respectively).

We also generate features by combining the entire query text with document fields. For example, the feature "Query_Title_Microsoft Windows_Explore Windows 10 OS Computer Apps More Microsoft" is generated from query = "Microsoft Windows" and document title = "Explore Windows 10 OS Computer Apps More Microsoft."

### One-hot encoding of bucketized numeric features

Numeric features are first bucketized and then converted to binary form by one-hot encoding. In the example shown in Figure 2, the numeric feature "QueryLength" can take any integer value from 1 to MaxQueryLength. We define MaxQueryLength buckets for this feature, so that the query "Microsoft Windows" has the binary feature QueryLength_2 set to 1.

### One-hot encoding of categorical features

Categorical features can be converted to binary features in a straightforward way via one-hot encoding. For example, UrlString is a categorical feature in which each unique URL string is a distinct category.

![](https://static001.geekbang.org/resource/image/48/44/4897771f2aec1f562abe485fa23e3544.png)

Figure 2: An example of what MEB features look like. The left side shows an example query/document pair, with the query text, document title, URL, and snippet serving as inputs for feature extraction. The right side shows some typical features MEB generates. For example, the query "Microsoft Windows" and the document title "Explore Windows 10 OS, Computers, Apps, & More | Microsoft" generate a Query x Title feature "Query:Microsoft Windows_Title:Explore Windows 10 OS Computer Apps More Microsoft." Because the query "Microsoft Windows" contains two terms, the binary feature "QueryLength_2" is generated. Each combination of a query term and a document title term can generate a list of Query unigram x Title unigram features, such as "QTerm:Microsoft_TitleTerm:Explore" and so on.

### Continuous training over a trillion query/document pairs, refreshed daily

To train over such a huge feature space, we used Woodblock, an internal large-scale training platform built by the [Microsoft Advertising](https://about.ads.microsoft.com/en-us/) team. It is a distributed, large-scale, high-performance solution for training large sparse models. Built on top of TensorFlow, Woodblock fills the gap between general deep learning frameworks and the industrial need for billions of sparse features. With deep optimizations of I/O and data processing, it can train hundreds of billions of features within hours using CPU and GPU clusters.

Even with the Woodblock pipeline, training MEB on three years of accumulated Bing search logs — nearly a trillion query/document pairs — would be hard to do in one shot. Instead, we adopted a continuous training approach: each time, the model already trained on the previous months of data continues training on one more month of data.

More importantly, even after being deployed in Bing, the model is refreshed every day by continuing training on the latest single day of click data, as shown in Figure 3. To avoid the negative impact of stale features, an automatic expiration policy checks each feature's timestamp and filters out features that have not appeared in the past 500 days. With continuous training in place, daily deployment of the updated model is fully automated.

![](https://static001.geekbang.org/resource/image/6d/0d/6d719b104ebd6bc710597287af3d250d.png)

Figure 3: A flow chart showing how MEB is refreshed daily. The production MEB model is continuously trained each day with the latest single-day Bing search log data. Before the new model is deployed and served online, stale features that have not appeared in the past 500 days are removed from the model. This keeps features fresh and makes effective use of model capacity.
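To make the binary-feature construction and model shape described in this section concrete, here is a minimal runnable sketch. All names, the hash-space size, the layer sizes, and the random weights are invented stand-ins (a real model would use trained embeddings and all 49 MEB feature groups); only the overall flow — binary features, per-group embedding lookup and sum-pooling, concatenation, two dense layers, sigmoid click probability — follows the description:

```python
import hashlib
import numpy as np

EMB_DIM = 15  # per-feature embedding size, as in the MEB description

def ngrams(text, n):
    toks = text.lower().split()
    return [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]

def binary_features(query, title):
    """Binary features grouped by feature group (a tiny subset of MEB's 49)."""
    groups = {"q_x_title": [], "query_length": []}
    for n in (1, 2):  # unigrams and bigrams, as in production
        for q in ngrams(query, n):
            for t in ngrams(title, n):
                groups["q_x_title"].append(f"QTerm:{q}_TitleTerm:{t}")
    # One-hot bucketized numeric feature: the query's term count.
    groups["query_length"].append(f"QueryLength_{len(query.split())}")
    return groups

def feature_hash(name, space=2**20):
    """Stable hash of a feature name into a sparse index space."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % space

def click_probability(query, title, rng=np.random.default_rng(0)):
    groups = binary_features(query, title)
    table = {}   # random vectors stand in for trained embeddings
    pooled = []
    for _, feats in sorted(groups.items()):
        vecs = []
        for f in feats:
            idx = feature_hash(f)
            if idx not in table:
                table[idx] = rng.normal(scale=0.1, size=EMB_DIM)
            vecs.append(table[idx])
        pooled.append(np.sum(vecs, axis=0))  # sum-pooling within the group
    x = np.concatenate(pooled)               # concatenate across groups
    # Two dense layers (random weights here) ending in a sigmoid click probability.
    h = np.tanh(rng.normal(scale=0.1, size=(32, x.size)) @ x)
    logit = rng.normal(scale=0.1, size=32) @ h
    return 1.0 / (1.0 + np.exp(-logit))

p = click_probability("Microsoft Windows",
                      "Explore Windows 10 OS Computer Apps More Microsoft")
print(float(p))  # some value in (0, 1)
```

At MEB's scale the embedding table is of course not an in-process dict — as the serving section explains, it lives in a distributed key-value store keyed by exactly this kind of feature hash.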
### Serving the very large model with Bing's ObjectStore platform

The MEB sparse neural network model occupies 720 GB of memory when loaded. At peak traffic, the system must sustain 35 million feature lookups per second, so MEB cannot be served from a single machine. Instead, we host and serve the MEB model on [ObjectStore](https://www.microsoft.com/en-us/research/blog/evolution-bings-objectstore/), Bing's home-grown service.

ObjectStore is a multi-tenant, distributed key-value store that supports both data and compute hosting. MEB's feature embedding layer is implemented as a table lookup in ObjectStore: each binary feature's hash is used as the key to retrieve the embedding produced at training time. The pooling and dense layers are more compute-intensive and run in an ObjectStore Coproc — a near-data compute unit hosting user-defined functions. MEB separates compute and data serving into different shards: each compute shard takes a portion of the production traffic for neural network processing, and each data shard hosts a portion of the model data, as shown in Figure 4:

![](https://static001.geekbang.org/resource/image/5e/65/5e189f425d082a9f84c6a269e1789265.png)

Figure 4: The ObjectStore Coprocs in compute shards communicate with data shards to retrieve feature embeddings and run the neural network. The data shards store the feature embedding table and serve lookup requests from each Coproc call.

Because most workloads running on ObjectStore are pure store lookups, co-locating MEB's compute shards with the in-memory data shards lets us make maximal use of the compute and memory resources of ObjectStore's multi-tenant clusters. And because the shards are distributed across many machines, we can finely control the load on each machine and achieve single-digit-millisecond serving latency for MEB.

### Powering faster search and better content understanding

We found that very large sparse neural networks like MEB can learn nuanced relationships that Transformer-based networks miss, making them an effective complement. This deeper understanding of the language of search brings notable benefits to the whole search ecosystem:

- Thanks to improved search relevance, Bing users find content and complete tasks faster, with less need to reformulate their queries or go past page 1.
- Because MEB understands content better, publishers and webmasters get more traffic to their properties, and can focus on satisfying their customers rather than spending time hunting for the right keywords to improve their ranking. One concrete example is product rebranding: the MEB model can automatically learn the relationship between the old and new names, as it did for "Hotmail" and "Microsoft Outlook."

If you use DNNs to power your business, we recommend experimenting with large sparse neural networks to complement them. We recommend this especially if you have a large stream of historical user interactions and can easily construct simple binary features. If you go down this path, we also suggest updating the model as close to real time as possible.

MEB is just one example of how our team creates impactful, cutting-edge technology that improves scale and efficiency to advance the search experience. If you are interested in large-scale modeling for search and recommendation, our Core Search & AI team is hiring! You can find our current openings on the Microsoft [careers site](https://careers.microsoft.com/us/en/search-results?keywords=%23semanticsearch%23).

## About the authors

[Junyan Chen](https://www.microsoft.com/en-us/research/people/junyanch/) is a Principal Applied Science Manager in Search and AI at Microsoft. She leads a team focused on fundamental ranking problems in Bing web search. The team applies state-of-the-art NLP and machine learning techniques to improve web relevance models and bring more satisfying search experiences to Microsoft users. Their work spans data innovation, feature engineering, and model improvements through very large deep learning models, sparse neural networks, DNNs, LightGBM, and more.

[Frédéric Dubut](https://www.microsoft.com/en-us/research/people/fdubut/) manages the product team for Bing's organic search ranking, working with engineering and data science teams across Microsoft. Their work spans search, personalization, experimentation, and machine learning operations.

[Jason (Zengzhong) Li](https://www.microsoft.com/en-us/research/people/jasol/) is a Partner Group Engineering Manager on the Microsoft WebXT platform. His work focuses on large-scale, low-latency distributed serving systems, including key-value stores, inverted index serving, vector indexes, and deep learning model inference. He is also interested in information retrieval algorithms such as sparse and dense indexing and approximate nearest neighbor search.

[Rangan Majumder](https://www.microsoft.com/en-us/research/people/ranganm/) is Vice President of Search and AI at Microsoft. Their mission is to make the world smarter and more productive by reducing the friction between users' information needs and all the knowledge held in images, documents, and videos. They apply state-of-the-art language, vision, and multimodal understanding to build better search, personal assistants, and productivity applications.

**Original link:**

https://www.microsoft.com/en-us/research/blog/make-every-feature-binary-a-135b-parameter-sparse-neural-network-for-massively-improved-search-relevance/