Xianyu Search Relevance: Behind the Balance of Experience and Efficiency

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"背景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"閒魚搜索是閒魚APP最大的成交場景入口, 成交歸因中搜索佔一半以上,所以提高成交效率是工程和算法迭代優化的主要目標,然而只以效率爲最終的衡量標準不但會影響搜索的質量阻礙成交,還會惡化整個平臺的長期生態建設無法成長,所以搜索相關性和平臺生態對搜索平臺至關重要,其中平臺生態治理(例如惡意引流、欺詐、黑產等治理)涉及運營,算法和架構等諸多因素,並不是這裏討論的重點。本文主要討論搜索相關性,它是整個產品的基石,也是搜索技術中的核心。下文則將依次介紹閒魚搜索相關性遇到的問題現狀,優化升級方案以及後續進一步的優化方向。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"問題"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當前閒魚搜索存在多種擴召回策略,典型的如query改寫、i2i、商品圖像文本抽取、同款商品文本信息擴充等。擴召回策略是必要的,但不可避免地在下游引入相關性差的case。與此同時下游的相關性控制卻十分有限,在精排階段的相關性主要存在以下問題:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/cf\/cf84cdb28e656fd16ca8d4b895b8e658.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"經過多輪top和中長尾相關性case分析,整理了top的問題類型(同一個case可歸到多個類型),並給出了典型例子:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/18\/18725524a9158bdbe7977af9d74dd5bc.png","alt":"圖片","title":"null","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖1.badcase分析"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"總結來看,當前系統的問題主要在於相關性策略覆蓋不足,語義表達不準,基礎數據更新迭代慢等問題。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"技術方案"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"針對上述基礎相關性因子弱、覆蓋不足的問題,閒魚搜索相關性規劃第一階段的優化目標爲建立相對完善、高覆蓋的基礎相關性匹配鏈路,第二步是在基礎策略基礎上探索精細化的語義匹配策略提升相關性。本文主要介紹第一階段方案。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"工程鏈路設計"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"閒魚搜索目前的工程鏈路爲比較經典的搜索架構:SPL做數據中轉和中間參數構造;QP負責query相關理解、特徵抽取;檢索引擎(Ha3)中做召回、粗排、精排算分、打散;RankService通過在線模型服務實現深度模型重排序。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而相關性匹配鏈路有兩種選擇,一個是在檢索引擎(Ha3)精排算分插件中完成,另一個則是在RankService中實現。雖然二者各有優勢,選擇在檢索引擎(Ha3)精排算分插件中實現可以靈活的配置召回和精排階段的不同數量方便實驗。出於效率考慮,特徵則優先採用離線抽取方式:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• Query特徵通過離線抽取,增量方式導入到線上KV存儲中,在線QP中請求KV存儲獲取相應特徵,最後通過SPL傳給檢索引擎(Ha3)精排算分插件。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• 
Algorithm and model implementation

The final document ordering in search usually weighs multiple factors; in Xianyu search the important ones include the transaction-efficiency factor, the relevance factor, and other business rules. Simplifying to just relevance and efficiency, there are roughly three ways to fuse the two:

• Relevance produces a discrete tier, and downstream ranking is layered by tier.
• A continuous relevance score and the efficiency score are weighted into the final ranking score.
• Relevance and efficiency are optimized jointly as one objective.

To keep absolute control over relevance while disturbing the efficiency model's ordering as little as possible, the algorithm side chose the first approach: on top of query and item understanding, various relevance matching features are constructed and fused into a relevance grade (three tiers here). The grades are then assigned by rules and passed to the Ha3 fine-ranking plugin and to the RankService re-ranker for tier-based relevance control.

The overall relevance pipeline is shown below. The key algorithm-side pieces are basic understanding, feature extraction, and feature fusion; since feature extraction usually depends on or subsumes basic understanding, the rest of this article focuses on feature construction and feature fusion, touching on the basic-understanding work along the way.

[Figure 3. Relevance pipeline: https://static001.geekbang.org/infoq/87/87213ba256aa9cf7d1a83d6ec0796a44.png]
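As an illustration of the chosen fusion option (tier-based layering rather than score blending), here is a minimal sketch of how the final ordering might be produced; the tier thresholds and field names are assumptions, not the production configuration.

```python
# Minimal sketch of tier-based layering: relevance only gates the layer,
# the transaction-efficiency score orders items within a layer.
# Thresholds and field names are assumptions for illustration.

from typing import List, Dict

def relevance_tier(rel_score: float) -> int:
    # Map a continuous relevance score into three tiers (2 = best).
    if rel_score >= 0.7:
        return 2
    if rel_score >= 0.3:
        return 1
    return 0

def layered_sort(items: List[Dict]) -> List[Dict]:
    # Sort by tier first, then by efficiency score inside each tier, so the
    # efficiency model's ordering is preserved within a relevance layer.
    return sorted(
        items,
        key=lambda it: (relevance_tier(it["rel_score"]), it["eff_score"]),
        reverse=True,
    )

candidates = [
    {"id": 1, "rel_score": 0.9, "eff_score": 0.2},
    {"id": 2, "rel_score": 0.4, "eff_score": 0.8},
    {"id": 3, "rel_score": 0.8, "eff_score": 0.5},
]
print([it["id"] for it in layered_sort(candidates)])  # [3, 1, 2]
```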
Feature construction

The relevance features here fall into three dimensions: basic features, text-match features, and semantic-match features. Basic features mainly cover statistical features of the query and the item, plus structured matching features such as whether the category matches and whether key attributes (product line, brand, model, etc.) match. Text-match features are surface-level matching signals, such as the number of matched terms, the match ratio, synonym-aware matching, term-weight-aware matching, and the basic BM25 score. Semantic-match features mainly include click-behavior-based representation matching and text and multimodal semantic matching.

[Figure 4. Relevance features: https://static001.geekbang.org/infoq/3a/3a23fef53b0db32ff020a104852a45fd.png]

The basic features and text-match features are fairly conventional and are not described in detail here.
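For a concrete sense of two of those conventional text-match signals (term match ratio and a term-weight-aware match score), here is a minimal sketch; the tokenization and the weight table are made up for illustration.

```python
# Minimal sketch of two conventional text-match features: term match ratio
# and a term-weight-aware match score. Tokenization is naive and the term
# weights are invented for illustration.

from typing import Dict, List

def term_match_features(query_terms: List[str],
                        title_terms: List[str],
                        term_weight: Dict[str, float]) -> Dict[str, float]:
    q = set(query_terms)
    matched = q & set(title_terms)
    match_ratio = len(matched) / max(len(q), 1)
    # Weighted match: matched weight mass over total query weight mass.
    total_w = sum(term_weight.get(w, 1.0) for w in q)
    matched_w = sum(term_weight.get(w, 1.0) for w in matched)
    return {
        "term_match_cnt": float(len(matched)),
        "term_match_ratio": match_ratio,
        "weighted_term_match": matched_w / max(total_w, 1e-6),
    }

print(term_match_features(
    ["iphone", "12", "case"],
    ["iphone", "12", "128g", "blue"],
    {"iphone": 2.0, "12": 1.0, "case": 1.5},
))
```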
The following sections look at the semantic-match features in more detail.

Text semantic matching

For performance reasons, text semantic matching uses a dual-tower vector matching model: the base model is open-source BERT, and the query and item towers share the same weights. To fit the downstream relevance tiering, the model is trained pointwise. Model details are omitted here for space; in Xianyu search, the more important work is actually the construction of training samples rather than the model structure. Since little human-labeled data has been accumulated at this stage, this part mainly has to solve two problems:

• Mining high-confidence samples, to ease the "clicked but not relevant" noise in search click logs.
• Building customized negative samples, so that the model does not converge too quickly to judging only easy semantic relevance and fail to separate the hard "barely relevant" cases in the Xianyu scenario mentioned above.

Drawing on experience elsewhere in the group and on analysis of Xianyu search data, the following sampling scheme was adopted:

Positive samples:
    • Samples with high CTR under sufficient exposure (CTR above the average click-through rate of items under the same query).

Negative samples:
    • Negatives sampled from sibling leaf categories under the same parent category.
    • High-exposure, low-click category samples: for a given query, based on the category distribution of clicked items, items from extremely low-frequency categories are taken as negatives (for example, items from categories with a share below 0.05 of the distribution).
    • Under sufficient exposure, samples whose click-through rate falls below 10% of the query's average exposure CTR are taken as negatives.
    • Negatives built by replacing the query's core terms: for a "brand A + category" query, a "brand B + category" query supplies its negatives.
    • Random negatives: to add randomness, other samples in the same training batch are used as negatives, combined with a batch hard sample mechanism (sketched right after this list).
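A minimal sketch of the in-batch negative idea with hard-sample mining, assuming L2-normalized query/item embeddings and an illustrative margin value; this is not the production training code.

```python
# Minimal sketch of in-batch negatives with hard-sample mining for a
# dual-tower model. Embeddings are assumed L2-normalized; the margin is an
# arbitrary illustrative value, not the production setting.

import numpy as np

def batch_hard_hinge_loss(q_emb: np.ndarray, i_emb: np.ndarray,
                          margin: float = 0.3) -> float:
    """q_emb, i_emb: (B, d) embeddings of B positive (query, item) pairs."""
    sim = q_emb @ i_emb.T                   # (B, B) similarity matrix
    pos = np.diag(sim)                      # positives sit on the diagonal
    neg = sim.copy()
    np.fill_diagonal(neg, -np.inf)          # mask out each row's positive pair
    hardest_neg = neg.max(axis=1)           # hardest in-batch negative per query
    # Hinge loss: push the positive above the hardest negative by a margin.
    return float(np.maximum(0.0, margin - pos + hardest_neg).mean())

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 64)); q /= np.linalg.norm(q, axis=1, keepdims=True)
i = rng.normal(size=(8, 64)); i /= np.linalg.norm(i, axis=1, keepdims=True)
print(batch_hard_hinge_loss(q, i))
```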
Training data sampled this way reaches 90%+ accuracy on random spot checks, and after further sampling the volume exceeds 40 million pairs. The dual-tower model is trained on this data; for serving, embeddings are extracted offline, and online the system looks them up in a table and computes the vector similarity.

[Figure 5. Text semantic model: https://static001.geekbang.org/infoq/4e/4e3b971e60c2422111bea9c9fffe3587.png]

This part was launched independently at full traffic: on a spot check of the top 300 queries plus 200 random queries, search satisfaction improved by 6.6%. The same text semantic vectors are also used for i2i vector recall and reused in the Xianyu "wanted" (求購) scenario, where the core metric, click-and-interaction user count, improved by 20.45% relatively.

Note: a query is judged satisfactory when more than 80% of its top 10 results are fully relevant or basically relevant; the satisfaction of a query set is the share of queries judged satisfactory.

Multimodal semantic matching

Besides text semantic vector matching, this work also tried multimodal semantic vectors. The model side uses a pre-trained multimodal BERT; there have been many similar efforts within the group, and this work mainly follows [1] and [2], with some adjustments to the model and strategy:

• Multi-image feature extraction is replaced with region features of the main image as the image feature sequence (the ResNet feature sequence before pooling), which improves pipeline efficiency.
• BERT-base is replaced with ELECTRA-small to shrink the model (a 47M model was tested with under 2 points of accuracy loss on the downstream classification task), which makes joint end-to-end training with ResNet practical.

The downstream matching task still uses the dual-tower strategy; unlike the text semantic model, it is trained directly with a triplet loss, mainly to increase the diversity between the models and leave more headroom for the later model fusion.

[Figure 6. Multimodal semantic model: https://static001.geekbang.org/infoq/33/338bb1a5875b3dc09a0ab0b1bfa5b8e1.png]

Note: this part reaches a relatively high offline AUC of 0.75 and lifts the downstream fusion AUC by more than 1 point. For online serving, however, the image processing makes incremental item-feature updates flow back noticeably slower than the other pipelines, which can leave new items without this feature, so the pipeline still needs further optimization.

Click-graph representation matching

Besides injecting semantics through the semantic vectors above, the click behavior in search logs can also be used to represent queries and items as a graph structure, bringing in a further kind of semantic representation. The graph-based matching algorithm SWING is widely used inside Alibaba and well documented elsewhere, so it is not repeated here. For the Xianyu scenario, the click pairs are first re-arranged so that the existing SWING tooling can be reused directly, producing a query2query list. All similar queries of a key query are then aggregated and tokenized, all the terms are summed with weights, and the result is normalized to form the key query's representation.

The weights are the scores output by the SWING algorithm, and the key query's own terms default to a weight of 1. For long-tail queries with sparse behavior, the text semantic vectors above are used to recall the most similar head query and supplement their representation. An example of the resulting query representation:

[Image: https://static001.geekbang.org/infoq/43/435066868001516e2a90ec3d7778cb5d.png]

With the query representation in place, items get a similarly normalized representation. Online, the representations are stored sparsely, and the weighted sum over matched terms is computed online as the click-graph matching score.
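A minimal sketch of that online step, treating the query and item representations as sparse term-to-weight maps and scoring by the weighted overlap; the example terms and weights are invented.

```python
# Minimal sketch of the click-graph matching score: query and item are both
# sparse term -> weight maps, and the score is the weighted sum over the
# terms they share. Terms and weights are invented for illustration.

from typing import Dict

def click_graph_match(query_rep: Dict[str, float],
                      item_rep: Dict[str, float]) -> float:
    # Iterate over the smaller map for efficiency.
    small, large = (query_rep, item_rep) if len(query_rep) <= len(item_rep) else (item_rep, query_rep)
    return sum(w * large[t] for t, w in small.items() if t in large)

query_rep = {"iphone": 0.42, "apple": 0.31, "12": 0.27}   # normalized term weights
item_rep = {"iphone": 0.55, "12": 0.30, "128g": 0.15}
print(click_graph_match(query_rep, item_rep))  # 0.42*0.55 + 0.27*0.30
```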
Feature fusion

Once the necessary relevance features are ready, the next step is to fuse them effectively; a classic GBDT model is used for this. One advantage of GBDT is that the Ha3 fine-ranking plugin already has an off-the-shelf component that can be reused directly; another is that, compared with a simpler LR model, it saves a lot of feature preprocessing and keeps the online strategy simpler.

The model is trained on human-annotated data labeled into four grades (fully relevant, basically relevant, barely relevant, and not relevant). For training, the four grades are mapped to the values 1, 0.75, 0.25, and 0, and the GBDT model fits these values by regression. Since this stage is an ensemble over sub-features, it does not need very much training data (on the order of tens of thousands of examples here).

[Figure 7. Feature fusion model: https://static001.geekbang.org/infoq/54/5455fb751b49f68e85d25d8702e9c2e6.png]

After routine tuning, the GBDT fusion model reaches an offline AUC of 0.86, in line with expectations (the best single feature has an AUC of 0.76). The strategy was launched at full traffic on top of the text semantic vectors, without hurting transaction efficiency: on a random query spot check (drawn from the top 8 million queries), DCG@10 improved by 6.51% relatively and query search satisfaction by 24%; head queries improved correspondingly, and the perceived search experience improved noticeably as well.
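A minimal sketch of this fusion step, with scikit-learn's gradient boosting regressor standing in for the engine's built-in GBDT component; the feature matrix, grade mapping, and tier thresholds are illustrative only.

```python
# Minimal sketch of the GBDT feature-fusion step. scikit-learn's
# GradientBoostingRegressor stands in for the engine's built-in GBDT
# component; data, grade mapping, and tier thresholds are illustrative.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Columns: [category_match, term_match_ratio, text_semantic_sim,
#           multimodal_sim, click_graph_match]  (toy values)
X = np.array([
    [1, 1.00, 0.92, 0.88, 0.40],   # fully relevant
    [1, 0.80, 0.75, 0.70, 0.25],   # basically relevant
    [0, 0.50, 0.40, 0.35, 0.05],   # barely relevant
    [0, 0.10, 0.05, 0.10, 0.00],   # not relevant
] * 50)
# Four annotation grades mapped to regression targets 1 / 0.75 / 0.25 / 0.
y = np.array([1.0, 0.75, 0.25, 0.0] * 50)

model = GradientBoostingRegressor(n_estimators=100, max_depth=3)
model.fit(X, y)

def to_tier(score: float) -> int:
    # Rule-based mapping from the fused score to three relevance tiers.
    if score >= 0.6:
        return 2
    if score >= 0.2:
        return 1
    return 0

scores = model.predict(X[:4])
print([round(float(s), 2) for s in scores], [to_tier(s) for s in scores])
```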
Summary and reflections

The first phase of Xianyu search relevance optimization focused on building a relatively complete pipeline as a baseline and laying the groundwork for further work. With this quarter's optimization, the basic relevance problems have been eased to a degree, but considerable room for improvement remains.

Further case analysis shows that the current strategy is still not good enough at matching fine-grained attributes and intent, and that it sometimes mistakenly suppresses good cases. The next steps are therefore clear: refine the existing features; add finer-grained relevance features such as query tagging, structured item features, and core-term matching; accumulate more human-labeled data; and explore finer-grained matching models.

Finally, two reflections from the optimization work:

1. The "conflict" between relevance and transaction efficiency. The most frequently challenged question in this work is: why did the relevance improvements not bring an improvement in transaction efficiency? A few thoughts:

• Imperfect technical solutions are an objective fact; the mistaken suppression of good cases mentioned above, caused by strategy issues, loses some transaction efficiency.
• Relevance and item-governance optimization suppress some barely relevant but transactable peripheral products: after the relevance optimization, a "mobile phone" style query no longer surfaces accessories and parts such as "phone case", "screen", or "display assembly", which actually transact more frequently, and an "xx car" query no longer surfaces "tires", "wheel hubs", and other parts. After governance optimization, "cheap bait" and "fraud" listings drop markedly, and interestingly, fraudulent items often show higher transaction efficiency than normal ones.
• Fixing completely irrelevant queries does improve transaction efficiency, but once netted against the losses above, the overall gain becomes weaker.
• Personally, I believe the right path for e-commerce search is to optimize transaction efficiency under fairly strict relevance constraints from the very beginning, then gradually relax recall while improving relevance strategies, so that relevance and efficiency rise together. If the system starts out chasing efficiency under weak relevance constraints and runs free, slamming on the brakes when experience suddenly matters is bound to be nauseating.

2. Can perceived quality be misleading? A second-hand marketplace is "messy" by nature. Engineers are often not Xianyu's target users, and what "I" consider relevant or a good experience is not necessarily the user's real need; one-size-fits-all governance and relevance optimization may sacrifice some of the demand that makes the Xianyu product distinctive (fraud and pornography aside, of course). Answering this properly requires more data and a deeper understanding of the product.

[1] Hao Tan and Mohit Bansal. LXMERT: Learning Cross-Modality Encoder Representations from Transformers. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, 2019.
[2] Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. arXiv preprint arXiv:1908.02265, 2019.

This article is republished from 閒魚技術 (WeChat ID: XYtech_Alibaba).
Original link: https://mp.weixin.qq.com/s/08_t85mmK6RhN-bswHRAuA