[Reading Notes] Deep Interest Evolution Network for Click-Through Rate Prediction

Original

2019-04-24 08:36

Zhou G, Mou N, Fan Y, et al. Deep Interest Evolution Network for Click-Through Rate Prediction[J]. arXiv preprint arXiv:1809.03672, 2018.
https://github.com/mouna99/dien

Abstract

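A CTR prediction model needs to capture how user interest shifts over time. The paper therefore designs an interest extractor layer that captures temporal interests from the behavior history sequence, and introduces an auxiliary loss that supervises the interest extractor layer at every step. An attention mechanism is added in the interest evolving layer, which models how interest evolves relative to the target item.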

Introduction

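Following the idea that user interest is what drives a sequence of behaviors, the paper designs an auxiliary loss that uses the next behavior to supervise the current hidden state (called the interest state). This helps the hidden state capture more semantic information and lets the GRU represent interest more effectively.
Based on the interest sequence produced by the interest extractor layer, the paper designs a GRU with attentional update gate (AUGRU), which strengthens the effect of relevant interests during interest evolution and weakens the effect of irrelevant ones.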

Interest Extractor Layer

GRU is used because it mitigates the vanishing gradient problem and is also faster than LSTM. The GRU is formulated as:
$u_t=\sigma(W^ui_t+U^uh_{t-1}+b^u)$

$r_t=\sigma(W^ri_t+U^rh_{t-1}+b^r)$

$\tilde{h}_t=\tanh(W^hi_t+r_t\circ U^hh_{t-1}+b^h)$

$h_t=(1-u_t)\circ h_{t-1}+u_t\circ\tilde{h}_t$

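$u_t$ is the update gate (it plays a role similar to a forget gate) and controls how much $h_t$ is updated; $r_t$ is the reset gate and controls how much the previous step influences the current one; $\tilde{h}_t$ is the candidate state at this step; $h_t$ is the hidden state. $i_t$ is the input at step $t$ (the behavior embedding), $\sigma$ is the sigmoid function, and $\circ$ is the element-wise product.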
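The four GRU equations can be sketched in NumPy as below. This is a minimal illustration, not the code from the paper's repository; the function names and the parameter dictionary keys (`W_u`, `U_u`, `b_u`, and so on) are assumptions made here to mirror the symbols in the formulas.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(i_t, h_prev, params):
    """One GRU step. `params` is a hypothetical dict holding the weights
    W_*, U_* and biases b_* for the update (u), reset (r) and candidate (h) parts."""
    # Update gate: how much of the candidate state replaces the old state.
    u_t = sigmoid(params["W_u"] @ i_t + params["U_u"] @ h_prev + params["b_u"])
    # Reset gate: how much of the previous state feeds the candidate state.
    r_t = sigmoid(params["W_r"] @ i_t + params["U_r"] @ h_prev + params["b_r"])
    # Candidate state.
    h_tilde = np.tanh(params["W_h"] @ i_t + r_t * (params["U_h"] @ h_prev) + params["b_h"])
    # Interpolate between the previous state and the candidate state.
    return (1.0 - u_t) * h_prev + u_t * h_tilde

def run_gru(inputs, h0, params):
    """Run the GRU over a (T, n_in) input sequence and return all hidden states, shape (T, n_hidden)."""
    h, states = h0, []
    for i_t in inputs:
        h = gru_step(i_t, h, params)
        states.append(h)
    return np.stack(states)
```

Every row returned by `run_gru` is one interest state $h_t$; the later layers consume the whole sequence, not only the last state.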
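If only the final click is used as the label, the GRU is not sufficiently trained. Since user interest is what drives the sequence of behaviors, the next behavior is used to supervise the current hidden state: the actual next behavior is the positive sample, and a randomly sampled behavior is the negative sample.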
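$L_{aux}=-\frac{1}{N}\left(\sum_{i=1}^N\sum_t\left[\log\sigma(h_t^i,e_b^i[t+1])+\log\left(1-\sigma(h_t^i,\hat{e}_b^i[t+1])\right)\right]\right)$

Here $e_b^i[t+1]$ is the embedding of the actual next behavior, $\hat{e}_b^i[t+1]$ is the embedding of the negatively sampled behavior, and $\sigma(\cdot,\cdot)$ is the sigmoid of the inner product of its two arguments.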
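A rough NumPy sketch of the auxiliary loss follows. It assumes the hidden size equals the embedding size so that the inner product is defined, and that positive and negative embeddings are already aligned per time step; the function name and array layout are illustrative choices, not taken from the paper's code.

```python
import numpy as np

def auxiliary_loss(h, e_pos, e_neg):
    """h, e_pos, e_neg: arrays of shape (N, T, d).
    h[:, t] is the hidden state at step t; e_pos[:, t] and e_neg[:, t] are the
    embeddings of the real and the negatively sampled behavior at step t."""
    h_cur = h[:, :-1, :]          # h_t for t = 1 .. T-1
    pos_next = e_pos[:, 1:, :]    # real behavior at t+1
    neg_next = e_neg[:, 1:, :]    # sampled behavior at t+1
    pos_logit = np.sum(h_cur * pos_next, axis=-1)
    neg_logit = np.sum(h_cur * neg_next, axis=-1)
    # Numerically stable forms:
    #   log sigmoid(x)       = -log(1 + exp(-x)) = -logaddexp(0, -x)
    #   log (1 - sigmoid(x)) = -log(1 + exp(x))  = -logaddexp(0, x)
    log_p_pos = -np.logaddexp(0.0, -pos_logit)
    log_p_neg = -np.logaddexp(0.0, neg_logit)
    n_samples = h.shape[0]
    return -(log_p_pos + log_p_neg).sum() / n_samples
```

The loss drops when $h_t$ points in the direction of the real next behavior and away from the sampled one, which is the supervision signal described above.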

The loss function of the whole network is
$L_{target}=-\frac{1}{N}\sum_{i=1}^N\left[y_i\log p(x_i)+(1-y_i)\log(1-p(x_i))\right]$

$L=L_{target}+\alpha L_{aux}$

$\alpha$ balances interest representation against CTR prediction. With the help of the auxiliary loss, each hidden state is sufficiently trained to represent an interest state.
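As a small sketch of how the two terms combine, the snippet below computes the cross-entropy target loss and adds the weighted auxiliary loss. The default `alpha=1.0` and the clipping constant `eps` are placeholders chosen for illustration, not values reported in these notes.

```python
import numpy as np

def target_loss(y, p, eps=1e-12):
    """Binary cross-entropy between labels y in {0, 1} and predicted click probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def total_loss(y, p, l_aux, alpha=1.0):
    """L = L_target + alpha * L_aux; alpha is a hyperparameter (the default here is arbitrary)."""
    return target_loss(y, p) + alpha * l_aux
```

A larger `alpha` puts more weight on learning good interest states; a smaller one lets the CTR objective dominate.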

Interest Evolving Layer

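Not every behavior in the click sequence is relevant to the final result. We need to strengthen the effect of relevant interests during interest evolution and weaken the effect of irrelevant ones, so attention is added to the GRU, with the attention weight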
$a_t=\frac{exp(h_tWe_a)}{\sum_{j=1}^Texp(h_jWe_a)}$

where $e_a$ is the concatenation of the embedding vectors from the fields of the ad category.
Three ways of adding attention to the GRU are described below.
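The attention weight is a softmax over the bilinear scores $h_tWe_a$. A minimal NumPy sketch, assuming `h` stacks the interest states row by row and that the shapes are as given in the docstring (the function name is made up here):

```python
import numpy as np

def attention_scores(h, W, e_a):
    """h: (T, n_H) interest states, W: (n_H, n_A) weight matrix, e_a: (n_A,) target ad embedding.
    Returns a_1 .. a_T, which are positive and sum to 1."""
    logits = h @ W @ e_a             # one scalar score per time step
    logits = logits - logits.max()   # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()
```

Subtracting the maximum does not change the result, because the softmax is invariant to adding a constant to every logit.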

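GRU with attentional input (AIGRU): $i_t'=h_t\cdot a_t$, where $i_t'$ is used as the input of the next GRU (the one in the interest evolving layer)
Attention based GRU (AGRU): $h_t'=(1-a_t)\cdot h_{t-1}'+a_t\cdot\tilde{h}_t'$, which replaces the update gate with the attention score
GRU with attentional update gate (AUGRU): $\tilde{u}_t'=a_t\cdot u_t'$ and $h_t'=(1-\tilde{u}_t')\circ h_{t-1}'+\tilde{u}_t'\circ\tilde{h}_t'$, which implements attention by scaling the update gate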
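The three variants differ only in where the scalar attention score $a_t$ enters one GRU step, which the sketch below tries to make explicit. It assumes the second GRU has the same hidden size as the first; the function name, the `variant` argument and the parameter keys are hypothetical and not taken from the paper's repository.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def evolving_step(h_t, h_prev, a_t, p, variant="AUGRU"):
    """One step of the second GRU.
    h_t: interest state from the extractor layer, h_prev: previous state h'_{t-1},
    a_t: scalar attention score, p: hypothetical dict of weights and biases."""
    if variant not in ("AIGRU", "AGRU", "AUGRU"):
        raise ValueError(f"unknown variant: {variant}")
    # AIGRU scales the input by the attention score; the other variants use h_t as is.
    i_t = a_t * h_t if variant == "AIGRU" else h_t
    u_t = sigmoid(p["W_u"] @ i_t + p["U_u"] @ h_prev + p["b_u"])
    r_t = sigmoid(p["W_r"] @ i_t + p["U_r"] @ h_prev + p["b_r"])
    h_tilde = np.tanh(p["W_h"] @ i_t + r_t * (p["U_h"] @ h_prev) + p["b_h"])
    if variant == "AGRU":
        gate = a_t          # attention score replaces the update gate
    elif variant == "AUGRU":
        gate = a_t * u_t    # attention score scales the update gate
    else:
        gate = u_t          # AIGRU keeps the ordinary update gate
    return (1.0 - gate) * h_prev + gate * h_tilde
```

With `a_t = 0`, both AGRU and AUGRU leave the state unchanged, so an irrelevant behavior does not disturb the evolving interest. AUGRU keeps a per-dimension gate, whereas AGRU collapses it to a single scalar.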

Overall Architecture
