The previous post covered the theory behind MIND, the multi-interest user network for recommender-system candidate retrieval; this post walks through the practical side.
1. Capsule Network vs. Traditional Neural Network
MIND borrows from Hinton's Capsule Network and proposes a Multi-Interest Extractor Layer that soft-clusters the embeddings of a user's historical behaviors. Before introducing it, let us use a figure to compare a Capsule Network with a traditional neural network.
The right side of the figure shows a traditional neuron: it takes the outputs (scalar values) of several neurons in the previous layer as input, weights and sums them, applies a non-linearity such as Sigmoid or ReLU, and finally outputs a single scalar value.
The left side of the figure lists the key equations from the Capsule paper. Comparing the two sides, a capsule's input u_i is a vector and its output v_j is also a vector; the equations in between describe how v_j is computed from u_i, and each step maps one-to-one onto the traditional neuron's computation. (1) u_i → û_{j|i} = W_{ij} u_i is an affine transformation, an operation the traditional neuron does not have; (2) û_{j|i} → s_j = Σ_i c_{ij} û_{j|i} sums over the i dimension under the weights c_{ij}, which can be seen as a vector version of the weighted sum; (3) finally s_j → v_j = squash(s_j), where the squashing function is non-linear and the output v_j keeps the dimensionality of s_j, so it can be seen as a vector version of the activation function.
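To make the correspondence concrete, here is a minimal NumPy sketch of one capsule step; the shapes and values are illustrative only, not taken from the paper:

```python
import numpy as np

def squash(s, eps=1e-8):
    # Vector "activation": keeps the direction of s, squashes its norm into [0, 1).
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

rng = np.random.default_rng(0)
u = rng.normal(size=(3, 4))             # 3 input capsules u_i, each a 4-d vector
W = rng.normal(size=(3, 4, 8))          # one affine map W_ij per input capsule
u_hat = np.einsum("id,ide->ie", u, W)   # (1) affine transform: u_i -> u_hat_{j|i}
c = np.array([0.2, 0.5, 0.3])           # (2) routing weights c_ij (sum to 1)
s = np.einsum("i,ie->e", c, u_hat)      #     vector-valued weighted sum over i
v = squash(s)                           # (3) vector activation
print(v.shape)                          # (8,)
```

Note that ||v|| is always below 1 by construction, which is what lets a capsule's norm be read as a probability.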
2. Capsule Layer Design
As described in the paper: the number of interests k (k_user) is computed dynamically for each user, and the Capsule Layer directly outputs those k_user interest vectors.
In practice: the layer outputs a fixed k_max interests for every user, and the Label-aware Attention layer then adaptively selects k_user interest vectors. (Note: at serving time, a fixed k_max interests is generally used.)
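The adaptive interest count used later in Label-aware Attention follows k_user = max(1, min(k_max, log2(1 + hist_len))); a quick sketch of how it grows with history length:

```python
import math

def adaptive_k(hist_len, k_max=5):
    # Number of interest vectors grows logarithmically with history length,
    # clipped to [1, k_max] -- mirroring the k_user formula in the layer code.
    return int(max(1.0, min(float(k_max), math.log2(1.0 + hist_len))))

for n in (1, 3, 10, 50, 200):
    print(n, adaptive_k(n))  # 1->1, 3->2, 10->3, 50->5, 200->5
```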
3. Label-aware Attention Design
During training, the label to predict has a single embedding while the user has k embeddings, so a matching score cannot be computed as a direct inner product. MIND therefore proposes Label-aware Attention, following the same idea as DIN: use the label's embedding to compute a weight for each of the user's k embeddings (hence "label-aware"), then take the weighted sum of the k embeddings to obtain one final user embedding.
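A minimal NumPy sketch of that weighted sum, with hypothetical shapes (k = 5 interests, 8-d embeddings):

```python
import numpy as np

def label_aware_attention(keys, query, pow_p=1.0):
    # keys: (k, d) user interest vectors; query: (d,) label embedding.
    scores = (keys @ query) ** pow_p          # one relevance score per interest
    scores = scores - scores.max()            # numerically stable softmax
    w = np.exp(scores) / np.exp(scores).sum()
    return w @ keys                           # (d,) final user embedding

rng = np.random.default_rng(1)
keys = rng.normal(size=(5, 8))
query = rng.normal(size=(8,))
user_vec = label_aware_attention(keys, query)
print(user_vec.shape)  # (8,)
```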
4. Learning by Doing
4.1 A first look at the Capsule Layer code
import tensorflow as tf
from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.layers import Layer


def squash(inputs):
    # Eq. (1) of the Capsule paper: keep the vector's direction and
    # squash its norm into [0, 1).
    vec_squared_norm = tf.reduce_sum(tf.square(inputs), axis=-1, keepdims=True)
    scalar_factor = vec_squared_norm / (1 + vec_squared_norm) / tf.sqrt(vec_squared_norm + 1e-8)
    return scalar_factor * inputs


class CapsuleLayer(Layer):
    def __init__(self, input_units, out_units, max_len, k_max, iteration_times=3,
                 init_std=1.0, **kwargs):
        self.input_units = input_units
        self.out_units = out_units
        self.max_len = max_len
        self.k_max = k_max
        self.iteration_times = iteration_times
        self.init_std = init_std
        super(CapsuleLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Routing logits B are not trained by back-propagation; they are
        # updated by the routing iterations in call() via assign_add.
        self.routing_logits = self.add_weight(shape=[1, self.k_max, self.max_len],
                                              initializer=RandomNormal(stddev=self.init_std),
                                              trainable=False, name="B", dtype=tf.float32)
        # Shared bilinear mapping matrix S.
        self.bilinear_mapping_matrix = self.add_weight(shape=[self.input_units, self.out_units],
                                                       initializer=RandomNormal(stddev=self.init_std),
                                                       name="S", dtype=tf.float32)
        super(CapsuleLayer, self).build(input_shape)

    def call(self, inputs, **kwargs):
        behavior_embeddings, seq_len = inputs
        batch_size = tf.shape(behavior_embeddings)[0]
        seq_len_tile = tf.tile(seq_len, [1, self.k_max])

        for i in range(self.iteration_times):
            # Mask out the zero-padded positions of the behavior sequence
            # with a huge negative number so softmax drives them to ~0.
            mask = tf.sequence_mask(seq_len_tile, self.max_len)
            pad = tf.ones_like(mask, dtype=tf.float32) * (-2 ** 32 + 1)
            routing_logits_with_padding = tf.where(mask, tf.tile(self.routing_logits, [batch_size, 1, 1]), pad)
            weight = tf.nn.softmax(routing_logits_with_padding)
            behavior_embedding_mapping = tf.tensordot(behavior_embeddings, self.bilinear_mapping_matrix, axes=1)
            Z = tf.matmul(weight, behavior_embedding_mapping)
            interest_capsules = squash(Z)
            # Agreement between the interest capsules and the mapped
            # behaviors updates the routing logits.
            delta_routing_logits = tf.reduce_sum(
                tf.matmul(interest_capsules, tf.transpose(behavior_embedding_mapping, perm=[0, 2, 1])),
                axis=0, keepdims=True
            )
            self.routing_logits.assign_add(delta_routing_logits)

        interest_capsules = tf.reshape(interest_capsules, [-1, self.k_max, self.out_units])
        return interest_capsules

    def compute_output_shape(self, input_shape):
        return (None, self.k_max, self.out_units)

    def get_config(self, ):
        config = {'input_units': self.input_units, 'out_units': self.out_units, 'max_len': self.max_len,
                  'k_max': self.k_max, 'iteration_times': self.iteration_times, "init_std": self.init_std}
        base_config = super(CapsuleLayer, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
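To see what the routing loop above is doing without the masking and batching machinery, here is a stripped-down, self-contained NumPy version (batch of one, hypothetical sizes: 6 behaviors, 2 interest capsules, 4-d embeddings):

```python
import numpy as np

def squash(s, eps=1e-8):
    # Same vector activation as in the layer above.
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

rng = np.random.default_rng(0)
max_len, in_units, out_units, k_max = 6, 4, 4, 2
behaviors = rng.normal(size=(max_len, in_units))  # user behavior embeddings
S = rng.normal(size=(in_units, out_units))        # shared bilinear map (the "S" weight)
B = rng.normal(size=(k_max, max_len))             # routing logits (the "B" weight)

low = behaviors @ S                               # mapped behaviors, (max_len, out_units)
for _ in range(3):                                # iteration_times = 3
    w = np.exp(B) / np.exp(B).sum(axis=1, keepdims=True)  # softmax over behaviors
    z = w @ low                                   # weighted sum, (k_max, out_units)
    interests = squash(z)                         # high-level interest capsules
    B = B + interests @ low.T                     # agreement update of the logits

print(interests.shape)  # (2, 4)
```

Each iteration sharpens the assignment of behaviors to interest capsules: behaviors that agree with a capsule's current output get larger routing logits for it.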
4.2 Then a look at the Label-aware Attention code
import tensorflow as tf
from tensorflow.keras.layers import Layer


class LabelAwareAttention(Layer):
    def __init__(self, k_max, pow_p=1, **kwargs):
        self.k_max = k_max
        self.pow_p = pow_p
        super(LabelAwareAttention, self).__init__(**kwargs)

    def build(self, input_shape):
        self.embedding_size = input_shape[0][-1]
        super(LabelAwareAttention, self).build(input_shape)

    def call(self, inputs, training=None, **kwargs):
        keys = inputs[0]   # user interest capsules, [batch, k_max, dim]
        query = inputs[1]  # label (target item) embedding, [batch, 1, dim]
        weight = tf.reduce_sum(keys * query, axis=-1, keepdims=True)
        weight = tf.pow(weight, self.pow_p)  # [batch, k_max, 1]

        if len(inputs) == 3:
            # Adaptive interest count: k_user = max(1, min(k_max, log2(1 + hist_len))).
            k_user = tf.cast(tf.maximum(
                1.,
                tf.minimum(
                    tf.cast(self.k_max, dtype="float32"),  # k_max
                    tf.math.log1p(tf.cast(inputs[2], dtype="float32")) / tf.math.log(2.)  # hist_len
                )
            ), dtype="int64")
            seq_mask = tf.transpose(tf.sequence_mask(k_user, self.k_max), [0, 2, 1])
            padding = tf.ones_like(seq_mask, dtype=tf.float32) * (-2 ** 32 + 1)  # [batch, k_max, 1]
            weight = tf.where(seq_mask, weight, padding)

        # Softmax must run over the k_max interests (axis=1): the last axis has
        # size 1, so the default axis=-1 would make every weight equal to 1.
        weight = tf.nn.softmax(weight, axis=1, name="weight")
        output = tf.reduce_sum(keys * weight, axis=1)
        return output

    def compute_output_shape(self, input_shape):
        return (None, self.embedding_size)

    def get_config(self, ):
        config = {'k_max': self.k_max, 'pow_p': self.pow_p}
        base_config = super(LabelAwareAttention, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
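Both layers use the same masking trick: fill invalid slots with a huge negative number (-2^32 + 1) so the softmax sends their weight to essentially zero. Seen in isolation:

```python
import numpy as np

scores = np.array([2.0, 1.0, 0.5, 0.0, 0.0])      # 5 slots, only the first 3 valid
mask = np.array([True, True, True, False, False])
padded = np.where(mask, scores, -2 ** 32 + 1)     # same padding constant as the code
w = np.exp(padded - padded.max())
w = w / w.sum()
print(np.round(w, 3))  # the two masked slots get weight 0
```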
4.3 MIND Model Design
(1) The data
The dataset used in this post is MovieLens-1M, available at: http://files.grouplens.org/datasets/movielens/ml-1m.zip
The dataset is processed into the following format:
5030 2 2 2 2247 3412,2899,2776,2203,2309,2385,743,2958,2512,2485,2404,2675,2568,2555,217,2491,2566,2481,2503,2786,1051,2502,803,3030,1789,2,424,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 27 2558 1112,1284,1174,3100,1049,2137,2273,2651,340,2163
279 2 3 15 2831 2210,1456,453,1293,3210,2235,2284,1095,1487,3511,738,886,1926,3501,1023,150,1198,3413,156,909,1019,2848,260,2737,1096,2684,1887,107,1143,347,1107,1111,1151,1133,3113,3592,1119,3287,1203,1181,1121,852,1915,1247,3038,240,0,0,0,0 46 2212 820,1009,2076,529,3032,2503,2742,2345,965,366
1300 2 1 11 3282 692,3041,1234,519,1554,1258,3452,1509,1170,1252,2804,754,2866,1987,2416,596,1250,1824,1225,2323,2542,2647,2355,2267,1248,2543,1818,2512,1815,1167,1289,1241,1803,2974,3252,3127,3320,3061,3278,3075,3249,3322,2945,3179,65,1109,3091,1245,2311,3357 165 1880 1545,332,2754,2254,267,1532,1062,1450,1440,2467
323 2 5 13 1799 580,864,1060,2098,2824,1203,1213,1088,1185,2,1925,309,2427,1994,1176,1486,853,1161,29,254,1259,528,1179,1107,1567,4,427,3567,3130,1174,2129,575,347,1415,2786,2204,2487,21,1223,3032,2652,67,2198,1737,45,51,218,2400,1225,467 117 1295 1114,2758,435,318,2251,2111,3650,2510,3705,1111
695 1 2 2 233 2161,2235,700,2962,444,2489,2375,1849,3662,3582,3650,3225,3128,3060,3127,3581,3252,3510,3556,3076,3281,3302,3050,3384,3702,2969,3303,3551,3543,3178,3249,3670,3342,3652,3665,3378,3322,3073,3376,3075,3584,3179,3504,3511,3278,1289,2,467,107,994 190 2945 2456,2716,2635,990,3657,3403,2210,1602,3251,143
Field description:
Column 1 | user_id | user id |
Column 2 | gender | user gender |
Column 3 | age | user age |
Column 4 | occupation | user occupation |
Column 5 | zip | user zip code |
Column 6 | hist_movie_id | the user's historical watched-movie sequence (zero-padded) |
Column 7 | hist_len | length of the watch history |
Column 8 | pos_movie_id | the movie the user watches next (positive sample) |
Column 9 | neg_movie_id | movies the user did not watch next (sampled as negatives) |
The data-processing logic is available at: https://github.com/wziji/deep_ctr/blob/master/mind/data.py
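Given the format described above (whitespace-separated fields, comma-separated id lists), one sample line can be parsed roughly like this. This is only a sketch; the actual logic lives in data.py at the link above:

```python
def parse_line(line):
    # Fields: user_id gender age occupation zip hist_movie_ids hist_len pos_id neg_ids
    f = line.split()
    return {
        "user_id": int(f[0]),
        "gender": int(f[1]),
        "age": int(f[2]),
        "occupation": int(f[3]),
        "zip": int(f[4]),
        "hist_movie_id": [int(x) for x in f[5].split(",")],  # zero-padded to max_len=50
        "hist_len": int(f[6]),
        "pos_movie_id": int(f[7]),
        "neg_movie_id": [int(x) for x in f[8].split(",")],   # 10 sampled negatives
    }

sample = "695 1 2 2 233 2161,2235,700 3 2945 2456,2716,2635"
print(parse_line(sample)["hist_len"])  # 3
```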
(2) MIND model code:
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, concatenate, Flatten, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

from CapsuleLayer import SequencePoolingLayer, LabelAwareAttention, CapsuleLayer


def tile_user_otherfeat(user_other_feature, k_max):
    return tf.tile(tf.expand_dims(user_other_feature, -2), [1, k_max, 1])


def mind(sparse_input_length=1,
         dense_input_length=1,
         sparse_seq_input_length=50,
         embedding_dim=64,
         neg_sample_num=10,
         user_hidden_unit_list=[128, 64],
         k_max=5,
         p=1,
         dynamic_k=True):

    # 1. Input layer
    user_id_input_layer = Input(shape=(sparse_input_length, ), name="user_id_input_layer")
    gender_input_layer = Input(shape=(sparse_input_length, ), name="gender_input_layer")
    age_input_layer = Input(shape=(sparse_input_length, ), name="age_input_layer")
    occupation_input_layer = Input(shape=(sparse_input_length, ), name="occupation_input_layer")
    zip_input_layer = Input(shape=(sparse_input_length, ), name="zip_input_layer")

    user_click_item_seq_input_layer = Input(shape=(sparse_seq_input_length, ), name="user_click_item_seq_input_layer")
    user_click_item_seq_length_input_layer = Input(shape=(sparse_input_length, ), name="user_click_item_seq_length_input_layer")

    pos_item_sample_input_layer = Input(shape=(sparse_input_length, ), name="pos_item_sample_input_layer")
    neg_item_sample_input_layer = Input(shape=(neg_sample_num, ), name="neg_item_sample_input_layer")

    # 2. Embedding layer
    user_id_embedding_layer = Embedding(6040 + 1, embedding_dim, mask_zero=True, name='user_id_embedding_layer')(user_id_input_layer)
    gender_embedding_layer = Embedding(2 + 1, embedding_dim, mask_zero=True, name='gender_embedding_layer')(gender_input_layer)
    age_embedding_layer = Embedding(7 + 1, embedding_dim, mask_zero=True, name='age_embedding_layer')(age_input_layer)
    occupation_embedding_layer = Embedding(21 + 1, embedding_dim, mask_zero=True, name='occupation_embedding_layer')(occupation_input_layer)
    zip_embedding_layer = Embedding(3439 + 1, embedding_dim, mask_zero=True, name='zip_embedding_layer')(zip_input_layer)

    # The item embedding table is shared by the positive sample, the
    # negative samples and the click-history sequence.
    item_id_embedding_layer = Embedding(3706 + 1, embedding_dim, mask_zero=True, name='item_id_embedding_layer')
    pos_item_sample_embedding_layer = item_id_embedding_layer(pos_item_sample_input_layer)
    neg_item_sample_embedding_layer = item_id_embedding_layer(neg_item_sample_input_layer)
    user_click_item_seq_embedding_layer = item_id_embedding_layer(user_click_item_seq_input_layer)

    ### ********** ###
    # 3. user part
    ### ********** ###

    # 3.1 pooling layer
    user_click_item_seq_embedding_layer_pooling = SequencePoolingLayer()(
        [user_click_item_seq_embedding_layer, user_click_item_seq_length_input_layer])
    print("user_click_item_seq_embedding_layer_pooling", user_click_item_seq_embedding_layer_pooling)

    # 3.2 capsule layer
    high_capsule = CapsuleLayer(input_units=embedding_dim,
                                out_units=embedding_dim,
                                max_len=sparse_seq_input_length,
                                k_max=k_max)(
        [user_click_item_seq_embedding_layer, user_click_item_seq_length_input_layer])
    print("high_capsule: ", high_capsule)

    # 3.3 Concat "sparse" embedding & "sparse_seq" embedding, and tile embedding
    other_user_embedding_layer = concatenate([user_id_embedding_layer, gender_embedding_layer,
                                              age_embedding_layer, occupation_embedding_layer,
                                              zip_embedding_layer, user_click_item_seq_embedding_layer_pooling],
                                             axis=-1)
    # [batch, 1, dim] -> [batch, k_max, dim] so it can be concatenated with the capsules
    other_user_embedding_layer = tf.tile(other_user_embedding_layer, [1, k_max, 1])
    print("other_user_embedding_layer: ", other_user_embedding_layer)

    # 3.4 user dnn part
    user_deep_input = concatenate([other_user_embedding_layer, high_capsule], axis=-1)
    print("user_deep_input: ", user_deep_input)

    for i, u in enumerate(user_hidden_unit_list):
        user_deep_input = Dense(u, activation="relu", name="FC_{0}".format(i + 1))(user_deep_input)
        # user_deep_input = Dropout(0.3)(user_deep_input)
    print("user_deep_input: ", user_deep_input)

    if dynamic_k:
        user_embedding_final = LabelAwareAttention(k_max=k_max, pow_p=p)(
            [user_deep_input, pos_item_sample_embedding_layer,
             user_click_item_seq_length_input_layer])
    else:
        user_embedding_final = LabelAwareAttention(k_max=k_max, pow_p=p)(
            [user_deep_input, pos_item_sample_embedding_layer])

    user_embedding_final = tf.expand_dims(user_embedding_final, 1)
    print("user_embedding_final: ", user_embedding_final)

    ### ********** ###
    # 4. item part
    ### ********** ###
    item_embedding_layer = concatenate([pos_item_sample_embedding_layer,
                                        neg_item_sample_embedding_layer],
                                       axis=1)
    item_embedding_layer = tf.transpose(item_embedding_layer, [0, 2, 1])
    print("item_embedding_layer: ", item_embedding_layer)

    ### ********** ###
    # 5. Output
    ### ********** ###
    dot_output = tf.matmul(user_embedding_final, item_embedding_layer)
    # 11 softmax scores: index 0 is the positive sample, indices 1-10 the negatives.
    dot_output = tf.nn.softmax(dot_output)
    print(dot_output)

    user_inputs_list = [user_id_input_layer, gender_input_layer, age_input_layer,
                        occupation_input_layer, zip_input_layer,
                        user_click_item_seq_input_layer, user_click_item_seq_length_input_layer]
    item_inputs_list = [pos_item_sample_input_layer, neg_item_sample_input_layer]

    model = Model(inputs=user_inputs_list + item_inputs_list,
                  outputs=dot_output)
    # print(model.summary())
    # tf.keras.utils.plot_model(model, to_file='MIND_model.png', show_shapes=True)

    # Expose the two towers for exporting user / item embeddings at serving time.
    model.__setattr__("user_input", user_inputs_list)
    model.__setattr__("user_embedding", user_deep_input)
    model.__setattr__("item_input", pos_item_sample_input_layer)
    model.__setattr__("item_embedding", pos_item_sample_embedding_layer)

    return model
(3) MIND model architecture
Input: 7 feature fields, plus 2 groups of label data (1 positive sample and 10 sampled negative samples);
Output: the Softmax probability distribution over the 11 samples;
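Because index 0 of the concatenated [positive; negatives] output always holds the positive item, the sparse label for every training sample is simply 0. A sketch of the labels and the expected starting loss:

```python
import numpy as np

batch_size = 4
# Softmax over 1 positive + 10 negatives = 11 scores; the positive is at index 0,
# so every sparse label is 0.
labels = np.zeros(batch_size, dtype=np.int64)

# Sanity check: an untrained model scoring all 11 candidates uniformly
# should start near a cross-entropy loss of ln(11).
probs = np.full((batch_size, 11), 1.0 / 11)
loss = -np.log(probs[np.arange(batch_size), labels]).mean()
print(round(loss, 4))  # 2.3979
```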
(4) Training the MIND model
import tensorflow as tf
from mind import mind
from tensorflow.keras.optimizers import Adam

early_stopping_cb = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
callbacks = [early_stopping_cb]

model = mind()
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=Adam(learning_rate=1e-3),
              metrics=['sparse_categorical_accuracy'])
# For how loss='sparse_categorical_crossentropy' is applied here, see:
# https://mp.weixin.qq.com/s/H4ET0bO_xPm8TNqltMt3Fg

# train_generator / val_generator (built in data.py) yield (inputs, labels);
# every label is 0 because index 0 of the output holds the positive sample.
history = model.fit(train_generator,
                    epochs=2,
                    steps_per_epoch=steps_per_epoch,
                    callbacks=callbacks,
                    validation_data=val_generator,
                    validation_steps=validation_steps,
                    shuffle=True)

model.save_weights('mind_model.h5')
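At serving time the two towers are split: the user's k_max interest vectors become queries, and all item embeddings are indexed for nearest-neighbor search. A self-contained sketch of that retrieval step with random vectors standing in for the trained embeddings (brute-force inner-product search here; a library like Faiss would replace it in production):

```python
import numpy as np

rng = np.random.default_rng(42)
item_embeddings = rng.normal(size=(1000, 64))  # one row per movie id (hypothetical)
user_interests = rng.normal(size=(5, 64))      # k_max interest vectors for one user

# Retrieve top-N items per interest capsule, then merge the candidate sets.
scores = user_interests @ item_embeddings.T    # (5, 1000) inner products
top_n = 10
candidates = set()
for s in scores:
    candidates.update(np.argsort(-s)[:top_n].tolist())
print(len(candidates))  # at most k_max * top_n = 50 unique candidates
```

Querying with each interest vector separately is what lets MIND surface items from several distinct interest clusters in one retrieval pass.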
The training log:
Train for 989 steps, validate for 7 steps
Epoch 1/2
989/989 [==============================] - 137s 139ms/step - loss: 1.6125 - sparse_categorical_accuracy: 0.4041 - val_loss: 1.5422 - val_sparse_categorical_accuracy: 0.4224
Epoch 2/2
989/989 [==============================] - 131s 133ms/step - loss: 1.3553 - sparse_categorical_accuracy: 0.4910 - val_loss: 1.4716 - val_sparse_categorical_accuracy: 0.4604
The full code for this post is available here, exchanges welcome: https://github.com/wziji/deep_ctr/tree/master/mind
References:
(1)https://github.com/shenweichen/DeepMatch/blob/master/deepmatch/models/mind.py
(2)https://github.com/naturomics/CapsNet-Tensorflow/blob/master/capsLayer.py