## Idea

Use implicit feedback and item features to project users and items into the same latent space, which can later be searched with kNN. The learned metric encodes user-item, user-user and item-item relationships.

## Loss

Users and items are represented by vectors $u_i \in \mathcal{R}^r$ and $v_j \in \mathcal{R}^r$. The Euclidean distance between user $i$ and item $j$ is defined as

$$d(i,j)= \parallel u_i - v_j \parallel$$

The loss function consists of three parts:

$$\mathcal{L}=\mathcal{L}_m + \lambda_f\mathcal{L}_f + \lambda_c\mathcal{L}_c$$

### Weighted Triplet Loss

Sample a user $i$, a positive item $j$ and a negative item $k$.

$$\mathcal{L}_m=\sum_{i,j,k}w_{ij}[d(i,j)^2 - d(i,k)^2 + m]_{+}$$

where $[z]_{+}=\max(0,z)$ and $m>0$ is the margin size. The weight $w_{ij}$ is calculated in WARP fashion, but by sampling a fixed number $U$ of negative items for each positive pair $(i,j)$ instead of sampling until an imposter (a negative item that violates the margin) is found.

$$w_{ij}=\log\left(\left\lfloor Items \cdot \frac{M}{U} \right\rfloor + 1\right)$$

where $Items$ is the total number of items and $M$ is the number of imposters among the $U$ sampled negative items.

### Loss for features

Let $x_j \in \mathcal{R}^m$ denote the raw feature vector of item $j$. We want its projection to be close to the corresponding item vector $v_j$.

$$\mathcal{L}_f=\sum_j \parallel f(x_j) - v_j \parallel ^2$$

where $f$ is a learned transformation (an MLP with dropout) that maps item features into the latent space.

### Regularization

kNN is ineffective in a high-dimensional sparse space, so user and item vectors are bounded within the unit sphere.

$$\parallel u_* \parallel ^2 \leq 1$$

$$\parallel v_* \parallel ^2 \leq 1$$

$L_2$ regularization is not used because it pulls every object toward the origin, which has no specific meaning in this case. Covariance regularization is used instead to decorrelate the dimensions of the learned metric. The covariances between all pairs of dimensions $i$ and $j$ form a matrix $C$.

$$C_{i,j} = \frac{1}{N} \sum_n (y_i^n - \mu_i)(y_j^n - \mu_j)$$

where $y^n$ is the latent vector of the $n$-th object (user or item) in a batch of size $N$ and $\mu_i = \frac{1}{N}\sum_n y_i^n$. The covariance loss is

$$\mathcal{L}_c = \frac{1}{N}(\parallel C \parallel_f - \parallel diag(C) \parallel_2^2)$$

where $\parallel \cdot \parallel_f$ is the Frobenius norm.
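
The loss terms described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. It makes several assumptions: the function names are hypothetical, the default `margin` and the $\lambda$ weights are arbitrary, the output of the feature transformation $f(x_j)$ is passed in precomputed, the hinge terms are summed over all $U$ sampled negatives, and the covariance penalty is taken to be $\frac{1}{N}(\parallel C \parallel_f - \parallel diag(C) \parallel_2^2)$.

```python
import numpy as np


def clip_to_unit_ball(vectors):
    """Rescale rows with L2 norm above 1 so every row satisfies ||x||^2 <= 1."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, 1.0)


def weighted_triplet_loss(u, v_pos, v_negs, n_items, margin=0.5):
    """Weighted hinge loss for one (user, positive item) pair.

    u: (r,) user vector, v_pos: (r,) positive item vector,
    v_negs: (U, r) vectors of the U sampled negative items.
    """
    d_pos = np.sum((u - v_pos) ** 2)            # d(i, j)^2
    d_neg = np.sum((u - v_negs) ** 2, axis=1)   # d(i, k)^2 for each negative k
    hinge = np.maximum(0.0, d_pos - d_neg + margin)
    n_imposters = np.count_nonzero(hinge > 0)   # M: negatives violating the margin
    n_sampled = v_negs.shape[0]                 # U
    weight = np.log(np.floor(n_items * n_imposters / n_sampled) + 1.0)
    return weight * hinge.sum()


def feature_loss(projected_features, item_vectors):
    """Sum of squared distances between f(x_j) and v_j (f is applied upstream)."""
    return np.sum((projected_features - item_vectors) ** 2)


def covariance_loss(y):
    """Covariance penalty over a batch y of shape (N, r)."""
    n = y.shape[0]
    centered = y - y.mean(axis=0)               # subtract mu_i per dimension
    cov = centered.T @ centered / n             # C, shape (r, r)
    return (np.linalg.norm(cov, "fro") - np.sum(np.diag(cov) ** 2)) / n


def total_loss(l_m, l_f, l_c, lambda_f=1.0, lambda_c=1.0):
    """Combine the three terms: L = L_m + lambda_f * L_f + lambda_c * L_c."""
    return l_m + lambda_f * l_f + lambda_c * l_c
```

In a training loop, `clip_to_unit_ball` would be applied to the user and item embeddings after each gradient step, which enforces the norm constraint by projection rather than by adding a penalty term to the loss.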
