
doc2vec

word2vec

\[f(g(x; \theta))\]

Continuous Bag of Words (CBOW)

Step 1. Collect all of the words that appear in the training data.

Step 2. Label each word with a unique ID.

Step 3. Train a neural network that predicts the center word from its surrounding context words, as sketched below.

Step 4. The representation vector of the $i$-th unique word is the output of the trained neural network.
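The following is a minimal NumPy sketch of these four steps, assuming a toy nine-word corpus, a small embedding dimension, and plain softmax/cross-entropy SGD (no negative sampling or other word2vec optimizations); it is an illustration under those assumptions, not a reference implementation. The center word is predicted from the average of its context-word vectors, and the rows of `emb_in` serve as the word vectors.

```python
import numpy as np

corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))                        # Step 1: collect the words
word_to_id = {w: i for i, w in enumerate(vocab)}   # Step 2: assign unique IDs
V, D, W = len(vocab), 16, 2                        # vocab size, vector dim, window size

rng = np.random.default_rng(0)
emb_in = rng.normal(scale=0.1, size=(V, D))        # input embeddings (the word vectors)
emb_out = rng.normal(scale=0.1, size=(V, D))       # output-layer weights

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

lr = 0.1
for epoch in range(200):                           # Step 3: train with SGD
    for t, w in enumerate(corpus):
        ctx = [word_to_id[corpus[t + l]]
               for l in range(-W, W + 1)
               if l != 0 and 0 <= t + l < len(corpus)]
        if not ctx:
            continue
        center = word_to_id[w]
        h = emb_in[ctx].mean(axis=0)               # average of the context vectors
        p = softmax(emb_out @ h)                   # predicted distribution over the center word
        grad = p.copy()
        grad[center] -= 1.0                        # gradient of cross-entropy w.r.t. the logits
        g_h = emb_out.T @ grad                     # backprop into the hidden vector
        emb_out -= lr * np.outer(grad, h)
        for c in ctx:
            emb_in[c] -= lr * g_h / len(ctx)

# Step 4: the learned vector of the i-th unique word is emb_in[i].
print(emb_in[word_to_id["fox"]][:4])
```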

Skip-Gram

Step 1. Collect all of the words that appear in the training data.

Step 2. Label each word with a unique ID.

Step 3. Solve the maximization problem below, where $A(w)$ is the unique ID of word $w$, $T$ is the number of words in the training data, and $W$ is the window size:

\[\begin{eqnarray} p_{I \mid J}(i \mid j) & := & \frac{ \exp \left( \theta_{i}^{\mathrm{T}} \theta_{j} \right) }{ \sum_{k=1}^{K} \exp \left( \theta_{k}^{\mathrm{T}} \theta_{j} \right) }, \nonumber \\ \max_{\theta_{1}, \ldots, \theta_{K}} & & \frac{1}{T} \sum_{t=1}^{T} \sum_{-W \le l \le W, l \neq 0} \log p_{I \mid J}(A(w_{t + l}) \mid A(w_{t})) \label{skip_gram_objective_function} . \end{eqnarray}\]

Step 4. The representation vector of the $i$-th unique word is the $\theta_{i}$ obtained by maximizing the objective above.
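Below is a minimal NumPy sketch of these steps that maximizes the objective above by plain gradient ascent. The toy corpus, learning rate, and the use of a single shared matrix `theta` for both the center-word and context-word roles (as written in the equation) are assumptions for illustration; no negative sampling or hierarchical softmax is used.

```python
import numpy as np

corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))                        # Step 1: collect the words
A = {w: i for i, w in enumerate(vocab)}            # Step 2: A(w) is the unique ID of w
K, D, W, T = len(vocab), 16, 2, len(corpus)        # vocab size, dim, window, corpus length

rng = np.random.default_rng(0)
theta = rng.normal(scale=0.1, size=(K, D))         # theta_1, ..., theta_K

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

lr = 0.1
for epoch in range(200):                           # Step 3: gradient ascent on the objective
    for t in range(T):
        j = A[corpus[t]]                           # ID of the center word w_t
        for l in range(-W, W + 1):
            if l == 0 or not (0 <= t + l < T):
                continue
            i = A[corpus[t + l]]                   # ID of the context word w_{t+l}
            p = softmax(theta @ theta[j])          # p_{I|J}(. | j)
            # gradient of log p_{I|J}(i | j): theta_j's "center" role in the scores ...
            grad_j = theta[i] - p @ theta
            # ... and every theta_k's "context" role in the softmax scores
            grad_rows = -np.outer(p, theta[j])
            grad_rows[i] += theta[j]
            theta[j] += lr * grad_j / T
            theta += lr * grad_rows / T

# Step 4: the learned vector of the i-th unique word is theta[i].
print(theta[A["fox"]][:4])
```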

Reference