1: word2vec can be used to mine sequential behavior data, such as product-browsing or app-download histories. It yields vector representations of products or apps, which can then be used for recommendation, personalized display, and similar tasks.
http://ginobefunny.com/post/learning_word2vec/
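A minimal sketch of this idea (item IDs and session data are hypothetical placeholders; a real pipeline would read them from browse/download logs). Each user session is treated as a "sentence" and each product/app ID as a "word":

```python
# Sketch: learning item embeddings from browsing sessions (toy data).
from gensim.models import Word2Vec

sessions = [
    ["item_42", "item_17", "item_99", "item_17"],  # one user's browse sequence
    ["item_17", "item_99", "item_3"],
    ["item_42", "item_3", "item_8"],
]

model = Word2Vec(
    sentences=sessions,
    vector_size=64,   # embedding dimension (named `size` in gensim < 4.0)
    window=5,
    min_count=1,      # keep rare items in this toy example
    sg=1,             # skip-gram often works well for sparse item data
    epochs=10,
)

# Items that co-occur in similar contexts get nearby vectors,
# which can feed a "similar items" recommendation list.
print(model.wv.most_similar("item_17", topn=3))
```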
2: Some practical usage tips
There are no universal rules of thumb, as what makes a set of word-vectors good for one purpose might not be best for other purposes. (For example, word-vecs that do best on the analogies-test may not also do the best at a topical-classification task that works on some mean-of-word-vectors.)
That said:
be sure to use the latest gensim; earlier versions could be significantly slower on very-short text examples (like tweets)
larger window sizes seem to position words closer according to topical-domain/field-of-use/semantic similarity; shorter window sizes position words closer based on functional/syntactic similarity (serve same role in sentence)
as your dataset gets larger, sometimes very-small values of window and negative are just as good (or better) and faster than larger values (a parameter sketch follows below this list)
as your dataset gets larger, more-aggressive frequent-word downsampling (the 'sample' parameter becoming smaller but not zero) can offer both speed and quality benefits (by spending fewer training cycles on redundant well-represented words)
it's typical to use more than one iteration, but as your data gets larger (and if you're confident word/word-senses are randomly distributed from front to back) the benefits of extra iterations will lessen
- Gordon
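A minimal gensim sketch tying these knobs together (the corpus and specific values are placeholders, not recommendations; in gensim < 4.0 the parameters are named `size` and `iter` instead of `vector_size` and `epochs`):

```python
from gensim.models import Word2Vec

# Placeholder corpus: a list of tokenized sentences.
corpus = [
    ["the", "quick", "brown", "fox"],
    ["the", "lazy", "dog", "sleeps"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=100,
    window=5,        # larger -> more topical similarity; smaller -> more syntactic
    negative=5,      # small values often suffice (and are faster) on big corpora
    sample=1e-4,     # downsample frequent words more aggressively as data grows
    epochs=5,        # extra iterations help less as the corpus gets larger
    min_count=1,     # only so this toy corpus is not pruned away
    workers=4,
)
```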