Author2Vec: A Novel Framework for Generating User Embedding

Published in arXiv, 2020

Recommended citation: Wu, X., Lin, W., Wang, Z., & Rastorgueva, E. (2020). Author2vec: A framework for generating user embedding. arXiv preprint arXiv:2003.11627. https://arxiv.org/abs/2003.11627


Online forums and social media platforms produce noisy but valuable data every day. In this paper, we propose Author2Vec, a novel end-to-end neural network-based user embedding system. The model combines sentence representations generated by BERT (Bidirectional Encoder Representations from Transformers) with a novel unsupervised pre-training objective, authorship classification, to produce user embeddings that encode useful user-intrinsic properties. The system was pre-trained on post data from 10k Reddit users and evaluated on two user classification benchmarks, depression detection and personality classification, on which it outperformed traditional count-based and prediction-based methods. We show that Author2Vec successfully encodes useful user attributes and that the generated user embeddings perform well in downstream classification tasks without further fine-tuning.
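The authorship classification objective described above can be illustrated with a toy sketch. This is not the paper's implementation: random vectors stand in for BERT sentence representations, mean pooling is an assumed aggregation step, and the names `mean_pool` and `authorship_loss` are hypothetical. It only shows the general idea that posts are aggregated into a user embedding, which is then scored against a set of candidate authors with a cross-entropy loss.

```python
import math
import random


def mean_pool(post_vectors):
    """Average a user's post vectors into a single user embedding.

    Assumption: mean pooling is used here only for illustration.
    """
    dim = len(post_vectors[0])
    count = len(post_vectors)
    return [sum(vec[i] for vec in post_vectors) / count for i in range(dim)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def authorship_loss(user_embedding, author_weights, true_author):
    """Cross-entropy loss for predicting which author wrote the posts."""
    logits = [
        sum(w * x for w, x in zip(weights, user_embedding))
        for weights in author_weights
    ]
    probs = softmax(logits)
    return -math.log(probs[true_author]), probs


# Toy data: random vectors stand in for BERT sentence representations.
random.seed(0)
dim, num_authors = 4, 3
posts = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
author_weights = [
    [random.gauss(0, 1) for _ in range(dim)] for _ in range(num_authors)
]

embedding = mean_pool(posts)
loss, probs = authorship_loss(embedding, author_weights, true_author=1)
print(f"loss={loss:.3f}")
```

In a trained system, minimizing this loss over many users would push the embedding to capture whatever distinguishes one author's posts from another's, and the embedding could then be reused as a fixed feature vector for a downstream classifier.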
