Surprisal in Action: A Comparative Study of LDA and LSA for Keyword Extraction

in publications :: #Konvens

This study compares two methods of topic detection, Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA), by using each in conjunction with the Topic Context Model (TCM) on the task of keyword extraction. The surprisal values that TCM outputs based on LDA and LSA are compared both directly and as inputs to a Recurrent Neural Network (RNN). In the direct comparison, LSA slightly outperforms LDA; when an RNN is trained on the surprisal values, LDA and LSA perform on a par. In general, semantic surprisal as an input to an RNN improves its performance.
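
To make the notion of topic-based surprisal concrete, here is a minimal sketch. It assumes the common information-theoretic definition, surprisal = -log2 P(word | context), with the context taken to be a document's topic mixture. The abstract does not spell out TCM's exact formulation, so the function name `word_surprisal`, the data layout, and all numbers below are illustrative assumptions, not the study's implementation.

```python
import math


def word_surprisal(word, doc_topic_probs, topic_word_probs):
    """Surprisal (in bits) of `word` given a document's topic mixture.

    Assumption: P(word | doc) = sum_k P(topic k | doc) * P(word | topic k).
    This is a generic topic-model mixture, not necessarily TCM's definition.
    """
    p = sum(
        p_topic * topic_word_probs[k].get(word, 0.0)
        for k, p_topic in enumerate(doc_topic_probs)
    )
    if p <= 0.0:
        # A word with zero probability under every topic is infinitely surprising.
        return math.inf
    return -math.log2(p)


# Toy, made-up distributions for a two-topic model (hypothetical values).
doc_topic_probs = [0.5, 0.5]
topic_word_probs = [
    {"neural": 0.25, "the": 0.5},
    {"neural": 0.0, "the": 0.5},
]

for w in ("neural", "the"):
    print(w, word_surprisal(w, doc_topic_probs, topic_word_probs))
```

Under these toy numbers the rarer, topic-specific word receives a higher surprisal than the frequent function word, which is the intuition behind using surprisal as a keyword signal.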