Analyzing the relations between authors and topics in an email corpus is an important problem in both social network analysis and text mining. The Author-Topic model is a statistical model that identifies author-topic relations. However, its inference process ignores information at the document level: the co-occurrence of words within documents is not taken into account when deriving topics, which may make the model unsuitable for email analysis. We propose adapting the Latent Dirichlet Allocation model to the analysis of email corpora. This method takes into account both the author-document relations and the document-topic relations. We use the Author-Topic model as the baseline and propose measures for comparing our method against it. Empirical analysis of experimental results on both simulated data sets and the real Enron email data set shows that our method performs better than the Author-Topic model.
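The abstract does not say exactly how the adapted LDA model combines author-document and document-topic relations. The sketch below is one illustration under stated assumptions, not the authors' method. It fits standard LDA with a collapsed Gibbs sampler, so each email gets its own topic mixture and words that co-occur in an email share that mixture. A hypothetical aggregation step then averages the mixtures of each author's emails. The names `lda_gibbs` and `author_topics`, the hyperparameters `alpha` and `beta`, and the averaging rule are all illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA (illustrative sketch).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns theta: one topic distribution per document.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    ndk = [[0] * K for _ in docs]        # document-topic counts
    nkw = [[0] * V for _ in range(K)]    # topic-word counts
    nk = [0] * K                         # tokens assigned to each topic
    z = []                               # topic assignment of every token

    # Random initialization of token-topic assignments.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional: p(z=t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                # The n_dt term is the document-level signal: words in the
                # same email pull toward the same topics.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Posterior-mean estimate of each document's topic mixture.
    return [
        [(ndk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
        for d, doc in enumerate(docs)
    ]


def author_topics(theta, doc_authors, num_authors):
    """Hypothetical aggregation: average the topic mixtures of an author's documents.

    doc_authors[d] lists the author ids of document d. Authors with no
    documents get a uniform distribution.
    """
    K = len(theta[0])
    sums = [[0.0] * K for _ in range(num_authors)]
    counts = [0] * num_authors
    for d, authors in enumerate(doc_authors):
        for a in authors:
            counts[a] += 1
            for t in range(K):
                sums[a][t] += theta[d][t]
    return [
        [s / counts[a] for s in sums[a]] if counts[a] else [1.0 / K] * K
        for a in range(num_authors)
    ]


if __name__ == "__main__":
    # Toy corpus: authors 0 and 1 write with disjoint vocabularies.
    docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
    doc_authors = [[0], [0], [1], [1]]
    theta = lda_gibbs(docs, num_topics=2, vocab_size=6, iters=100, seed=1)
    print(author_topics(theta, doc_authors, num_authors=2))
```

Averaging is only one possible aggregation rule; weighting each email by its length, or splitting a multi-author email's mixture among its authors, would be equally plausible readings of "author-document relations".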