We propose an unsupervised learning technique for extracting information about authors and topics from large text collections. We model documents as if they were generated by a two-stage stochastic process. An author is represented by a probability distribution over topics, and each topic is represented as a probability distribution over words. The probability distribution over topics in a multi-author paper is a mixture of the distributions associated with the authors. The topic-word and author-topic distributions are learned from data in an unsupervised manner using a Markov chain Monte Carlo algorithm. We apply the methodology to three large text corpora: 150,000 abstracts from the CiteSeer digital library, 1,740 papers from the Neural Information Processing Systems (NIPS) Conferences, and 121,000 emails from the Enron Corporation. We discuss in detail the interpretation of the results discovered by the system, including specific topic and author models, ranking of authors by topic and topics by author, parsing of abstracts by topics and authors, and detection of unusual papers by specific authors. Experiments based on perplexity scores for test documents and precision-recall for document retrieval are used to illustrate systematic differences between the proposed author-topic model and a number of alternatives. Extensions to the model that allow, for example, generalizations of the notion of an author are also briefly discussed.
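The two-stage generative process described in the abstract can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the paper's implementation: the function names (`generate_document`, `document_topic_mixture`), the toy authors, topics, and probabilities are all hypothetical, and the sketch assumes that each word's author is drawn uniformly from the document's author list, which the abstract does not state. It covers only the forward sampling process, not the Markov chain Monte Carlo learning step.

```python
import random


def generate_document(authors, author_topic, topic_word, n_words, rng):
    """Sample n_words tokens for a document written by `authors`.

    Each token is drawn by picking an author, then a topic from that
    author's topic distribution, then a word from that topic's word
    distribution.
    """
    tokens = []
    for _ in range(n_words):
        # Assumption: authors of a document are chosen uniformly per word.
        author = rng.choice(authors)
        # Stage 1: topic from the chosen author's distribution over topics.
        topics, topic_probs = zip(*author_topic[author].items())
        topic = rng.choices(topics, weights=topic_probs, k=1)[0]
        # Stage 2: word from the chosen topic's distribution over words.
        vocab, word_probs = zip(*topic_word[topic].items())
        word = rng.choices(vocab, weights=word_probs, k=1)[0]
        tokens.append((author, topic, word))
    return tokens


def document_topic_mixture(authors, author_topic):
    """Topic distribution of a multi-author document as an equal-weight
    mixture of its authors' topic distributions (uniform weights assumed)."""
    mixture = {}
    for author in authors:
        for topic, prob in author_topic[author].items():
            mixture[topic] = mixture.get(topic, 0.0) + prob / len(authors)
    return mixture


# Hypothetical toy parameters, for illustration only.
author_topic = {
    "alice": {"learning": 0.9, "retrieval": 0.1},
    "bob": {"learning": 0.2, "retrieval": 0.8},
}
topic_word = {
    "learning": {"network": 0.5, "gradient": 0.3, "query": 0.2},
    "retrieval": {"query": 0.6, "index": 0.3, "network": 0.1},
}

rng = random.Random(0)
doc = generate_document(["alice", "bob"], author_topic, topic_word, 10, rng)
mix = document_topic_mixture(["alice", "bob"], author_topic)
```

With these toy numbers, the mixture for a paper co-authored by both hypothetical authors places roughly 0.55 of its mass on the "learning" topic and 0.45 on "retrieval", which is the sense in which a multi-author paper's topic distribution is a mixture of its authors' distributions.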