Topical N-Grams: Phrase and Topic Discovery, with an Application to Information Retrieval

Authors:
Xuerui Wang; Andrew McCallum; Xing Wei
Venue:
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Year:
2007

Cited by (47):

Computable social patterns from sparse sensor data
Proceedings of the first international workshop on Location and the web

Context-Based Term Frequency Assessment for Text Classification
PRICAI '08 Proceedings of the 10th Pacific Rim International Conference on Artificial Intelligence: Trends in Artificial Intelligence

Towards design principles for effective context- and perspective-based web mining
Proceedings of the 4th International Conference on Design Science Research in Information Systems and Technology

Web Search Clustering and Labeling with Hidden Topics
ACM Transactions on Asian Language Information Processing (TALIP)

Modeling Chinese documents with topical word-character models
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1

Exploring content models for multi-document summarization
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Topic and role discovery in social networks with experiments on Enron and academic email
Journal of Artificial Intelligence Research

Unsupervised context detection using wireless signals
Pervasive and Mobile Computing

Improving Text Rankers by Term Locality Contexts
AIRS '09 Proceedings of the 5th Asia Information Retrieval Symposium on Information Retrieval Technology

Cross-cultural analysis of blogs and forums with mixed-collection topic models
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3

Discovery of latent subcommunities in a blog's readership
ACM Transactions on the Web (TWEB)

PCFGs, topic models, adaptor grammars and learning topical collocations and the structure of proper names
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

A ConceptLink graph for text structure mining
ACSC '09 Proceedings of the Thirty-Second Australasian Conference on Computer Science - Volume 91

Experts' retrieval with multiword-enhanced author topic model
SS '10 Proceedings of the NAACL HLT 2010 Workshop on Semantic Search

Finding the storyteller: automatic spoiler tagging using linguistic cues
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics

Citation author topic model in expert search
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters

A supervised topic transition model for detecting malicious system call sequences
Proceedings of the 2011 workshop on Knowledge discovery, modeling and simulation

Social analytics for personalization in work environments
WAIM'11 Proceedings of the 12th international conference on Web-age information management

Using query log and social tagging to refine queries based on latent topics
Proceedings of the 20th ACM international conference on Information and knowledge management

Dynamically generating context-relevant sub-webs
DESRIST'10 Proceedings of the 5th international conference on Global Perspectives on Design Science Research

News thread extraction based on topical n-gram model with a background distribution
ICONIP'11 Proceedings of the 18th international conference on Neural Information Processing - Volume Part II

Collective context-aware topic models for entity disambiguation
Proceedings of the 21st international conference on World Wide Web

Identifying sentiments over N-gram
Proceedings of the 21st international conference companion on World Wide Web

An empirical study of SLDA for information retrieval
AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology

Improving topic evaluation using conceptual knowledge
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Three

Mining contentions from discussions and debates
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

Real-time helpfulness prediction based on voter opinions
Concurrency and Computation: Practice & Experience

A phrase-discovering topic model using hierarchical Pitman-Yor processes
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Exploring adaptor grammars for native language identification
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Intuitive Topic Discovery by Incorporating Word-Pair's Connection Into LDA
WI-IAT '12 Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01

An n-gram topic model for time-stamped documents
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval

On collocations and topic models
ACM Transactions on Speech and Language Processing (TSLP) - Special issue on multiword expressions: From theory to practice and use, part 2

An unsupervised topic segmentation model incorporating word order
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

The bag-of-repeats representation of documents
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

A phrase mining framework for recursive construction of a topical hierarchy
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Hierarchical geographical modeling of user locations from social media posts
Proceedings of the 22nd international conference on World Wide Web

NIFTY: a system for large scale information flow tracking and clustering
Proceedings of the 22nd international conference on World Wide Web

Topic segmentation model based on ATNLDA and co-occurrence theory and its application in stem cell field
Journal of Information Science

On handling textual errors in latent document modeling
Proceedings of the 22nd ACM international conference on information & knowledge management

Concept-based analysis of scientific literature
Proceedings of the 22nd ACM international conference on information & knowledge management

Activity clustering for anomaly detection
International Journal of Intelligent Information and Database Systems

Enhancement of passage scorers by proximity-based term occurrence weighting
International Journal of Intelligent Information and Database Systems

Probabilistic topic models for sequence data
Machine Learning

Supervised N-gram topic model
Proceedings of the 7th ACM international conference on Web search and data mining

A probabilistic approach to mining mobile phone data sequences
Personal and Ubiquitous Computing

Activity-based topic discovery
Web Intelligence and Agent Systems

Discovery of clinical pathway patterns from event logs using probabilistic topic models
Journal of Biomedical Informatics

Abstract

Most topic models, such as latent Dirichlet allocation, rely on the bag-of-words assumption. However, word order and phrases are often critical to capturing the meaning of text in many text mining tasks. This paper presents topical n-grams, a topic model that discovers topics as well as topical phrases. The probabilistic model generates words in their textual order by, for each word, first sampling a topic, then sampling its status as a unigram or bigram, and then sampling the word from a topic-specific unigram or bigram distribution. Thus our model can treat "white house" as a phrase with a special meaning in the 'politics' topic, but not in the 'real estate' topic. Successive bigrams form longer phrases. We present experiments showing meaningful phrases and more interpretable topics from the NIPS data and improved information retrieval performance on a TREC collection.
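
The three sampling steps described in the abstract (topic, then unigram/bigram status, then word) can be illustrated with a small Python sketch. This is not the authors' implementation: the function name, the toy probability tables, the choice to let the bigram status depend only on the previous word, and the fallback to the unigram distribution when no bigram table exists for a (topic, previous word) pair are all assumptions made for illustration. The abstract does not specify these details, and no inference is performed here.

```python
import random

def generate_document(n_words, doc_topic_probs, unigram, bigram, bigram_prob, rng):
    """Generate words in textual order via the three steps named in the abstract.

    All tables are hypothetical toy inputs:
      doc_topic_probs: topic -> probability
      unigram:         topic -> {word: probability}
      bigram:          (topic, previous word) -> {word: probability}
      bigram_prob:     previous word -> probability that the next word continues a bigram
    """
    topics = list(doc_topic_probs)
    tokens, prev = [], None
    for _ in range(n_words):
        # Step 1: sample a topic for this word position.
        z = rng.choices(topics, weights=[doc_topic_probs[t] for t in topics])[0]
        # Step 2: sample unigram/bigram status (assumed to depend on the previous word).
        wants_bigram = prev is not None and rng.random() < bigram_prob.get(prev, 0.0)
        # Step 3: sample the word from a topic-specific bigram or unigram distribution.
        # Assumed fallback: use the unigram table if this topic has no bigram entry.
        is_bigram = wants_bigram and (z, prev) in bigram
        dist = bigram[(z, prev)] if is_bigram else unigram[z]
        vocab = list(dist)
        w = rng.choices(vocab, weights=[dist[v] for v in vocab])[0]
        tokens.append((w, z, is_bigram))
        prev = w
    return tokens

# Toy tables: "white house" is a phrase under 'politics' but not under 'real_estate'.
unigram = {
    "politics": {"white": 0.3, "house": 0.3, "senate": 0.4},
    "real_estate": {"white": 0.2, "house": 0.5, "mortgage": 0.3},
}
bigram = {("politics", "white"): {"house": 1.0}}
bigram_prob = {"white": 0.9}
doc_topic_probs = {"politics": 0.5, "real_estate": 0.5}

doc = generate_document(20, doc_topic_probs, unigram, bigram, bigram_prob, random.Random(0))
for word, topic, is_bigram in doc:
    print(word, topic, "bigram" if is_bigram else "unigram")
```

In this toy setup, a token is flagged as a bigram continuation only when its topic is 'politics' and the preceding word is "white", which mirrors the "white house" example in the abstract. Runs of consecutive bigram-flagged tokens would correspond to longer phrases.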