We describe efficient techniques for constructing large term co-occurrence graphs and investigate their application to discovering numerous fine-grained (specific) topics. A topic is a small, dense subgraph discovered by a random walk initiated at a term (node) in the graph. We observe that the discovered topics are highly interpretable and reveal the different meanings of terms in the corpus. We demonstrate the information-theoretic utility of the topics when they are used as features in supervised learning: such features yield consistent improvements in classification accuracy over the standard bag-of-words representation, even at high training proportions. Finally, we explain how a layered, pyramidal view of the term distribution helps in understanding the algorithms and in visualizing and interpreting the topics.
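The abstract does not specify how the graph is built or which random walk is used, so the following is only a minimal sketch under stated assumptions. It assumes that two terms co-occur when they appear in the same document, and it substitutes a random walk with restart (personalized PageRank) from a seed term, taking the k most probable terms as that term's topic. The function names `build_cooccurrence_graph`, `random_walk_topic` and `topic_features` and the parameters `restart`, `iters` and `k` are hypothetical. They illustrate the general idea and are not the authors' method.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_graph(docs):
    """Weighted, undirected term graph: edge weight = number of documents
    in which both terms appear (an assumed definition of co-occurrence)."""
    graph = defaultdict(Counter)
    for doc in docs:
        terms = sorted(set(doc.lower().split()))  # naive whitespace tokenizer
        for a, b in combinations(terms, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def random_walk_topic(graph, seed, restart=0.15, iters=50, k=5):
    """Random walk with restart (personalized PageRank) from `seed`.
    The k highest-probability terms stand in for the seed's topic
    (a hypothetical proxy for the paper's dense-subgraph discovery)."""
    if seed not in graph:
        raise KeyError(f"seed term {seed!r} has no co-occurrences")
    nodes = list(graph)
    score = {n: 0.0 for n in nodes}
    score[seed] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        nxt[seed] = restart  # restart mass always returns to the seed term
        for u, mass in score.items():
            if mass == 0.0:
                continue
            total = sum(graph[u].values())  # > 0: every node has an edge
            for v, w in graph[u].items():
                # follow an edge with probability proportional to its weight
                nxt[v] += (1 - restart) * mass * w / total
        score = nxt
    ranked = sorted(nodes, key=lambda n: (-score[n], n))
    return ranked[:k]


def topic_features(doc, topics):
    """One feature per topic: the fraction of the topic's terms in `doc`
    (a hypothetical way to use topics alongside bag-of-words features)."""
    terms = set(doc.lower().split())
    return [len(terms & set(t)) / len(t) for t in topics]


if __name__ == "__main__":
    docs = [
        "bank river water fish",
        "river water boat fish",
        "bank loan money interest",
        "money loan interest rate",
        "bank money deposit",
    ]
    graph = build_cooccurrence_graph(docs)
    topic = random_walk_topic(graph, "loan", k=3)
    print(topic)
    print(topic_features("money loan river", [topic]))
```

Ranking terms by visit probability only approximates "a small dense subgraph". A fuller version might, for example, keep the top-k terms only when their induced subgraph is sufficiently dense. Seeding the walk at an ambiguous term such as "bank" is one way to probe the different senses the abstract mentions.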