Discovery of numerous specific topics via term co-occurrence analysis

  • Authors:
  • Omid Madani;Jiye Yu

  • Affiliations:
  • SRI International, Menlo Park, CA, USA;Stanford University, Stanford, USA

  • Venue:
  • CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

We describe efficient techniques for construction of large term co-occurrence graphs, and investigate an application to the discovery of numerous fine-grained (specific) topics. A topic is a small dense subgraph discovered by a random walk initiated at a term (node) in the graph. We observe that the discovered topics are highly interpretable, and reveal the different meanings of terms in the corpus. We show the information-theoretic utility of the topics when they are used as features in supervised learning. Such features lead to consistent improvements in classification accuracy over the standard bag-of-words representation, even at high training proportions. We explain how a layered pyramidal view of the term distribution helps in understanding the algorithms and in visualizing and interpreting the topics.