Hybrid document indexing with spectral embedding

  • Authors:
  • Irina Matveeva;Gina-Anne Levow

  • Affiliations:
  • University of Chicago, Chicago, IL;University of Chicago, Chicago, IL

  • Venue:
  • NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Document representation has a large impact on the performance of document retrieval and clustering algorithms. We propose a hybrid document indexing scheme that combines the traditional bag-of-words representation with spectral embedding. This method accounts for the specifics of the document collection and also uses semantic similarity information based on a large scale statistical analysis. Clustering experiments showed improvements over the traditional tf-idf representation and over the spectral methods based solely on the document collection.