Using syntactic analysis to increase efficiency in visualizing text collections

  • Authors:
  • James Henderson;Paola Merlo;Ivan Petroff;Gerold Schneider

  • Affiliations:
  • University of Geneva, Geneva, Switzerland;University of Geneva, Geneva, Switzerland;University of Geneva, Geneva, Switzerland;University of Geneva, Geneva, Switzerland

  • Venue:
  • COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
  • Year:
  • 2002

Quantified Score

Hi-index 0.01

Visualization

Abstract

Self-Organizing Maps (SOMs) are a good method to cluster and visualize large collections of documents, but they are computationally expensive. In this paper, we investigate linguistically motivated reductions on the usual bag-of-words representation, to improve efficiency. We find that reducing the document representation to heads of verb and noun phrases reduces the heavy computational cost without degrading the quality of the map, especially in combination with term reduction techniques. More severe reductions which focus on subject and object nominal phrases are not advantageous.