Self-Organizing Maps (SOMs) are well suited to clustering and visualizing large document collections, but they are computationally expensive. In this paper, we investigate linguistically motivated reductions of the usual bag-of-words representation to improve efficiency. We find that reducing the document representation to the heads of verb and noun phrases lowers the computational cost without degrading the quality of the map, especially in combination with term reduction techniques. More severe reductions that focus on subject and object nominal phrases are not advantageous.
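To make the reduction concrete, the following minimal sketch compares a full bag-of-words vocabulary with one restricted to phrase heads. It is an illustration only, not the paper's implementation: the pre-chunked input format, the toy documents, and the helper names `bag_of_words`, `bag_of_heads`, and `vocabulary` are all assumptions, and a real pipeline would obtain phrases and heads from a parser or chunker. Since each SOM training step compares a document vector against every map unit, a smaller vocabulary means shorter vectors and less work per step.

```python
from collections import Counter

# Hypothetical pre-chunked input (an assumption for illustration):
# each document is a list of (phrase_type, tokens, head_index) tuples.
docs = [
    [("NP", ["the", "large", "document", "collection"], 3),
     ("VP", ["is", "clustered"], 1),
     ("NP", ["a", "self-organizing", "map"], 2)],
    [("NP", ["the", "map"], 1),
     ("VP", ["has", "been", "trained"], 2),
     ("NP", ["reduced", "term", "vectors"], 2)],
]

def bag_of_words(doc):
    """Count every token in every phrase (the usual representation)."""
    return Counter(tok for _, tokens, _ in doc for tok in tokens)

def bag_of_heads(doc, phrase_types=("NP", "VP")):
    """Count only the head token of each noun or verb phrase."""
    return Counter(
        tokens[head]
        for ptype, tokens, head in doc
        if ptype in phrase_types
    )

def vocabulary(bags):
    """Sorted list of all distinct terms; its length is the vector dimension."""
    return sorted(set().union(*bags))

full_vocab = vocabulary([bag_of_words(d) for d in docs])
head_vocab = vocabulary([bag_of_heads(d) for d in docs])

print(len(full_vocab), "terms with the full bag-of-words")
print(len(head_vocab), "terms with phrase heads only")
```

On this toy input the head-only vocabulary is a third of the full one; how large the saving is on a real collection, and whether map quality is preserved, is what the paper evaluates.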