Searching for topics in a large collection of texts

Authors:
Martin Holub; Jiří Semecký; Jiří Diviš
Affiliations:
Charles University, Prague; Charles University, Prague; Charles University, Prague
Venue:
ACLstudent '04 Proceedings of the ACL 2004 workshop on Student research
Year:
2004

Abstract

We describe an original method that automatically finds specific topics in a large collection of texts. Each topic is first identified as a specific cluster of texts and then represented as a virtual concept, which is a weighted mixture of words. Our intention is to employ these virtual concepts in document indexing. In this paper we show some preliminary experimental results and discuss directions of future work.
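
To make the notion of a "weighted mixture of words" concrete, the following minimal sketch shows one possible way to derive such a mixture from a cluster of texts. It is an illustration only and not the method of the paper: the function name `virtual_concept`, the whitespace tokenization, and the weighting scheme (relative term frequency averaged over the documents of the cluster) are all assumptions made here, and the cluster itself is assumed to be given.

```python
from collections import Counter


def virtual_concept(cluster_docs, top_k=5):
    """Return up to top_k (word, weight) pairs describing one cluster of texts.

    Assumption for illustration: the weight of a word is its relative
    frequency within each document, averaged over all documents in the
    cluster. The paper's actual weighting may differ.
    """
    totals = Counter()
    for doc in cluster_docs:
        words = doc.lower().split()  # naive whitespace tokenization
        if not words:
            continue  # an empty document contributes nothing
        for word, count in Counter(words).items():
            totals[word] += count / len(words)

    n_docs = len(cluster_docs)
    weights = {word: total / n_docs for word, total in totals.items()}

    # Sort by descending weight; break ties alphabetically for stable output.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# A toy cluster of two short texts (invented example data).
cluster = ["court ruling appeal", "court judge ruling"]
concept = virtual_concept(cluster)
for word, weight in concept:
    print(f"{word}\t{weight:.3f}")
```

In this sketch the words shared by both texts receive the highest weights, so the resulting mixture emphasizes vocabulary that is typical of the whole cluster rather than of a single document.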