This paper addresses the characterization of a large text collection by introducing a method for retrieving sets of relevant WordNet concepts as descriptors of the collection's contents. The method combines models for identifying interesting word co-occurrences with an extension of a word sense disambiguation algorithm in order to retrieve the concepts that best fit the collection's topics. The retrieved concepts can include multi-word nominal concepts that do not explicitly appear in the texts. We evaluate our proposal using extensions of recall and precision that are also introduced in this paper.
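As an illustration of what identifying interesting word co-occurrences can look like in practice, the following is a minimal sketch and not the method of the paper. It assumes documents are already tokenized into word lists, and it uses support and lift as the interestingness measures; that choice, the function name `interesting_pairs`, and the thresholds `min_support` and `min_lift` are assumptions made for this example only.

```python
from collections import Counter
from itertools import combinations


def interesting_pairs(docs, min_support=0.5, min_lift=1.0):
    """Return word pairs that co-occur often and more than chance predicts.

    docs: list of token lists, one per document (hypothetical input format).
    Support is the fraction of documents containing both words; lift is the
    support divided by the product of the two single-word supports.
    """
    n = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing each word pair
    for doc in docs:
        words = sorted(set(doc))  # count each word once per document
        word_df.update(words)
        pair_df.update(combinations(words, 2))

    results = []
    for (a, b), count in pair_df.items():
        support = count / n
        lift = support / ((word_df[a] / n) * (word_df[b] / n))
        if support >= min_support and lift > min_lift:
            results.append((a, b, support, lift))
    # Highest lift first; ties broken alphabetically for stable output.
    return sorted(results, key=lambda r: (-r[3], r[0], r[1]))


if __name__ == "__main__":
    toy_docs = [
        ["heart", "attack", "patient"],
        ["heart", "attack"],
        ["patient", "care"],
        ["heart", "attack", "care"],
    ]
    for a, b, support, lift in interesting_pairs(toy_docs):
        print(f"{a} + {b}: support={support:.2f}, lift={lift:.2f}")
```

In a pipeline like the one the abstract describes, pairs selected this way would then be passed to a word sense disambiguation step that maps them to WordNet concepts; that step is not shown here.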