Retrieval of Relevant Concepts from a Text Collection

Authors:
Henry Anaya-Sánchez; Rafael Berlanga-Llavori; Aurora Pons-Porrata
Affiliations:
Center of Pattern Recognition and Data Mining, Universidad de Oriente, Santiago de Cuba, Cuba; Departament de Llenguatges i Sistemes Informàtics, Universitat Jaume I, Castelló, Spain; Center of Pattern Recognition and Data Mining, Universidad de Oriente, Santiago de Cuba, Cuba
Venue:
Current Topics in Artificial Intelligence
Year:
2007

Abstract

This paper addresses the characterization of a large text collection by introducing a method for retrieving sets of relevant WordNet concepts as descriptors of the collection's contents. The method combines models for identifying interesting word co-occurrences with an extension of a word sense disambiguation algorithm in order to retrieve the concepts that best fit the collection's topics. The retrieved concepts can include multi-word nominal concepts that do not explicitly appear in the texts. We evaluate our proposal using extensions of recall and precision that are also introduced in this paper.
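
As a rough illustration of the first ingredient, identifying interesting word co-occurrences, the sketch below scores word pairs by document-level lift. This is only one possible reading: the abstract does not say which co-occurrence models the authors use, so the choice of lift, the function name `interesting_cooccurrences`, the thresholds `min_support` and `min_lift`, and the toy documents are all assumptions made for illustration. The word sense disambiguation step that maps co-occurring words to WordNet concepts is not shown.

```python
from collections import Counter
from itertools import combinations


def interesting_cooccurrences(docs, min_support=2, min_lift=1.0):
    """Return word pairs whose document-level lift exceeds min_lift.

    docs: list of token lists. Support is counted per document, so each
    word or pair contributes at most once per document.
    (Hypothetical helper; lift is an assumed interestingness measure.)
    """
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing each word pair
    for tokens in docs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))

    scored = []
    for (a, b), support in pair_df.items():
        if support < min_support:
            continue
        # lift = P(a, b) / (P(a) * P(b)), estimated from document frequencies
        lift = (support * n_docs) / (word_df[a] * word_df[b])
        if lift > min_lift:
            scored.append(((a, b), lift))
    # Highest lift first; ties broken alphabetically for determinism
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy collection, invented for this example
docs = [
    ["heart", "attack", "patient"],
    ["heart", "attack", "risk"],
    ["stock", "market", "risk"],
    ["stock", "market", "patient"],
]
pairs = interesting_cooccurrences(docs)
```

On this toy collection only the pairs ("attack", "heart") and ("market", "stock") pass both thresholds. In a full pipeline, pairs like these would then be handed to a disambiguation step to select the WordNet senses that fit the collection's topics.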