Improving hierarchical document cluster labels through candidate term selection

Authors:
Fabiano Fernandes dos Santos; Veronica Oliveira de Carvalho; Solange Oliveira Rezende
Affiliations:
Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos, SP, Brazil; Instituto de Geociências e Ciências Exatas, Univ Estadual Paulista, Rio Claro, SP, Brazil; Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos, SP, Brazil
Venue:
Intelligent Decision Technologies
Year:
2012

Abstract

One way to organize knowledge and ease its search and retrieval is to build a structural representation divided into hierarchically related topics. Once this structure is built, a label must be found for each of the resulting clusters. In many cases, the labels must be built using all the terms in the documents of the collection. This paper presents the SeCLAR method, which uses association rules to select good candidate labels for hierarchical document clusters. The method selects a subset of terms by exploring the relationships among the terms of each document; these candidates can then be processed by a classical method to generate the labels. An experimental study demonstrates the potential of the proposed approach to improve the precision and recall of the labels obtained by classical methods by considering only the terms that are potentially more discriminative.
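
The general idea of using association rules to narrow down candidate terms can be illustrated with a minimal sketch. This is not the SeCLAR algorithm itself, whose details the abstract does not give. It assumes each document is treated as a transaction (a set of terms), considers only pairwise rules, and keeps the terms that take part in at least one rule meeting a support and a confidence threshold. The function name `candidate_terms` and both threshold parameters are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def candidate_terms(docs, min_support=0.5, min_confidence=0.8):
    """Return terms that appear in at least one pairwise association rule.

    Illustrative sketch only (not the published SeCLAR procedure).
    docs: iterable of term collections, one per document.
    A pair (a, b) qualifies when its support (fraction of documents
    containing both terms) reaches min_support and the confidence of
    a -> b or b -> a reaches min_confidence.
    """
    transactions = [set(d) for d in docs]
    n = len(transactions)
    if n == 0:
        return set()

    term_count = Counter()
    pair_count = Counter()
    for terms in transactions:
        term_count.update(terms)
        # Sort so that each unordered pair is counted under a single key.
        pair_count.update(combinations(sorted(terms), 2))

    selected = set()
    for (a, b), count in pair_count.items():
        if count / n < min_support:
            continue
        conf_a_to_b = count / term_count[a]
        conf_b_to_a = count / term_count[b]
        if max(conf_a_to_b, conf_b_to_a) >= min_confidence:
            selected.update((a, b))
    return selected


if __name__ == "__main__":
    # Toy collection, invented purely for this example.
    docs = [
        {"cluster", "label", "term"},
        {"cluster", "label"},
        {"cluster", "label", "rule"},
        {"rule", "mining"},
    ]
    print(sorted(candidate_terms(docs)))
```

In a pipeline like the one the abstract describes, the returned subset would be handed to a classical labeling method in place of the full vocabulary. A real implementation would likely mine larger itemsets (for example with Apriori) rather than only pairs.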