A quality driven Hierarchical Data Divisive Soft Clustering for information retrieval

Authors:
Gloria Bordogna;Gabriella Pasi
Affiliations:
CNR - National Research Council-IDPA, Dalmine (BG), Italy;Università degli Studi di Milano Bicocca - DISCo, Viale Sarca 336, 20126 Milano, Italy
Venue:
Knowledge-Based Systems
Year:
2012

Abstract

This paper presents an adaptive hierarchical fuzzy clustering algorithm named Hierarchical Data Divisive Soft Clustering (H2D-SC). Its main novelty is that it is quality-driven: it dynamically evaluates a multi-dimensional quality measure of the clusters to drive the generation of the soft hierarchy. Specifically, it generates a hierarchy in which each node is split into a variable number of sub-nodes, determined by a quality assessment of soft clusters that combines several dimensions: each cluster's cohesion, cardinality, mass, and fuzziness, together with the partition's entropy. Clusters at the same hierarchical level share a minimum quality value, and clusters at lower levels of the hierarchy have a higher quality, so that more specific (lower-level) clusters are of higher quality than more general (upper-level) clusters. Further, since the algorithm generates a soft partition, a document can belong to several sub-clusters with distinct membership degrees. The algorithm is divisive and combines a modified bisecting K-Means algorithm with a flat soft clustering algorithm used to partition each node. The paper describes the algorithm and its evaluation on two standard collections.
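
To make the notions of soft membership, cluster mass, cardinality, and partition entropy concrete, the following is a minimal sketch and not the H2D-SC procedure itself. The abstract does not give the formulas, so the sketch assumes the textbook fuzzy c-means membership rule and Bezdek's partition entropy. The function names, the fuzzifier `m`, and the cardinality `threshold` are all hypothetical choices made for illustration.

```python
import math


def fcm_memberships(distances, m=2.0):
    """Soft memberships from document-to-centroid distances.

    ASSUMPTION: standard fuzzy c-means rule, not confirmed as the paper's.
    distances[i][j] is the distance of document i to centroid j.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for row in distances:
        # A document lying exactly on centroids shares full membership among them.
        zeros = [j for j, d in enumerate(row) if d == 0.0]
        if zeros:
            memberships.append(
                [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(row))]
            )
            continue
        inverse = [d ** -exponent for d in row]
        total = sum(inverse)
        memberships.append([value / total for value in inverse])
    return memberships


def cluster_mass(memberships, j):
    """ASSUMPTION: mass taken as the sum of membership degrees in cluster j."""
    return sum(row[j] for row in memberships)


def cluster_cardinality(memberships, j, threshold=0.5):
    """ASSUMPTION: cardinality counts documents whose membership reaches a threshold."""
    return sum(1 for row in memberships if row[j] >= threshold)


def partition_entropy(memberships):
    """Bezdek's partition entropy: 0 for a crisp partition, log(c) when fully fuzzy."""
    n = len(memberships)
    return -sum(u * math.log(u) for row in memberships for u in row if u > 0.0) / n


# Four documents, two centroids; the third document is equidistant from both.
distances = [[0.1, 0.9], [0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
U = fcm_memberships(distances)
for j in range(2):
    print(j, round(cluster_mass(U, j), 3), cluster_cardinality(U, j, threshold=0.6))
print(round(partition_entropy(U), 3))
```

In this toy setting the equidistant document belongs to both clusters with degree 0.5, which illustrates how a soft partition lets a document sit in several sub-clusters. A quality-driven split would compare measures of this kind against a level-specific minimum, but how H2D-SC combines them is described in the paper, not here.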