Hierarchical Clustering Using Non-Greedy Principal Direction Divisive Partitioning

Authors: Martin Nilsson
Affiliations: Los Alamos National Laboratory, Los Alamos, NM 87545, USA. nilsson@lanl.gov
Venue: Information Retrieval
Year: 2002

Abstract

We present a non-greedy version of the recently published Principal Direction Divisive Partitioning (PDDP) algorithm. The PDDP algorithm creates a hierarchical taxonomy of a data set by successively splitting the data into sub-clusters. At each level, the cluster with the largest variance is split by a hyperplane orthogonal to its leading principal component. The PDDP algorithm is known to produce high-quality clusters, especially when applied to high-dimensional data such as document-word feature matrices. It also scales well with both the size and the dimensionality of the data set. However, at each level only the locally optimal choice of splitting is considered, which often leads to a globally non-optimal partitioning of the data at later stages. The non-greedy version of the PDDP algorithm (NGPDDP) presented in this paper addresses this problem: at each level, multiple alternative splitting strategies are considered. Results from applying the algorithm to generated and real data (feature vectors from sets of text documents) are presented. The results show substantial improvements in cluster quality.
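
The greedy PDDP step described in the abstract can be illustrated with a minimal sketch. It covers only the greedy baseline; the abstract does not specify which alternative splitting strategies NGPDDP considers, so none are shown. The function names (`scatter`, `pddp_split`, `pddp`) are hypothetical, and two details are assumptions not stated in the abstract: the splitting hyperplane passes through the cluster centroid, and "variance" is measured as the total squared distance to the centroid.

```python
import numpy as np

def scatter(X):
    """Total squared distance of the rows of X to their centroid
    (assumed here as the 'variance' used to pick the next cluster)."""
    return float(((X - X.mean(axis=0)) ** 2).sum())

def pddp_split(X):
    """Boolean mask splitting the rows of X by a hyperplane orthogonal to
    the leading principal direction (assumed to pass through the centroid)."""
    centered = X - X.mean(axis=0)
    # The first right singular vector of the centered data is the
    # leading principal direction.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0] <= 0

def pddp(X, n_clusters):
    """Greedy divisive clustering: repeatedly split the cluster with the
    largest scatter. Returns a list of index arrays into X."""
    clusters = [np.arange(len(X))]
    while len(clusters) < n_clusters:
        # Only clusters with nonzero scatter can be split into two
        # non-empty parts.
        candidates = [k for k, idx in enumerate(clusters)
                      if len(idx) > 1 and scatter(X[idx]) > 0]
        if not candidates:
            break
        best = max(candidates, key=lambda k: scatter(X[clusters[k]]))
        idx = clusters.pop(best)
        mask = pddp_split(X[idx])
        clusters += [idx[mask], idx[~mask]]
    return clusters
```

Calling `pddp(X, k)` on an array `X` with one feature vector per row returns up to `k` index arrays. Because each split is chosen from local information only, an early poor split is never revisited, which is the limitation the non-greedy variant is meant to address.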