MPM: a hierarchical clustering algorithm using matrix partitioning method for non-numeric data

Authors:
Hewijin Christine Jiau;Yi-Jen Su;Yeou-Min Lin;Shang-Rong Tsai
Affiliations:
Department of Electrical Engineering, National Cheng Kung University, Tainan, People's Republic of China 701;Department of Electrical Engineering, National Cheng Kung University, Tainan, People's Republic of China 701;Department of Electrical Engineering, National Cheng Kung University, Tainan, People's Republic of China 701;Department of Electrical Engineering, National Cheng Kung University, Tainan, People's Republic of China 701
Venue:
Journal of Intelligent Information Systems
Year:
2006

Abstract

Clustering has been widely adopted in numerous applications, including pattern recognition, data analysis, image processing, and market research. In data mining, traditional clustering algorithms that use distance-based measurements to calculate the difference between data objects are unsuitable for non-numeric attributes such as nominal, Boolean, and categorical data. Applying an unsuitable similarity measurement in clustering may cause valuable information embedded in the data attributes to be lost, and hence produce low-quality clusters. This paper proposes a novel hierarchical clustering algorithm, referred to as MPM, for the clustering of non-numeric data. The goals of MPM are to retain the data features of interest while effectively grouping data objects into clusters with high intra-similarity and low inter-similarity. MPM achieves these goals through two principal methods: (1) the adoption of a novel similarity measurement that can capture the "characterized properties" of information, and (2) the application of matrix permutation and matrix partitioning to the results of the similarity measurement (constructed in the form of a similarity matrix) in order to assign data to appropriate clusters. This study also proposes a heuristic-based algorithm, Heuristic_MPM, to reduce the processing time required for matrix permutation and matrix partitioning, which together constitute the bulk of the total MPM execution time.
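The general pipeline the abstract describes (similarity matrix, then permutation, then partitioning) can be illustrated with a minimal sketch. It is not the MPM algorithm itself, because the abstract does not specify the similarity measurement or the permutation and partitioning procedures. The sketch assumes stand-ins for all three: the simple matching coefficient as the similarity measure, a greedy nearest-neighbour ordering as the permutation, and a fixed threshold cut as the partitioning. All function names, the sample records, and the threshold value are hypothetical.

```python
from itertools import combinations


def simple_matching(a, b):
    """Fraction of attribute positions on which two records agree.

    Assumption: a stand-in measure, not the similarity measurement from the paper.
    """
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity_matrix(records):
    """Build a symmetric n x n similarity matrix with 1.0 on the diagonal."""
    n = len(records)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = simple_matching(records[i], records[j])
    return sim


def greedy_order(sim):
    """Permute indices so that similar records end up adjacent.

    Assumption: a greedy chain starting at record 0, each step appending the
    unvisited record most similar to the last one (ties go to the lowest index).
    """
    order = [0]
    remaining = set(range(1, len(sim)))
    while remaining:
        last = order[-1]
        nxt = max(sorted(remaining), key=lambda j: sim[last][j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


def partition(sim, order, threshold):
    """Cut the ordered sequence wherever adjacent similarity falls below threshold."""
    clusters = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if sim[prev][cur] < threshold:
            clusters.append([cur])  # similarity dropped: start a new cluster
        else:
            clusters[-1].append(cur)
    return clusters


# Hypothetical nominal records: (colour, size, flag)
records = [
    ("red", "small", "yes"),
    ("blue", "large", "no"),
    ("red", "small", "no"),
    ("blue", "large", "yes"),
    ("red", "medium", "no"),
]

sim = similarity_matrix(records)
order = greedy_order(sim)
clusters = partition(sim, order, threshold=0.5)
print(order)     # [0, 2, 4, 1, 3]
print(clusters)  # [[0, 2, 4], [1, 3]]
```

Reordering rows and columns by `order` places the high-similarity entries in blocks along the diagonal, which is what makes a simple cut along the sequence meaningful. Both the ordering and the cut are only the simplest choices that show the idea on toy data.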