A statistics-based approach to control the quality of subclusters in incremental gravitational clustering

Authors:
Chien-Yu Chen;Shien-Ching Hwang;Yen-Jen Oyang
Affiliations:
Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Taiwan;Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Taiwan;Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Taiwan and Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei 1 ...
Venue:
Pattern Recognition
Year:
2005

Abstract

As many contemporary databases continue to grow rapidly, incremental clustering has become essential for analyzing them. An incremental clustering algorithm consults an abstraction of the distribution of the data instances, generated by its previous run, and is therefore able to cope with ever-growing databases. The design of such algorithms faces two main challenges. The first is reducing the information loss caused by the data abstraction (or summarization) operations. The second is ensuring that the clustering result is not sensitive to the order of the input data. This paper presents GRIN, an incremental hierarchical clustering algorithm for numerical datasets based on the theory of gravity in physics. GRIN employs a statistical test, aimed at reducing information loss and distortion, to control the formation of subclusters and to monitor the evolution of the dataset. Owing to this statistical-test-based summarization approach, GRIN achieves near-linear scalability and is not sensitive to input ordering.
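
The idea of data abstraction mentioned in the abstract can be illustrated with a minimal sketch. This is not GRIN's summarization scheme or its statistical test, neither of which the abstract details. It is an assumed, generic example in which a subcluster is summarized by its count, centroid, and per-dimension spread, using Welford's online update and the standard pairwise merge formula. The names `SubclusterSummary` and `summarize` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class SubclusterSummary:
    """Hypothetical subcluster summary: count, centroid, per-dimension spread."""
    dim: int
    n: int = 0
    mean: List[float] = field(default_factory=list)
    m2: List[float] = field(default_factory=list)  # sum of squared deviations

    def __post_init__(self) -> None:
        # Start from an empty summary when no statistics are supplied.
        if not self.mean:
            self.mean = [0.0] * self.dim
        if not self.m2:
            self.m2 = [0.0] * self.dim

    def add(self, point: Sequence[float]) -> None:
        """Absorb one data instance (Welford's online update)."""
        self.n += 1
        for i, x in enumerate(point):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def merge(self, other: "SubclusterSummary") -> "SubclusterSummary":
        """Combine two summaries without revisiting the raw points."""
        total = self.n + other.n
        merged = SubclusterSummary(self.dim)
        if total == 0:
            return merged
        merged.n = total
        for i in range(self.dim):
            delta = other.mean[i] - self.mean[i]
            merged.mean[i] = self.mean[i] + delta * other.n / total
            merged.m2[i] = (self.m2[i] + other.m2[i]
                            + delta * delta * self.n * other.n / total)
        return merged

    def variance(self) -> List[float]:
        """Per-dimension sample variance (0.0 with fewer than two points)."""
        if self.n < 2:
            return [0.0] * self.dim
        return [v / (self.n - 1) for v in self.m2]


def summarize(points: Sequence[Sequence[float]], dim: int) -> SubclusterSummary:
    """Build a summary by feeding points one at a time."""
    summary = SubclusterSummary(dim)
    for p in points:
        summary.add(p)
    return summary
```

In this sketch the summary depends only on which points were absorbed, not on their arrival order (up to floating-point rounding), and two summaries can be merged without the raw data. Those are the properties an incremental algorithm needs from its abstraction; deciding when points should share a summary at all is the role the abstract assigns to GRIN's statistical test.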