Reusable components for partitioning clustering algorithms

Authors and affiliations:
Boris Delibašić (Faculty of Organizational Sciences, University of Belgrade, Belgrade, Serbia)
Kathrin Kirchner (Faculty of Economics and Business Administration, Friedrich Schiller University of Jena, Jena, Germany)
Johannes Ruhland (Faculty of Economics and Business Administration, Friedrich Schiller University of Jena, Jena, Germany)
Miloš Jovanović (Faculty of Organizational Sciences, University of Belgrade, Belgrade, Serbia)
Milan Vukićević (Faculty of Organizational Sciences, University of Belgrade, Belgrade, Serbia)
Venue:
Artificial Intelligence Review
Year:
2009

Abstract

Clustering algorithms are well established and widely used for solving data-mining tasks. Every clustering algorithm is composed of several solutions to specific sub-problems in the clustering process. Linked together, these solutions define the process and the structure of the algorithm. Many of them occur in more than one clustering algorithm, and most new clustering algorithms reuse frequently occurring solutions to typical sub-problems, drawn both from clustering and from other machine-learning algorithms. The problem is that these solutions are usually embedded in their host algorithms, which were not designed to make them easy to share outside the original algorithm. We propose a way of designing new clustering algorithms, and of improving existing ones, based on reusable components: well-documented, frequently occurring solutions to specific sub-problems in a specific area. We first identify reusable components as solutions to characteristic sub-problems in partitioning clustering algorithms, and then identify a generic structure for the design of partitioning clustering algorithms. We analyze several partitioning algorithms (K-means, X-means, MPCK-means, and Kohonen SOM), identify the reusable components in them, and give examples of how new clustering algorithms can be designed from these components.
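The idea of building partitioning algorithms from interchangeable components can be illustrated with a small sketch. It is a minimal illustration under our own assumptions, not the architecture or component set defined in the paper: the three component interfaces (initializer, distance, representative update) and the names `partition_cluster`, `kmeans`, and `kmedians_pp` are hypothetical. A generic assign/update loop is fixed, and K-means is obtained as one particular choice of components. Swapping in k-means++ seeding, Manhattan distance, and a coordinate-wise median update (techniques chosen here only as examples) yields a different, K-medians-like algorithm without touching the loop.

```python
import math
import random
import statistics
from functools import partial

# --- Distance components (hypothetical interface: f(a, b) -> float) ---
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

# --- Initialization components: f(data, k, rng) -> list of k representatives ---
def random_init(data, k, rng):
    """Classic K-means seeding: k distinct data points chosen uniformly."""
    return [list(p) for p in rng.sample(list(data), k)]

def kmeanspp_init(data, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability
    proportional to its squared distance to the nearest centroid so far."""
    centroids = [list(rng.choice(data))]
    while len(centroids) < k:
        weights = [min(euclidean(p, c) for c in centroids) ** 2 for p in data]
        if sum(weights) == 0:  # every point already coincides with a centroid
            centroids.append(list(rng.choice(data)))
        else:
            centroids.append(list(rng.choices(data, weights=weights, k=1)[0]))
    return centroids

# --- Update components: f(members) -> new representative ---
def mean_update(members):
    """Coordinate-wise mean, as in K-means."""
    return [sum(coord) / len(members) for coord in zip(*members)]

def median_update(members):
    """Coordinate-wise median, which minimizes the L1 cost per coordinate."""
    return [statistics.median(coord) for coord in zip(*members)]

# --- Generic partitioning structure with pluggable components ---
def partition_cluster(data, k, init, dist, update, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = init(data, k, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment: each point joins its nearest representative under `dist`.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in data]
        # Stopping criterion: assignments did not change since the last pass.
        if new_labels == labels:
            break
        labels = new_labels
        # Update: recompute representatives; an empty cluster keeps its old one.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = update(members)
    return labels, centroids

# K-means as one particular combination of components ...
kmeans = partial(partition_cluster, init=random_init, dist=euclidean, update=mean_update)
# ... and a new algorithm obtained by swapping components:
# k-means++ seeding + Manhattan distance + median update (K-medians-like).
kmedians_pp = partial(partition_cluster, init=kmeanspp_init, dist=manhattan, update=median_update)

if __name__ == "__main__":
    data = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]]
    print(kmeans(data, 2))
    print(kmedians_pp(data, 2))
```

Keeping the loop fixed and passing components as plain functions is only one possible design; the algorithms analyzed in the paper (for example X-means or MPCK-means) involve further sub-problems, such as choosing the number of clusters or learning a metric, that this sketch does not model.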