Learning structure and concepts in data through data clustering

Authors:
Gregory James Hamerly; Charles P. Elkan
Affiliations:
-;-
Venue:
Learning structure and concepts in data through data clustering
Year:
2003

Citing: 0
Cited by: 6

Improving the accuracy of subcategorizations acquired from corpora. ACLstudent '04: Proceedings of the ACL 2004 Workshop on Student Research

An overview of clustering methods. Intelligent Data Analysis

Unsupervised classification of polarimetric SAR image with dynamic clustering: An image processing approach. Advances in Engineering Software

Fractional particle swarm optimization in multidimensional search space. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics

Personalized long-term ECG classification: A systematic approach. Expert Systems with Applications: An International Journal

KHM clustering technique as a segmentation method for endoscopic colour images. International Journal of Applied Mathematics and Computer Science - Semantic Knowledge Engineering

Abstract

Data clustering is an important, application-oriented branch of machine learning. Its goal is to estimate the structure or density of a set of data without a training signal. Because clustering algorithms serve a wide range of applications, there are many approaches to clustering, and they vary in complexity and effectiveness. The explosive growth in the amount of data that people want to analyze makes fast (e.g., linear-time) algorithms necessary, but such algorithms often give poor-quality results. We present modifications that improve clustering algorithms in two ways while maintaining the runtime characteristics of the fast algorithms. The first focus is on finding better solutions for a fixed number of clusters: we decompose the algorithms into fundamental parts and analyze how each part affects the quality of clustering solutions. The second focus is on estimating the number of clusters efficiently using statistical hypothesis tests, and on how such tests may be applied in novel ways. We also discuss the application of data clustering to the task of learning the structure of computer programs, and show how clustering can be used to improve the accuracy of computer processor simulations while simultaneously improving their efficiency.
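
For readers unfamiliar with the kind of fast, fixed-k algorithm the abstract refers to, the following is a minimal sketch of standard Lloyd-style k-means, whose cost per iteration is linear in the number of points. It is not taken from the thesis and does not reproduce the authors' modifications. The function name `kmeans`, its parameters, and the random-point initialization are assumptions chosen for illustration.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Lloyd-style k-means on a list of equal-length numeric tuples.

    Illustrative sketch only (hypothetical helper, not the thesis's method).
    Each iteration costs O(n * k * d) for n points of dimension d.
    """
    rng = random.Random(seed)
    # Assumed initialization: pick k of the input points at random.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach every point to its nearest center
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Usage: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, labels = kmeans(points, k=2)
```

The sketch separates the assignment step from the center-update step, which is one natural way to view such an algorithm as a composition of parts. The result depends on the initialization, which is one reason fast algorithms of this kind can return poor-quality solutions.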