A customizable hybrid approach to data clustering

Authors:
Yu Qian;Kang Zhang
Affiliations:
The University of Texas at Dallas, Richardson, TX;The University of Texas at Dallas, Richardson, TX
Venue:
Proceedings of the 2003 ACM symposium on Applied computing
Year:
2003

Abstract

Most current data clustering algorithms in data mining are based on a distance calculation in a metric space. For Spatial Database Systems (SDBS), the Euclidean distance between two data points is often used to represent the relationship between them. However, in some spatial settings and many other applications, distance alone is not enough to represent all the attributes of the relation between data points, and a more powerful model is needed to record more relational information between data objects. This paper adopts a graph model in which a database is regarded as a graph: each vertex represents a data point, and each edge, weighted or unweighted, records the relation between the two data points it connects. Based on the graph model, this paper presents a set of cluster analysis criteria to guide data clustering. The criteria can be used to measure clustering results and help improve the quality of clustering. Further, a customizable algorithm using the criteria is proposed and implemented. This algorithm can produce clusters according to users' specifications. Preliminary experiments show encouraging results.
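The graph model described in the abstract can be illustrated with a minimal sketch. The abstract does not state the paper's criteria or its algorithm, so everything below is an assumption made for illustration: the function names (`build_graph`, `connected_components`, `edge_density`, `near`), the distance-threshold relation, the use of connected components as clusters, and the edge-density measure are all hypothetical stand-ins, not the authors' method.

```python
from itertools import combinations
from math import dist


def build_graph(points, relate):
    """Build an undirected weighted graph as {vertex: {neighbour: weight}}.

    Each vertex is the index of a data point. `relate(p, q)` is a
    user-supplied (hypothetical) function that returns an edge weight,
    or None when the two points are unrelated.
    """
    graph = {i: {} for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        weight = relate(points[i], points[j])
        if weight is not None:
            graph[i][j] = weight
            graph[j][i] = weight
    return graph


def connected_components(graph):
    """Group vertices into clusters by reachability (an assumed stand-in
    for a real clustering step)."""
    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            vertex = stack.pop()
            component.append(vertex)
            for neighbour in graph[vertex]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


def edge_density(graph, cluster):
    """Illustrative quality measure: the fraction of possible
    intra-cluster edges that are actually present."""
    n = len(cluster)
    if n < 2:
        return 1.0
    members = set(cluster)
    # Each internal edge is seen from both endpoints, hence the halving.
    internal = sum(1 for v in cluster for u in graph[v] if u in members) // 2
    return internal / (n * (n - 1) / 2)


def near(p, q, radius=1.5):
    """Assumed relation: connect points within `radius`, weighted by distance."""
    d = dist(p, q)
    return d if d <= radius else None


points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
graph = build_graph(points, near)
clusters = connected_components(graph)
for cluster in clusters:
    print(cluster, edge_density(graph, cluster))
```

Passing the relation in as a function is what makes the sketch customizable in spirit: replacing `near` with a relation that is not a distance (for example, a shared attribute) changes the graph without touching the clustering or measurement code.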