Clustering and labeling of multi-dimensional mixed structured data

Authors:
Marco Brambilla; Massimiliano Zanoni
Affiliations:
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy (both authors)
Venue:
Search Computing
Year:
2012

Abstract

Cluster analysis is the aggregation of the data items of a given set into subsets according to some similarity criterion. Clustering techniques have been applied in many fields, typically those involving large amounts of complex data. This study focuses on what we call multi-domain clustering and labeling, i.e., a set of techniques for clustering multi-dimensional, structured, mixed data. The work studies which combination of clustering techniques best addresses the problem in the multi-domain setting. The data types considered are numerical, categorical, and textual, and all of them can appear together within the same clustering scenario. We focus on k-means and agglomerative hierarchical clustering, both based on a new distance function that we define for this specific setting. The proposed approach has been validated on real and realistic datasets from the college, automobile, and leisure domains. The experimental results allowed us to evaluate the effectiveness of the different solutions, for both clustering and labeling.
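The abstract does not spell out the distance function the authors define, so what follows is only a minimal sketch of the general idea, under assumptions that are not taken from the paper. It combines a Gower-style range-normalized numeric difference, a categorical mismatch rate, and a cosine distance on term-frequency vectors for text, using hypothetical equal weights. It feeds that distance into a naive average-linkage agglomerative clustering and attaches a simple label to each cluster: the most common value of each categorical attribute plus the terms most over-represented in the cluster. All function names, the record layout, the weights, and the toy automobile records are hypothetical illustrations, not the paper's method or data.

```python
import math
from collections import Counter

# Hypothetical record layout (an assumption, not the paper's schema):
#   {"num": [numeric values], "cat": [categorical values], "text": "free text"}

def numeric_dist(a, b, ranges):
    # Gower-style: absolute difference divided by the attribute's range, averaged.
    if not a:
        return 0.0
    return sum(abs(x - y) / r if r > 0 else 0.0 for x, y, r in zip(a, b, ranges)) / len(a)

def categorical_dist(a, b):
    # Simple matching: fraction of categorical attributes whose values differ.
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)

def text_dist(a, b):
    # 1 - cosine similarity of raw term-frequency vectors (whitespace tokens).
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ta[t] * tb[t] for t in ta)
    na = math.sqrt(sum(v * v for v in ta.values()))
    nb = math.sqrt(sum(v * v for v in tb.values()))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

def mixed_dist(r1, r2, ranges, w=(1 / 3, 1 / 3, 1 / 3)):
    # Hypothetical equal-weight combination of the three per-type distances.
    return (w[0] * numeric_dist(r1["num"], r2["num"], ranges)
            + w[1] * categorical_dist(r1["cat"], r2["cat"])
            + w[2] * text_dist(r1["text"], r2["text"]))

def agglomerative(records, k, ranges):
    # Naive average-linkage agglomerative clustering, stopped at k clusters.
    n = len(records)
    d = [[mixed_dist(records[i], records[j], ranges) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [d[i][j] for i in clusters[a] for j in clusters[b]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair (a < b)
        del clusters[b]
    return clusters

def label(records, cluster, top=2):
    # Label = mode of each categorical attribute + terms over-represented in the cluster.
    n_cat = len(records[cluster[0]]["cat"])
    modes = [Counter(records[i]["cat"][j] for i in cluster).most_common(1)[0][0]
             for j in range(n_cat)]
    inside = Counter(t for i in cluster for t in records[i]["text"].lower().split())
    outside = Counter(t for i in range(len(records)) if i not in cluster
                      for t in records[i]["text"].lower().split())
    # Rank by (in-cluster count - out-of-cluster count), ties broken alphabetically.
    ranked = sorted(inside, key=lambda t: (outside[t] - inside[t], t))
    return modes, ranked[:top]

# Toy automobile records (hypothetical): [engine cc, horsepower], [body, fuel], description.
cars = [
    {"num": [1200, 90], "cat": ["hatchback", "petrol"], "text": "small city car cheap parking"},
    {"num": [1300, 95], "cat": ["hatchback", "petrol"], "text": "compact city car easy parking"},
    {"num": [3000, 250], "cat": ["coupe", "petrol"], "text": "fast sports car powerful engine"},
    {"num": [3200, 280], "cat": ["coupe", "petrol"], "text": "sports car with powerful engine"},
]
ranges = [max(r["num"][i] for r in cars) - min(r["num"][i] for r in cars) for i in range(2)]

for c in agglomerative(cars, 2, ranges):
    print(c, label(cars, c))
# [0, 1] (['hatchback', 'petrol'], ['city', 'parking'])
# [2, 3] (['coupe', 'petrol'], ['engine', 'powerful'])
```

Agglomerative clustering is used in this sketch because it needs only pairwise distances. Running k-means on the same kind of records would additionally require a notion of cluster center for mixed attributes (for example, means for numeric values and modes for categorical ones, as in k-prototypes), which the sketch does not attempt. The naive merge loop is cubic in the number of records and is meant only for small examples.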