On clustering tree structured data with categorical nature

Authors:
B. Boutsinas;T. Papastergiou
Affiliations:
Department of Business Administration, University of Patras, GR-26500 Rio, Greece and University of Patras Artificial Intelligence Research Center, GR-26500 Rio, Greece;University of Patras Artificial Intelligence Research Center, GR-26500 Rio, Greece
Venue:
Pattern Recognition
Year:
2008


Abstract

Clustering consists of partitioning a set of objects into disjoint, homogeneous clusters. Clustering methods have long been applied across a wide variety of disciplines and scientific areas. Traditionally, they deal with numerical data, i.e. objects represented by a conjunction of numerical attribute values. However, commercial and scientific databases now commonly contain categorical data, i.e. objects represented by categorical attributes. In this paper we present a dissimilarity measure capable of dealing with tree-structured categorical data. It can therefore be used to extend the various versions of the popular k-means clustering algorithm to such data, and we discuss how such an extension can be achieved. Moreover, we show empirically that the proposed dissimilarity measure is accurate compared to other well-known (dis)similarity measures for categorical data.
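
To make the idea of a dissimilarity over tree-structured categorical values concrete, the following is a minimal illustrative sketch. It is not the measure proposed in the paper, which the abstract does not specify. It assumes each attribute's values are organised in a taxonomy stored as a child-to-parent map, and it swaps in a simple edge-count (path length) distance between two values. The taxonomy `PARENT` and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "vehicle": None,
    "car": "vehicle",
    "bike": "vehicle",
    "sedan": "car",
    "suv": "car",
}


def path_to_root(value: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the nodes from `value` up to the root, inclusive."""
    path = [value]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def tree_distance(a: str, b: str, parent: Dict[str, Optional[str]]) -> int:
    """Edge count between a and b via their lowest common ancestor.

    Assumption: plain path length stands in for the paper's measure.
    """
    steps_from_a = {node: i for i, node in enumerate(path_to_root(a, parent))}
    for j, node in enumerate(path_to_root(b, parent)):
        if node in steps_from_a:
            return steps_from_a[node] + j
    raise ValueError("values do not belong to the same tree")


def object_dissimilarity(
    x: Sequence[str],
    y: Sequence[str],
    parents: Sequence[Dict[str, Optional[str]]],
) -> int:
    """Sum the per-attribute tree distances of two objects."""
    return sum(tree_distance(a, b, p) for a, b, p in zip(x, y, parents))
```

In a k-means-style extension, a function such as `object_dissimilarity` would take the place of the Euclidean distance when assigning each object to its nearest cluster representative. How representatives are updated for such data is a separate design question that this sketch does not address.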