Unsupervised Learning with Mixed Numeric and Nominal Data

Authors:
Cen Li; Gautam Biswas
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2002

Abstract

This paper presents a Similarity-Based Agglomerative Clustering (SBAC) algorithm that works well for data with mixed numeric and nominal features. The similarity between pairs of objects is defined using a measure proposed by Goodall for biological taxonomy, which gives greater weight to matches on uncommon feature values and makes no assumptions about the underlying distributions of the feature values. An agglomerative algorithm constructs a dendrogram, and a simple distinctness heuristic extracts a partition of the data from it. The performance of SBAC has been studied on real and artificially generated data sets. The results demonstrate the algorithm's effectiveness in unsupervised discovery tasks, and comparisons with other clustering schemes illustrate its superior performance.
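
The two ideas named in the abstract, a similarity that rewards matches on uncommon values and an agglomerative merge loop, can be illustrated with a minimal sketch. This is not the SBAC algorithm itself. It assumes nominal features only, scores a match as 1 - p^2 (where p is the relative frequency of the matched value) as a simplified stand-in for Goodall's measure, uses average linkage as an assumed merge criterion, and does not reproduce the distinctness heuristic for extracting a partition. The names `value_frequencies`, `pair_similarity`, and `agglomerate` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def value_frequencies(rows):
    """Relative frequency of each value, computed per nominal column."""
    n = len(rows)
    return [{v: c / n for v, c in Counter(col).items()} for col in zip(*rows)]


def pair_similarity(a, b, freqs):
    """Mean per-feature score: 1 - p^2 on a match, 0 on a mismatch.

    Assumption: 1 - p^2 is a simplified stand-in for Goodall's measure.
    A match on a rarer value (smaller p) scores higher.
    """
    scores = [
        1.0 - freqs[k][x] ** 2 if x == y else 0.0
        for k, (x, y) in enumerate(zip(a, b))
    ]
    return sum(scores) / len(scores)


def agglomerate(rows):
    """Merge the most similar pair of clusters until one remains.

    Assumption: average linkage. Returns the merge history as
    (cluster_a, cluster_b, similarity) tuples, which encodes a dendrogram.
    """
    freqs = value_frequencies(rows)
    sim = {
        frozenset((i, j)): pair_similarity(rows[i], rows[j], freqs)
        for i, j in combinations(range(len(rows)), 2)
    }

    def linkage(c1, c2):
        total = sum(sim[frozenset((i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [(i,) for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda p: linkage(*p))
        merges.append((c1, c2, linkage(c1, c2)))
        merged = tuple(sorted(c1 + c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
    return merges


if __name__ == "__main__":
    # Toy data (made up for illustration): one nominal feature.
    data = [("common",), ("common",), ("common",), ("rare",), ("rare",)]
    for left, right, s in agglomerate(data):
        print(left, right, round(s, 2))
```

In this toy data set the two objects sharing the less frequent value are merged first, because their match scores higher than a match on the more frequent value. Handling numeric features and cutting the dendrogram into a partition would need the definitions given in the paper itself.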