Clustering categorical data: an approach based on dynamical systems

Authors:
David Gibson; Jon Kleinberg; Prabhakar Raghavan
Affiliations:
Department of Computer Science UC Berkeley, Berkeley, CA 94720 USA/ e-mail: dag@cs.berkeley.edu; Department of Computer Science, Cornell University, Ithaca, NY 14853/ e-mail: kleinber@cs.cornell.edu; Almaden Research Center IBM, San Jose, CA 95120 USA/ e-mail: pragh@almaden.ibm.com
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2000

Abstract

We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data. By “categorical data,” we mean tables with fields that cannot be naturally ordered by a metric – e.g., the names of producers of automobiles, or the names of products offered by a manufacturer. Our approach is based on an iterative method for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure arising from the co-occurrence of values in the dataset. Our techniques can be studied analytically in terms of certain types of non-linear dynamical systems.
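
The iterative weight propagation described above can be illustrated with a minimal sketch. The abstract does not specify how weights are combined or normalized, so the choices here are assumptions made only for illustration: each value's new weight is the sum of the weights of the other values it co-occurs with in a row, and weights are rescaled to unit Euclidean norm within each field after every pass. With this additive combiner the iteration is linear; the non-linear systems mentioned in the abstract would use a different combiner. The function names and the small automobile table are hypothetical.

```python
import math
from collections import defaultdict


def normalize_per_field(weights):
    """Rescale weights so that each field's weights have unit Euclidean norm."""
    squared = defaultdict(float)
    for (field, _), w in weights.items():
        squared[field] += w * w
    return {
        (field, value): (w / math.sqrt(squared[field]) if squared[field] > 0 else 0.0)
        for (field, value), w in weights.items()
    }


def propagate_weights(rows, iterations=20):
    """Propagate weights over the categorical values of a table.

    rows: list of equal-length tuples; position j holds the value of field j.
    Returns a dict mapping (field_index, value) -> weight.
    """
    # One node per distinct (field, value) pair, all starting at the same weight.
    weights = {(j, v): 1.0 for row in rows for j, v in enumerate(row)}
    weights = normalize_per_field(weights)

    for _ in range(iterations):
        updated = defaultdict(float)
        for row in rows:
            nodes = list(enumerate(row))
            total = sum(weights[n] for n in nodes)
            for n in nodes:
                # Assumed combiner: add the weights of the other values in this row.
                updated[n] += total - weights[n]
        weights = normalize_per_field(updated)
    return weights


# Hypothetical table: field 0 is a producer, field 1 is a product name.
rows = [
    ("Honda", "Accord"),
    ("Honda", "Civic"),
    ("Honda", "Accord"),
    ("Toyota", "Camry"),
]
weights = propagate_weights(rows)
for node, w in sorted(weights.items()):
    print(node, round(w, 4))
```

In this sketch, values that co-occur often reinforce one another, so the weights that emerge after several passes reflect co-occurrence structure rather than any ordering of the values themselves.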