Algorithms for clustering high dimensional and distributed data

Authors:
Tao Li; Shenghuo Zhu; Mitsunori Ogihara
Affiliations:
Computer Science Department, University of Rochester, Rochester, NY 14627-0226, USA. E-mail: {taoli,zsh,ogihara}@cs.rochester.edu. Corresponding author: Tel.: +1 585 275 8479; Fax: +1 585 273 4556; E-mail: taoli@cs.rochester.edu
Venue:
Intelligent Data Analysis
Year:
2003

Abstract

Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. The clustering problem has been widely studied in machine learning, databases, and statistics. This paper studies the problem of clustering high-dimensional data and proposes CoFD, a non-distance-based clustering algorithm for high-dimensional spaces. Based on the Maximum Likelihood Principle, CoFD searches for the parameter settings that maximize the likelihood of the data points under the model generated by those parameters. Distributed versions of the algorithm, called D-CoFD, are also proposed for the distributed version of the problem. Experimental results on both synthetic and real data sets show the efficiency and effectiveness of the CoFD and D-CoFD algorithms.
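The abstract states only that CoFD chooses its parameters by maximum likelihood rather than by distances between data points. The following is a minimal sketch of how such an objective can work. It illustrates the idea under stated assumptions and is not the CoFD algorithm as published. It assumes binary data W (points x features) and a hypothetical model in which every point and every feature carries a cluster label, and W[i, j] is expected to be 1 exactly when the two labels agree. Each entry is assumed to be flipped independently with noise probability p < 1/2. Under this model, with M the number of mismatched entries, the log-likelihood is nm log(1 - p) - M log((1 - p)/p). Maximizing the likelihood is therefore the same as minimizing M. The sketch minimizes M by alternately updating feature labels and point labels. The names `binary_coclustering` and `mismatch_costs` are hypothetical.

```python
import numpy as np


def mismatch_costs(X, labels, k):
    """Cost[r, c] = number of entries where binary row r of X disagrees with
    the indicator vector (labels == c), for each cluster c in range(k)."""
    # P[c, :] is the ideal binary row for cluster c under the assumed model.
    P = (labels[None, :] == np.arange(k)[:, None]).astype(int)
    # Mismatch count between binary vectors x and p: |x| + |p| - 2 <x, p>.
    return X.sum(axis=1)[:, None] + P.sum(axis=1)[None, :] - 2 * (X @ P.T)


def binary_coclustering(W, k, init_labels=None, max_iter=100, seed=0):
    """Hypothetical alternating maximum-likelihood sketch (not the published
    CoFD). Returns point labels D, feature labels F, and the number of
    entries where W differs from the model's prediction (D[i] == F[j])."""
    W = np.asarray(W, dtype=int)
    n, _ = W.shape
    rng = np.random.default_rng(seed)
    D = (np.asarray(init_labels, dtype=int) if init_labels is not None
         else rng.integers(0, k, size=n))
    best = None
    for _ in range(max(1, max_iter)):
        # Feature step: give each feature the cluster whose point-indicator
        # column it matches best (fewest mismatches, given D).
        F = mismatch_costs(W.T, D, k).argmin(axis=1)
        # Point step: give each point the cluster whose feature-indicator
        # row it matches best (fewest mismatches, given F).
        cost = mismatch_costs(W, F, k)
        D = cost.argmin(axis=1)
        total = int(cost[np.arange(n), D].sum())
        # Each step cannot increase the mismatch count; stop when it stalls.
        if best is not None and total >= best:
            break
        best = total
    return D, F, total


if __name__ == "__main__":
    # Two clean blocks: points 0-2 use features 0-2, points 3-5 use 3-5.
    W = np.array([[1, 1, 1, 0, 0, 0]] * 3 + [[0, 0, 0, 1, 1, 1]] * 3)
    # Start with point 5 deliberately mislabeled.
    D, F, mism = binary_coclustering(W, 2, init_labels=[0, 0, 0, 1, 1, 0])
    print(D, F, mism)
```

On the two-block example, one alternation corrects the mislabeled point, giving D = [0 0 0 1 1 1], F = [0 0 0 1 1 1], and zero mismatches. Because neither step can raise the mismatch count, the loop stops at a local optimum. As is usual for alternating schemes, different random initializations (`seed`) may reach different optima.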