Redefining Clustering for High-Dimensional Applications

Authors: C. C. Aggarwal; P. S. Yu
Venue: IEEE Transactions on Knowledge and Data Engineering
Year: 2002



Abstract

Clustering problems are well known in the database literature for their use in numerous applications, such as customer segmentation, classification, and trend analysis. High-dimensional data has always been a challenge for clustering algorithms because of the inherent sparsity of the points. Recent research results indicate that, in high-dimensional data, even the concept of proximity or clustering may not be meaningful. We introduce a very general concept of projected clustering, which is able to construct clusters in arbitrarily aligned subspaces of lower dimensionality. The subspaces are specific to the clusters themselves. This definition is substantially more general and realistic than currently available techniques, which limit the method to projections from the original set of attributes. The generalized projected clustering technique may also be viewed as a way of redefining clustering for high-dimensional applications by searching for hidden subspaces with clusters that are created by inter-attribute correlations. We provide a new concept of using extended cluster feature vectors in order to make the algorithm scalable for very large databases. The running time and space requirements of the algorithm are adjustable and can be traded off against accuracy.
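
The abstract does not spell out what an extended cluster feature vector contains. As a minimal sketch, assume it stores the point count, the per-attribute sums, and the sums of all pairwise attribute products. The class name `ExtendedCF` and its methods are hypothetical and are not taken from the paper. Under that assumption, two summaries can be merged by simple addition, and the covariance matrix of a cluster (which captures the inter-attribute correlations mentioned above) can be recovered without revisiting the raw points.

```python
class ExtendedCF:
    """Hypothetical additive cluster summary (an assumption, not the paper's definition).

    Stores the number of points, the sum of each attribute, and the sum of
    each pairwise attribute product.
    """

    def __init__(self, dim):
        self.dim = dim
        self.n = 0
        self.linear = [0.0] * dim
        self.products = [[0.0] * dim for _ in range(dim)]

    def add(self, point):
        """Fold one point into the summary."""
        self.n += 1
        for i in range(self.dim):
            self.linear[i] += point[i]
            for j in range(self.dim):
                self.products[i][j] += point[i] * point[j]

    def merge(self, other):
        """Return the summary of the union of two clusters by adding components."""
        merged = ExtendedCF(self.dim)
        merged.n = self.n + other.n
        merged.linear = [a + b for a, b in zip(self.linear, other.linear)]
        merged.products = [
            [a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(self.products, other.products)
        ]
        return merged

    def covariance(self):
        """Population covariance matrix: E[x_i * x_j] - E[x_i] * E[x_j]."""
        mean = [s / self.n for s in self.linear]
        return [
            [self.products[i][j] / self.n - mean[i] * mean[j]
             for j in range(self.dim)]
            for i in range(self.dim)
        ]


# Usage: summarize two halves of a data set separately, then merge.
left, right = ExtendedCF(2), ExtendedCF(2)
for p in [(0.0, 0.0), (1.0, 2.0)]:
    left.add(p)
for p in [(2.0, 4.0), (3.0, 6.0)]:
    right.add(p)
cov = left.merge(right).covariance()
```

Because every component is a plain sum, the summary needs memory quadratic in the number of attributes but independent of the number of points. The eigenvectors of the recovered covariance matrix with the smallest eigenvalues are the directions in which the cluster is most tightly packed, which is one plausible way to choose an arbitrarily aligned subspace for a cluster.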