Projective clustering using itemset discovery for multi-dimensional data analysis

Authors:
Muhammad Umer Arshad; Muhammad Naeem Ayyaz
Affiliations:
Department of Electrical Engineering, University of Engineering and Technology, Lahore, Pakistan; Department of Electrical Engineering, University of Engineering and Technology, Lahore, Pakistan
Venue:
MS'06 Proceedings of the 17th IASTED international conference on Modelling and simulation
Year:
2006

Abstract

Clustering, also known as unsupervised classification, groups data so that intra-group distances are minimized and inter-group distances are maximized. Most clustering algorithms use every dimension of the feature (attribute) space when partitioning objects into groups. However, recent research suggests that clustering in high-dimensional spaces should instead search for hidden subspaces of lower dimensionality, because data are more likely to form dense clusters in a low-dimensional subspace than in the full space. In this paper, we present ProjClusID, a new, fast, and scalable algorithm for the projective clustering problem. It finds projective clusters by frequent itemset mining; to make this possible, the data are first discretized, mapping them from the continuous to a discrete domain. The algorithm is both density-based and grid-based, and it finds the potential optimum clustering without requiring any parameter input. As a post-clustering step, the data are mapped back to their original continuous domain. Our experimental results on synthetic and real datasets show that ProjClusID improves on the accuracy and effectiveness of previous techniques.
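
The general idea of finding projected clusters through itemset mining can be illustrated with a small sketch. This is not the ProjClusID algorithm, whose details the abstract does not give; it is a minimal illustration that assumes an equal-width grid and a plain level-wise (Apriori-style) search. The names `discretize`, `frequent_itemsets`, and `maximal`, and the parameters `n_bins` and `min_support`, are hypothetical and chosen only for this example (the abstract states that ProjClusID itself needs no parameter input).

```python
from itertools import combinations

def discretize(points, n_bins):
    """Map each point to a set of items (dimension, bin) on an equal-width grid."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    transactions = []
    for p in points:
        items = set()
        for d in range(dims):
            width = (highs[d] - lows[d]) / n_bins or 1.0  # guard constant dimensions
            items.add((d, min(int((p[d] - lows[d]) / width), n_bins - 1)))
        transactions.append(frozenset(items))
    return transactions

def frequent_itemsets(transactions, min_support):
    """Level-wise search; returns {itemset: support count}."""
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(level)
    k = 2
    while level:
        # Join frequent (k-1)-itemsets into k-item candidates, one item per dimension.
        candidates = set()
        for a, b in combinations(level, 2):
            c = a | b
            if len(c) == k and len({d for d, _ in c}) == k:
                candidates.add(c)
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                level[c] = support
        result.update(level)
        k += 1
    return result

def maximal(itemsets):
    """Keep itemsets that are not a proper subset of another frequent itemset."""
    return [s for s in itemsets if not any(s < other for other in itemsets)]

# Toy data: four points agree in dimensions 0 and 1 but are spread out in dimension 2.
points = [
    (0.10, 0.12, 0.05), (0.12, 0.10, 0.35), (0.11, 0.13, 0.65), (0.13, 0.11, 0.95),
    (0.00, 0.90, 0.00), (1.00, 0.00, 1.00), (0.60, 1.00, 0.50), (0.90, 0.55, 0.30),
]
transactions = discretize(points, n_bins=4)        # assumed grid resolution
frequent = frequent_itemsets(transactions, min_support=4)  # assumed density threshold
clusters = maximal(frequent)

# Map the discrete cluster back to the original continuous points.
cluster = clusters[0]
subspace = sorted(d for d, _ in cluster)
members = [p for p, t in zip(points, transactions) if cluster <= t]
print(subspace, members)
```

Each frequent itemset corresponds to a dense grid cell in the subspace spanned by its dimensions, so a maximal frequent itemset plays the role of a projected cluster together with its relevant dimensions. On the toy data the only maximal itemset covers dimensions 0 and 1, and dimension 2 is left out because no single bin in it is dense.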