HSM: Heterogeneous Subspace Mining in High Dimensional Data

Authors:
Emmanuel Müller;Ira Assent;Thomas Seidl
Affiliations:
Data management and exploration group, RWTH Aachen University, Germany;Department of Computer Science, Aalborg University, Denmark;Data management and exploration group, RWTH Aachen University, Germany
Venue:
SSDBM 2009 Proceedings of the 21st International Conference on Scientific and Statistical Database Management
Year:
2009



Abstract

Heterogeneous data, i.e. data with both categorical and continuous values, is common in many databases. However, most data mining algorithms assume either continuous or categorical attributes, but not both. In high dimensional data, phenomena due to the "curse of dimensionality" pose additional challenges. Usually, due to locally varying relevance of attributes, patterns do not show across the full set of attributes. In this paper we propose HSM, which defines a new pattern model for heterogeneous high dimensional data. It allows data mining in arbitrary subsets of the attributes that are relevant for the respective patterns. Based on this model we propose an efficient algorithm, which is aware of the heterogeneity of the attributes. We extend an indexing structure for continuous attributes such that HSM indexing adapts to different attribute types. In our experiments we show that HSM efficiently mines patterns in arbitrary subspaces of heterogeneous high dimensional data.
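The abstract does not spell out the HSM pattern model, so the following is only an illustrative sketch of what mining in a subspace of mixed attribute types can mean, not the authors' definition. It assumes a hypothetical neighborhood test: two records match in a subspace if they are equal on every categorical attribute and differ by at most a hypothetical `width` on every continuous attribute. The names `matches_in_subspace` and `support`, and the toy data, are invented for illustration.

```python
from typing import Dict, List, Sequence, Union

Value = Union[float, str]
Record = Dict[str, Value]


def matches_in_subspace(a: Record, b: Record,
                        subspace: Sequence[str], width: float) -> bool:
    """Assumed matching rule (not taken from the paper):
    categorical attributes (str) must be equal,
    continuous attributes (float) must differ by at most `width`.
    Only attributes listed in `subspace` are inspected."""
    for attr in subspace:
        x, y = a[attr], b[attr]
        if isinstance(x, str) or isinstance(y, str):
            if x != y:
                return False
        elif abs(x - y) > width:
            return False
    return True


def support(data: List[Record], center: Record,
            subspace: Sequence[str], width: float) -> int:
    """Count the records that match `center` in the given subspace."""
    return sum(matches_in_subspace(center, r, subspace, width) for r in data)


# Toy heterogeneous data: two continuous attributes, one categorical.
data: List[Record] = [
    {"age": 31.0, "income": 40.0, "city": "Aachen"},
    {"age": 33.0, "income": 95.0, "city": "Aachen"},
    {"age": 32.0, "income": 10.0, "city": "Aachen"},
    {"age": 60.0, "income": 41.0, "city": "Aalborg"},
]

in_subspace = support(data, data[0], ["age", "city"], width=2.0)
in_full_space = support(data, data[0], ["age", "income", "city"], width=2.0)
print(in_subspace, in_full_space)
```

In this toy data, three records group together in the subspace {age, city}, while in the full attribute set the first record matches only itself because income varies widely. This mirrors the abstract's point that patterns often do not show across the full set of attributes.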