Clustering in applications with multiple data sources: A mutual subspace clustering approach

Authors:
Ming Hua; Jian Pei
Affiliations:
Facebook Inc., Palo Alto, CA, USA; Simon Fraser University, Burnaby, BC, Canada
Venue:
Neurocomputing
Year:
2012

Abstract

In many applications, such as bioinformatics and cross-market customer relationship management, data from multiple sources jointly describe the same set of objects. An important data mining task is to find groups of objects that form clusters in subspaces of several data sources at once, so that each cluster is jointly supported by those sources. In this paper, we study the novel problem of mining mutual subspace clusters from multiple sources. We develop two models, each with a corresponding method for mutual subspace clustering. The density-based model identifies dense regions in subspaces as clusters; its bottom-up method searches for density-based mutual subspace clusters systematically, from low-dimensional subspaces to high-dimensional ones. The partitioning model divides the points in a data set into k exclusive clusters, where k is the number of clusters desired by the user, and finds a signature subspace for each cluster; its top-down method interleaves the well-known k-means clustering procedure across the multiple sources. We report experimental results on synthetic and real data sets that evaluate the effectiveness and efficiency of the methods.
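
The abstract does not spell out how the top-down method interleaves k-means across sources, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes two sources that describe the same objects in the same order, and it alternates one k-means reassignment step in each source while sharing a single label vector. The function names (`interleaved_kmeans`, `centroids`, `reassign`), the round-robin initialisation, and the stopping rule are all assumptions made for illustration; the selection of a signature subspace per cluster is left out.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids(points, labels, k):
    """Mean vector of each cluster; an empty cluster yields None."""
    dims = len(points[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for point, cluster in zip(points, labels):
        counts[cluster] += 1
        for j, value in enumerate(point):
            sums[cluster][j] += value
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else None
        for c in range(k)
    ]


def reassign(points, cents):
    """Assign every point to its nearest non-empty centroid."""
    live = [c for c in range(len(cents)) if cents[c] is not None]
    return [min(live, key=lambda c: sq_dist(p, cents[c])) for p in points]


def interleaved_kmeans(source_a, source_b, k, max_rounds=50):
    """Alternate one k-means step per source over a shared labelling.

    Assumption: row i of source_a and row i of source_b describe the
    same object. Initial labels are assigned round-robin (hypothetical
    choice, made only to keep the sketch deterministic).
    """
    labels = [i % k for i in range(len(source_a))]
    for _ in range(max_rounds):
        previous = labels
        for source in (source_a, source_b):
            # Centroids come from the labels produced by the other source.
            labels = reassign(source, centroids(source, labels, k))
        if labels == previous:  # neither source changed any assignment
            break
    return labels


# Toy data: six objects seen through a 2-D source and a 1-D source.
source_a = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
            (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
source_b = [(1.0,), (1.1,), (0.9,), (9.0,), (9.2,), (8.9,)]
print(interleaved_kmeans(source_a, source_b, k=2))  # [0, 0, 0, 1, 1, 1]
```

Because each source refines the labels produced by the other, a grouping survives only if both sources keep supporting it, which is the intuition behind a cluster being "jointly supported". A fuller treatment would also restrict the distance computation to each cluster's signature subspace.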