Cross-relational clustering with user's guidance

Authors:
Xiaoxin Yin; Jiawei Han; Philip S. Yu
Affiliations:
UIUC; UIUC; IBM T. J. Watson Research Center
Venue:
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Year:
2005

References (15)

Fast algorithms for projected clustering. SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data.
CACTUS—clustering categorical data using summaries. KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining.
Finding generalized projected clusters in high dimensional spaces. SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data.
A semi-supervised document clustering technique for information organization. Proceedings of the ninth international conference on Information and knowledge management.
Unsupervised Feature Selection Using Feature Similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Machine Learning. Machine Learning.
Constrained K-means Clustering with Background Knowledge. ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning.
Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning.
Efficient and Effective Clustering Methods for Spatial Data Mining. VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases.
Relational Distance-Based Clustering. ILP '98 Proceedings of the 8th International Workshop on Inductive Logic Programming.
An introduction to variable and feature selection. The Journal of Machine Learning Research.
CrossMine: Efficient Classification Across Multiple Database Relations. ICDE '04 Proceedings of the 20th International Conference on Data Engineering.
Feature Selection for Unsupervised Learning. The Journal of Machine Learning Research.
Kernels and Distances for Structured Data. Machine Learning.
Discover: keyword search in relational databases. VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases.

Cited by (15)

LinkClus: efficient clustering via heterogeneous semantic links. VLDB '06 Proceedings of the 32nd international conference on Very large data bases.
A probabilistic framework for relational clustering. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining.
A rank algebra to support multimedia mining applications. Proceedings of the 8th international workshop on Multimedia data mining: (associated with the ACM SIGKDD 2007).
S-SimRank: Combining Content and Link Information to Cluster Papers Effectively and Efficiently. ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications.
Scalable mining and link analysis across multiple database relations. ACM SIGKDD Explorations Newsletter.
Frequent Itemset Mining in Multirelational Databases. ISMIS '09 Proceedings of the 18th International Symposium on Foundations of Intelligent Systems.
Mining induced and embedded subtrees in ordered, unordered, and partially-ordered trees. ISMIS '08 Proceedings of the 17th international conference on Foundations of intelligent systems.
POTMiner: mining ordered, unordered, and partially-ordered trees. Knowledge and Information Systems.
A general multi-relational classification approach using feature generation and selection. ADMA '10 Proceedings of the 6th international conference on Advanced data mining and applications - Volume Part II.
A game theoretic framework for heterogenous information network clustering. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining.
Diversified ranking on large graphs: an optimization viewpoint. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining.
Using trees to mine multirelational databases. Data Mining and Knowledge Discovery.
Using force-based graph layout for clustering of relational data. ADBIS '09 Proceedings of the 13th East European conference on Advances in Databases and Information Systems.
Conceptual clustering of multi-relational data. ILP '11 Proceedings of the 21st international conference on Inductive Logic Programming.
New approach for clustering relational data based on relationship and attribute information. ICANN '12 Proceedings of the 22nd international conference on Artificial Neural Networks and Machine Learning - Volume Part II.

Abstract

Clustering is an essential data mining task with numerous applications. However, data in most real-life applications are high-dimensional, and the relevant information is often spread across multiple relations. To achieve effective and efficient high-dimensional, cross-relational clustering, we propose a new approach, called CrossClus, which performs cross-relational clustering with user guidance. We believe that user guidance, even in very simple forms, can be essential for effective high-dimensional clustering, since the user knows the application requirements and data semantics well. CrossClus works as follows: a user specifies a clustering task and selects one or a small set of features pertinent to the task. CrossClus then extracts highly relevant features from multiple relations connected via linkages defined in the database schema, evaluates their effectiveness based on the user's guidance, and identifies interesting clusters that fit the user's needs. The method addresses both the quality of feature extraction and the efficiency of clustering. Our comprehensive experiments demonstrate the effectiveness and scalability of this approach.
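The abstract outlines a three-step pipeline: the user picks a pertinent feature, candidate features are gathered from other relations by following schema links and scored against that guidance, and the selected features drive the clustering. The Python sketch below illustrates that pipeline on a hypothetical toy schema; the relations (`STUDENTS`, `REGISTRATION`, `PUBLISH`, ...), all values, and the helper names are invented for illustration. The similarity measures (inner product of per-tuple value distributions, cosine between pairwise tuple-similarity vectors), the 0.5 selection threshold, and the basic k-medoids step are assumptions chosen for clarity, not the paper's exact definitions or algorithm.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# --- Hypothetical toy database (all names and values are illustrative) ---
# "Student" is the target relation; other relations are reached by joining
# along key/foreign-key links in the schema.
STUDENTS = ["s1", "s2", "s3", "s4"]
ADVISOR_AREA = {"s1": "DB", "s2": "DB", "s3": "AI", "s4": "AI"}  # user-chosen feature
REGISTRATION = [("s1", "c1"), ("s1", "c2"), ("s1", "c3"), ("s2", "c2"),
                ("s3", "c3"), ("s4", "c3"), ("s4", "c4")]        # Student -> Course
COURSE_AREA = {"c1": "DB", "c2": "DB", "c3": "AI", "c4": "AI"}
PUBLISH = [("s1", "p1"), ("s2", "p2"), ("s3", "p1"), ("s4", "p2")]  # Student -> Paper
PAPER_VENUE = {"p1": "KDD", "p2": "ICML"}


def feature_from_join(pairs, value_of):
    """Assumed join-and-count stand-in for cross-relational feature extraction:
    for each target tuple, normalize the counts of values reached via the join."""
    counts = defaultdict(Counter)
    for tid, key in pairs:
        counts[tid][value_of[key]] += 1
    return {t: {v: n / sum(c.values()) for v, n in c.items()}
            for t, c in counts.items()}


def tuple_sim(feature, a, b):
    """Similarity of two target tuples under one feature: inner product of
    their value distributions (a tuple with no joined values scores 0)."""
    p, q = feature.get(a, {}), feature.get(b, {})
    return sum(w * q.get(v, 0.0) for v, w in p.items())


def feature_sim(f, g, tuples):
    """Cosine similarity between the pairwise tuple-similarity vectors that
    two features induce: high when both features group the tuples alike."""
    pairs = list(combinations(tuples, 2))
    x = [tuple_sim(f, a, b) for a, b in pairs]
    y = [tuple_sim(g, a, b) for a, b in pairs]
    norm = math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in y))
    return sum(u * v for u, v in zip(x, y)) / norm if norm else 0.0


def k_medoids(tuples, sim, k, max_iter=20):
    """Basic alternating k-medoids that maximizes similarity to medoids."""
    medoids = [tuples[0]]
    while len(medoids) < k:  # farthest-first initialization
        rest = [t for t in tuples if t not in medoids]
        medoids.append(min(rest, key=lambda t: max(sim(t, m) for m in medoids)))
    for _ in range(max_iter):
        clusters = [[] for _ in medoids]
        for t in tuples:  # assign each tuple to its most similar medoid
            best = max(range(k), key=lambda i: sim(t, medoids[i]))
            clusters[best].append(t)
        # New medoid: the member with the largest total similarity to its cluster
        new = [max(c, key=lambda m: sum(sim(m, t) for t in c)) if c else old
               for c, old in zip(clusters, medoids)]
        if new == medoids:
            break
        medoids = new
    return clusters


# The user's guidance, encoded as a one-hot distribution per target tuple
user_feature = {t: {a: 1.0} for t, a in ADVISOR_AREA.items()}
candidates = {
    "Registration.Course.area": feature_from_join(REGISTRATION, COURSE_AREA),
    "Publish.Paper.venue": feature_from_join(PUBLISH, PAPER_VENUE),
}
scores = {n: feature_sim(user_feature, f, STUDENTS) for n, f in candidates.items()}
THRESHOLD = 0.5  # hypothetical cut-off for keeping a candidate feature
selected = [user_feature] + [candidates[n] for n, s in scores.items() if s >= THRESHOLD]


def combined_sim(a, b):
    """Average tuple similarity over the user feature and the selected features."""
    return sum(tuple_sim(f, a, b) for f in selected) / len(selected)


clusters = k_medoids(STUDENTS, combined_sim, k=2)
print(scores)
print(clusters)
```

On this toy data the course-area feature scores about 0.91 and is kept, the publication-venue feature scores 0.0 and is dropped, and the final line prints `[['s1', 's2'], ['s3', 's4']]`. Comparing features through the tuple groupings they induce, rather than through their raw values, is what lets features from different relations, with unrelated value domains, be ranked against the same user-chosen feature.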