LinkClus: efficient clustering via heterogeneous semantic links

Authors:
Xiaoxin Yin;Jiawei Han;Philip S. Yu
Affiliations:
UIUC;UIUC;IBM T. J. Watson Res. Center
Venue:
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Year:
2006

Citing 18
Cited 34

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
BIRCH: an efficient data clustering method for very large databases

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
CURE: an efficient clustering algorithm for large databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Fast algorithms for projected clustering

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
On power-law relationships of the Internet topology

Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication
Biclustering of Expression Data

Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology
Efficient and Effective Clustering Methods for Spatial Data Mining

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Relational Distance-Based Clustering

ILP '98 Proceedings of the 8th International Workshop on Inductive Logic Programming
SimRank: a measure of structural-context similarity

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining Top-K Frequent Closed Patterns without Minimum Support

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
ReCoM: reinforcement clustering of multi-type interrelated data objects

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Information-theoretic co-clustering

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
CLOSET+: searching for the best strategies for mining frequent closed itemsets

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Fully automatic cross-associations

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Scaling link-based similarity search

WWW '05 Proceedings of the 14th international conference on World Wide Web
Cross-relational clustering with user's guidance

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Introduction to Data Mining, (First Edition)

Introduction to Data Mining, (First Edition)
Multi-way distributional clustering via pairwise interactions

ICML '05 Proceedings of the 22nd international conference on Machine learning

A probabilistic framework for relational clustering

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Clustering as an approach to support the automatic definition of semantic hyperlinks

Proceedings of the eighteenth conference on Hypertext and hypermedia
Diva: a variance-based clustering approach for multi-type relational data

Proceedings of the sixteenth ACM conference on information and knowledge management
Structure-based inference of xml similarity for fuzzy duplicate detection

Proceedings of the sixteenth ACM conference on information and knowledge management
DataScope: viewing database contents in Google Maps' way

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
BibNetMiner: mining bibliographic information networks

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
S-SimRank: Combining Content and Link Information to Cluster Papers Effectively and Efficiently

ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications
Scaling up duplicate detection in graph data

Proceedings of the 17th ACM conference on Information and knowledge management
RankClus: integrating clustering with ranking for heterogeneous information network analysis

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Mining Research Communities in Bibliographical Data

Advances in Web Mining and Web Usage Analysis
An Adaptive Method for the Efficient Similarity Calculation

DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications
Using Link-Based Content Analysis to Measure Document Similarity Effectively

APWeb/WAIM '09 Proceedings of the Joint International Conferences on Advances in Data and Web Management
Exploiting the Block Structure of Link Graph for Efficient Similarity Computation

PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Scalable mining and link analysis across multiple database relations

ACM SIGKDD Explorations Newsletter
Exploiting Domain Knowledge by Automated Taxonomy Generation in Recommender Systems

EC-Web 2009 Proceedings of the 10th International Conference on E-Commerce and Web Technologies
Calculating Similarity Efficiently in a Small World

ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
WisColl: Collective wisdom based blog clustering

Information Sciences: an International Journal
P-Rank: a comprehensive structural similarity measure over information networks

Proceedings of the 18th ACM conference on Information and knowledge management
Exploring the power of heuristics and links in multi-relational data mining

ISMIS'08 Proceedings of the 17th international conference on Foundations of intelligent systems
A fast two-stage algorithm for computing SimRank and its extensions

WAIM'10 Proceedings of the 2010 international conference on Web-age information management
Approximate entity extraction in temporal databases

World Wide Web
Efficient link-based clustering in a large scaled blog network

Proceedings of the 5th International Conference on Ubiquitous Information Management and Communication
A game theoretic framework for heterogenous information network clustering

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Axiomatic ranking of network role similarity

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Pairwise similarity calculation of information networks

DaWaK'11 Proceedings of the 13th international conference on Data warehousing and knowledge discovery
PAV: A novel model for ranking heterogeneous objects in bibliographic information networks

Expert Systems with Applications: An International Journal
Delta-SimRank computing on MapReduce

Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
Hierarchical data organization for effective retrieval of similar shaders

Proceedings of the 2012 ACM Research in Applied Computation Symposium
A data partitioning approach for hierarchical clustering

Proceedings of the 7th International Conference on Ubiquitous Information Management and Communication
Towards scalable real-time entity resolution using a similarity-aware inverted index approach

AusDM '08 Proceedings of the 7th Australasian Data Mining Conference - Volume 87
E-rank: A Structural-Based Similarity Measure in Social Networks

WI-IAT '12 Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
From Frequent Features to Frequent Social Links

International Journal of Information System Modeling and Design
Scalable and axiomatic ranking of network role similarity

ACM Transactions on Knowledge Discovery from Data (TKDD) - Casin special issue
Assessing single-pair similarity over graphs by aggregating first-meeting probabilities

Information Systems

Abstract

Data objects in a relational database are cross-linked with each other via multi-typed links. Links contain rich semantic information that may indicate important relationships among objects. Most current clustering methods rely only on the properties that belong to the objects per se. However, the similarities between objects are often indicated by the links, and desirable clusters cannot be generated using only the properties of objects.

In this paper we explore linkage-based clustering, in which the similarity between two objects is measured based on the similarities between the objects linked with them. In comparison with a previous study (SimRank) that computes links recursively on all pairs of objects, we take advantage of the power law distribution of links, and develop a hierarchical structure called SimTree to represent similarities in a multi-granularity manner. This method avoids the high cost of computing and storing pairwise similarities but still thoroughly explores relationships among objects. An efficient algorithm is proposed to compute similarities between objects by avoiding pairwise similarity computations through merging computations that go through the same branches in the SimTree. Experiments show the proposed approach achieves high efficiency, scalability, and accuracy in clustering multi-typed linked objects.
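
The linkage-based similarity described in the abstract can be illustrated with a minimal sketch of the SimRank-style recursion that the paper compares against. This is not the LinkClus algorithm or the SimTree structure; it is only the all-pairs baseline. The function name `simrank`, the `neighbors` adjacency dictionary, the decay factor of 0.8, the iteration count, and the toy author/conference links are all assumptions made for illustration and do not come from the paper.

```python
from itertools import product


def simrank(neighbors, decay=0.8, iterations=5):
    """All-pairs SimRank-style similarity (illustrative baseline, not LinkClus).

    neighbors: dict mapping every object to the list of objects linked with it.
    decay and iterations are assumed parameters chosen for this sketch.
    """
    nodes = list(neighbors)
    # Start with each object similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            na, nb = neighbors[a], neighbors[b]
            if not na or not nb:
                # An object with no links cannot be similar to a different object.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of linked objects, damped by decay.
            total = sum(sim[(x, y)] for x in na for y in nb)
            new_sim[(a, b)] = decay * total / (len(na) * len(nb))
        sim = new_sim
    return sim


# Hypothetical toy data: authors linked to the conferences they publish in.
links = {
    "author1": ["confA"],
    "author2": ["confA"],
    "author3": ["confB"],
    "confA": ["author1", "author2"],
    "confB": ["author3"],
}
scores = simrank(links)
print(scores[("author1", "author2")])
print(scores[("author1", "author3")])
```

In this toy data, `author1` and `author2` share a conference and so receive a positive score, while `author3` has no link path to them and stays at zero. The `sim` dictionary holds one entry per ordered pair of objects, so its size grows quadratically with the number of objects; this is the pairwise computation and storage cost that the abstract says the SimTree is designed to avoid.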