Algorithms for clustering data
Algorithms for clustering data
BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
CURE: an efficient clustering algorithm for large databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
OPTICS: ordering points to identify the clustering structure
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Normalized Cuts and Image Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence
ROCK: a robust clustering algorithm for categorical attributes
Information Systems
Neural Networks for Pattern Recognition
Neural Networks for Pattern Recognition
Machine Learning
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
Refining Initial Points for K-Means Clustering
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
X-means: Extending K-means with Efficient Estimation of the Number of Clusters
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
A Min-max Cut Algorithm for Graph Partitioning and Data Clustering
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Some simplified NP-complete problems
STOC '74 Proceedings of the sixth annual ACM symposium on Theory of computing
ACM Transactions on Knowledge Discovery from Data (TKDD)
Scalable community discovery on textual data with relations
Proceedings of the 17th ACM conference on Information and knowledge management
Clustering Data Streams in Optimization and Geography Domains
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
JCCM: Joint Cluster Communities on Attribute and Relationship Data in Social Networks
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Efficient joint clustering algorithms in optimization and geography domains
PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
Finding collections of k-clique percolated components in attributed graphs
PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
Investigating the Properties of a Social Bookmarking and Tagging Network
International Journal of Data Warehousing and Mining
Combining Relations and Text in Scientific Network Clustering
ASONAM '12 Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012)
Hi-index | 0.00 |
In many applications, attribute and relationship data areavailable, carrying complementary information about real world entities. In such cases, a joint analysis of both types of data can yield more accurate results than classical clustering algorithms that either use only attribute data or only relationship (graph) data. The Connected k-Center (CkC) has been proposed as the first joint cluster analysis model to discover k clusters which are cohesive on both attribute and relationship data. However, it is well-known that prior knowledge on the number of clusters is often unavailable in applications such as community dentification and hotspot analysis. In this paper, we introduce and formalize the problem of discovering an a-priori unspecified number of clusters in the context of joint cluster analysis of attribute and relationship data, called Connected X Clusters (CXC) problem. True clusters are assumed to be compact and distinctive from their neighboring clusters in terms of attribute data and internally connected in terms of relationship data. Different from classical attribute-based clustering methods, the neighborhood of clusters is not defined in terms of attribute data but in terms of relationship data. To efficiently solve the CXC problem, we present JointClust, an algorithm which adopts a dynamic two-phase approach. In the first phase, we find so called cluster atoms. We provide a probability analysis for thisphase, which gives us a probabilistic guarantee, that each true cluster is represented by at least one of the initial cluster atoms. In the second phase, these cluster atoms are merged in a bottom-up manner resulting in a dendrogram. The final clustering is determined by our objective function. Our experimental evaluation on several real datasets demonstrates that JointClust indeed discovers meaningful and accurate clusterings without requiring the user to specify the number of clusters.