Cluster Ranking with an Application to Mining Mailbox Networks

  • Authors:
  • Ziv Bar-Yossef;Ido Guy;Ronny Lempel;Yoelle S. Maarek;Vladimir Soroka

  • Affiliations:
  • Technion and Google Inc., Israel;Technion and IBM Research Lab, Israel;IBM Research Lab, Israel;Google Inc., Israel;IBM Research Lab, Israel

  • Venue:
  • ICDM '06 Proceedings of the Sixth International Conference on Data Mining
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

We initiate the study of a new clustering framework, called cluster ranking. Rather than simply partitioning a network into clusters, a cluster ranking algorithm also orders the clusters by their strength. To this end, we introduce a novel strength measure for clusters--the integrated cohesion--which is applicable to arbitrary weighted networks. We then present C-Rank: a new cluster ranking algorithm. Given a network with arbitrary pairwise similarity weights, C-Rank creates a list of overlapping clusters and ranks them by their integrated cohesion. We provide extensive theoretical and empirical analysis of C-Rank and show that it is likely to have high precision and recall. Our experiments focus on mining mailbox networks. A mailbox network is an egocentric social network, consisting of contacts with whom an individual exchanges email. Ties among contacts are represented by the frequency of their co-occurrence on message headers. C-Rank is well suited to mine such networks, since they are abundant with overlapping communities of highly variable strengths. We demonstrate the effectiveness of C-Rank on the Enron data set, consisting of 130 mailbox networks.