Tractable Group Detection on Large Link Data Sets

  • Authors:
  • Jeremy Kubica; Andrew Moore; Jeff Schneider

  • Venue:
  • ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
  • Year:
  • 2003

Abstract

Discovering underlying structure from co-occurrence data is an important task in a variety of fields, including insurance, intelligence, criminal investigation, epidemiology, human resources, and marketing. Previously, Kubica et al. presented the group detection algorithm (GDA), an algorithm for finding underlying groupings of entities from co-occurrence data. This algorithm is based on a probabilistic generative model and produces coherent groups that are consistent with prior knowledge. Unfortunately, the optimization used in GDA is slow, potentially making it infeasible for many large data sets. To address this, we present k-groups, an algorithm that uses an approach similar to that of k-means to significantly accelerate the discovery of groups while retaining GDA's probabilistic model. We compare the performance of GDA and k-groups on a variety of data, showing that k-groups' sacrifice in solution quality is significantly offset by its increase in speed.
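The abstract describes k-groups only at a high level, as "an approach similar to that of k-means". The Python sketch below illustrates what a k-means-style alternation over link data can look like: assign each link (a set of co-occurring entities) to its best-fitting group, then recompute each group's membership from the links assigned to it, and repeat until the assignments stop changing. It is a minimal sketch under stated assumptions, not the authors' method. It does not implement GDA's probabilistic generative model, and the scoring rule, the majority-style membership threshold, and the names `score`, `assign_links`, `update_groups`, `k_groups`, and `min_frac` are all hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Optional, Tuple

Link = FrozenSet[str]   # one co-occurrence record: a set of entity ids
Group = FrozenSet[str]  # a group's current membership


def score(link: Link, group: Group) -> int:
    # Hypothetical fit score (NOT GDA's likelihood): +1 for each link
    # entity inside the group, -1 for each link entity outside it.
    return len(link & group) - len(link - group)


def assign_links(links: List[Link], groups: List[Group]) -> List[int]:
    # Assignment step (like k-means' nearest-centroid step): each link
    # goes to its best-scoring group; max() keeps the lowest index on ties.
    return [max(range(len(groups)), key=lambda g: score(link, groups[g]))
            for link in links]


def update_groups(links: List[Link], assignment: List[int],
                  groups: List[Group], min_frac: float = 0.5) -> List[Group]:
    # Update step (like recomputing centroids): an entity joins group g if it
    # appears in at least min_frac of the links assigned to g (hypothetical rule).
    # A group with no assigned links keeps its previous membership.
    new_groups = []
    for g, old in enumerate(groups):
        assigned = [link for link, a in zip(links, assignment) if a == g]
        if not assigned:
            new_groups.append(old)
            continue
        counts = Counter(e for link in assigned for e in link)
        threshold = min_frac * len(assigned)
        new_groups.append(frozenset(e for e, c in counts.items() if c >= threshold))
    return new_groups


def k_groups(links: Iterable[Iterable[str]],
             initial_groups: Iterable[Iterable[str]],
             max_iters: int = 100) -> Tuple[List[Group], List[int]]:
    # Alternate the two steps until link assignments stop changing
    # or max_iters passes have been made.
    links = [frozenset(link) for link in links]
    groups = [frozenset(g) for g in initial_groups]
    assignment: Optional[List[int]] = None
    for _ in range(max_iters):
        new_assignment = assign_links(links, groups)
        if new_assignment == assignment:
            break  # assignments are stable, so the groups will not change either
        assignment = new_assignment
        groups = update_groups(links, assignment, groups)
    return groups, assignment
```

For example, with links {a, b}, {a, b, c}, {x, y}, {x, y, z}, {a, c} and seed groups {a} and {x}, this loop settles after two passes on groups {a, b, c} and {x, y, z}. Each pass costs one sweep over the links, which conveys why a k-means-style scheme can be much cheaper per iteration than a general-purpose optimizer; the hard assignments, however, are exactly the kind of simplification that could trade away some solution quality.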