Tractable Group Detection on Large Link Data Sets

Authors:
Jeremy Kubica;Andrew Moore;Jeff Schneider
Affiliations:
-;-;-
Venue:
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Year:
2003

References (5):
Text Classification from Labeled and Unlabeled Documents using EM. Machine Learning - Special issue on information retrieval
ROCK: a robust clustering algorithm for categorical attributes. Information Systems
Stochastic link and group detection. Eighteenth national conference on Artificial intelligence
Latent Dirichlet allocation. The Journal of Machine Learning Research
SMEM Algorithm for Mixture Models. Neural Computation

Cited by (3):
Link mining: a survey. ACM SIGKDD Explorations Newsletter
Anomaly detection in extremist web forums using a dynamical systems approach. ACM SIGKDD Workshop on Intelligence and Security Informatics
DB-CSC: a density-based approach for subspace clustering in graphs with feature vectors. ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part I

Abstract

Discovering underlying structure from co-occurrence data is an important task in a variety of fields, including insurance, intelligence, criminal investigation, epidemiology, human resources, and marketing. Previously, Kubica et al. presented the group detection algorithm (GDA), an algorithm for finding underlying groupings of entities from co-occurrence data. This algorithm is based on a probabilistic generative model and produces coherent groups that are consistent with prior knowledge. Unfortunately, the optimization used in GDA is slow, potentially making it infeasible for many large data sets. To address this, we present k-groups, an algorithm that uses an approach similar to that of k-means to significantly accelerate the discovery of groups while retaining GDA's probabilistic model. We compare the performance of GDA and k-groups on a variety of data, showing that k-groups' sacrifice in solution quality is significantly offset by its increase in speed.
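
Illustrative note: the abstract describes k-groups only as using "an approach similar to that of k-means". The sketch below shows that alternating pattern (assign each entity to its nearest group, then re-estimate each group) using plain k-means on rows of a binary entity-by-link co-occurrence matrix. It is not the k-groups algorithm and does not use GDA's probabilistic model, which the abstract does not specify. The function name `kmeans_rows`, the `init` parameter, and the toy data are all hypothetical.

```python
def kmeans_rows(rows, init, iters=20):
    """Plain k-means on co-occurrence rows (illustration only, not k-groups).

    rows: list of equal-length 0/1 lists, one per entity (columns are links).
    init: indices of the rows used as initial group centers (an assumption
          made here so the example is deterministic).
    """
    centers = [[float(x) for x in rows[i]] for i in init]
    k = len(centers)
    assign = None
    for _ in range(iters):
        # Assignment step: each entity joins the center with the smallest
        # squared Euclidean distance (ties go to the lower-numbered center).
        new_assign = []
        for row in rows:
            dists = [sum((x - m) ** 2 for x, m in zip(row, c)) for c in centers]
            new_assign.append(dists.index(min(dists)))
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        # Update step: each center becomes the column-wise mean of its members.
        for c in range(k):
            members = [rows[i] for i, a in enumerate(assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


# Hypothetical toy data: 6 entities, 4 links; a 1 means the entity appears in the link.
rows = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
groups = kmeans_rows(rows, init=(0, 3))
print(groups)  # [0, 0, 0, 1, 1, 1]
```

Each pass costs time proportional to the number of entities times the number of groups times the number of links, which is the kind of cheap hard-assignment loop the k-means comparison suggests. How k-groups scores an assignment under the generative model is left to the paper itself.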