AutoPart: parameter-free graph partitioning and outlier detection

Authors:
Deepayan Chakrabarti
Affiliations:
Carnegie Mellon University
Venue:
PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
Year:
2004

Citing 0
Cited 24

Neighborhood Formation and Anomaly Detection in Bipartite Graphs
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining

Relevance search and anomaly detection in bipartite graphs
ACM SIGKDD Explorations Newsletter

Graph mining: Laws, generators, and algorithms
ACM Computing Surveys (CSUR)

GraphScope: parameter-free mining of large time-evolving graphs
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining

Anomaly detection in data represented as graphs
Intelligent Data Analysis

Graph summarization with bounded error
Proceedings of the 2008 ACM SIGMOD international conference on Management of data

Evaluating use of data flow systems for large graph analysis
Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers

Finding the k-Most Abnormal Subgraphs from a Single Graph
DS '09 Proceedings of the 12th International Conference on Discovery Science

Metric forensics: a multi-level approach for mining volatile graphs
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

Information theoretic criteria for community detection
SNAKDD'08 Proceedings of the Second international conference on Advances in social network mining and analysis

A parameter-free method for discovering generalized clusters in a network
DS'11 Proceedings of the 14th international conference on Discovery science

Mining outliers in spatial networks
DASFAA'06 Proceedings of the 11th international conference on Database Systems for Advanced Applications

Ranking outliers using symmetric neighborhood relationship
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining

Discovering burst areas in fast evolving graphs
DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part I

OddBall: spotting anomalies in weighted graphs
PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II

Hierarchical clustering and outlier detection for effective image data organization
Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication

Community detection in Social Media
Data Mining and Knowledge Discovery

Non-negative residual matrix factorization: problem definition, fast solutions, and applications
Statistical Analysis and Data Mining

MultiAspectForensics: mining large heterogeneous networks using tensor
International Journal of Web Engineering and Technology

Autonomously reviewing and validating the knowledge base of a never-ending learning system
Proceedings of the 22nd international conference on World Wide Web companion

On detecting association-based clique outliers in heterogeneous information networks
Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

Locating emergencies in a campus using wi-fi access point association data
Proceedings of the 2013 ACM conference on Pervasive and ubiquitous computing adjunct publication

Discovery of extreme events-related communities in contrasting groups of physical system networks
Data Mining and Knowledge Discovery

RoClust: Role discovery for graph clustering
Web Intelligence and Agent Systems

Abstract

Graphs arise in numerous applications, such as the analysis of the Web, router networks, social networks, co-citation graphs, etc. Virtually all the popular methods for analyzing such graphs, for example, k-means clustering, METIS graph partitioning and SVD/PCA, require the user to specify various parameters such as the number of clusters, number of partitions and number of principal components. We propose a novel way to group nodes, using information-theoretic principles to choose both the number of such groups and the mapping from nodes to groups. Our algorithm is completely parameter-free, and also scales practically linearly with the problem size. Further, we propose novel algorithms which use this node group structure to get further insights into the data, by finding outliers and computing distances between groups. Finally, we present experiments on multiple synthetic and real-life datasets, where our methods give excellent, intuitive results.
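
To make the information-theoretic idea in the abstract concrete, the following is a minimal sketch, not the paper's actual cost function. It assumes a simplified two-part description length: a model cost (one group label per node, plus the count of ones in each block) and a data cost (the binary entropy of each block of the adjacency matrix). The names `binary_entropy` and `encoding_cost`, the exact form of the model cost, and the toy graph are all assumptions chosen for illustration. Under these assumptions, a grouping that matches the graph's block structure needs fewer bits than no grouping at all, and a grouping that cuts across the structure needs more, which is the kind of signal that lets a method pick the number of groups without a user-supplied parameter.

```python
import math
from itertools import product


def binary_entropy(p):
    """Shannon entropy, in bits, of a Bernoulli(p) source."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def encoding_cost(adj, groups):
    """Approximate bits needed to describe `adj` given a node grouping.

    adj    -- square 0/1 adjacency matrix as a list of lists
    groups -- groups[v] is the group index of node v (0-based)

    Simplified, assumed cost model (not the paper's exact formula):
      model cost = bits for each node's group label
                   + bits for the number of ones in each block
      data cost  = block size * binary entropy of the block's density
    """
    n = len(adj)
    k = max(groups) + 1
    members = [[v for v in range(n) if groups[v] == g] for g in range(k)]

    # Group label for every node; free when there is only one group.
    cost = n * math.log2(k) if k > 1 else 0.0

    for gi, gj in product(range(k), repeat=2):
        size = len(members[gi]) * len(members[gj])
        if size == 0:
            continue
        ones = sum(adj[u][v] for u in members[gi] for v in members[gj])
        cost += math.log2(size + 1)                 # state the count of ones
        cost += size * binary_entropy(ones / size)  # encode the block itself
    return cost


# Toy graph: two fully connected groups of four nodes (self-loops included
# so that every block is uniformly 0 or uniformly 1).
adj = [[1 if u // 4 == v // 4 else 0 for v in range(8)] for u in range(8)]

one_group = [0] * 8                   # no partitioning
true_groups = [0, 0, 0, 0, 1, 1, 1, 1]  # matches the block structure
mixed_groups = [0, 1, 0, 1, 0, 1, 0, 1]  # cuts across the block structure

for name, g in [("one group", one_group),
                ("true groups", true_groups),
                ("mixed groups", mixed_groups)]:
    print(f"{name}: {encoding_cost(adj, g):.2f} bits")
```

A search procedure would then try different numbers of groups and different node assignments and keep whichever minimizes this total; the sketch only evaluates candidate groupings and does not perform that search.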