Expanding network communities from representative examples

Authors:
Andrew Mehler; Steven Skiena
Affiliations:
Stony Brook University, Stony Brook, NY; Stony Brook University, Stony Brook, NY
Venue:
ACM Transactions on Knowledge Discovery from Data (TKDD)
Year:
2009

Abstract

We present an approach for expanding a small subset of a coherent community within a social network into a much larger, more representative sample. The problem becomes one of identifying a low-conductance subgraph containing many (but not necessarily all) members of the given seed set. Starting with an initial seed set representing a sample of a community, we seek to discover as much of the full community as possible. We present a general method for network community expansion and demonstrate that it works well on real-world networks starting from small seed groups (20 to 400 members). Our approach is marked by incremental expansion from the seeds, followed by retrospective analysis to determine the ultimate boundaries of the community. We show how to increase the robustness of the general approach by bootstrapping over multiple random partitions of the input set into seed and evaluation groups. We go beyond statistical comparisons against gold standards to careful subjective evaluations of our expanded communities. This process explains the causes of most disagreements between our expanded communities and our gold standards, and argues that our expansion methods provide more reliable communities than can be extracted from reference sources and gazetteers such as Wikipedia.
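
To make the idea of seed expansion toward a low-conductance subgraph concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes an undirected, unweighted graph stored as adjacency sets, and it uses a simple greedy rule (add the frontier vertex with the largest fraction of its neighbors already inside) that is only one of many possible expansion rules. The names `build_graph`, `conductance`, and `expand_community` are hypothetical. The "retrospective" step is approximated by remembering the lowest-conductance set seen during expansion and returning it as the community boundary.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected adjacency-set graph from an edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def conductance(graph: Graph, members: Set[Hashable]) -> float:
    """Cut edges divided by the smaller of the two side volumes.

    Returns 1.0 when either side has zero volume (degenerate cut).
    """
    cut = sum(1 for u in members for v in graph[u] if v not in members)
    vol_in = sum(len(graph[u]) for u in members)
    vol_out = sum(len(nbrs) for nbrs in graph.values()) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else 1.0


def expand_community(
    graph: Graph, seeds: Set[Hashable], max_size: int
) -> Tuple[Set[Hashable], float]:
    """Grow from the seeds one vertex at a time, then look back and
    return the lowest-conductance set encountered along the way."""
    current = set(seeds)
    best, best_phi = set(current), conductance(graph, current)
    while len(current) < max_size:
        frontier = {v for u in current for v in graph[u]} - current
        if not frontier:
            break
        # Assumed greedy rule: prefer the vertex most tied to the current set;
        # ties are broken by string order so the result is deterministic.
        nxt = max(
            frontier,
            key=lambda v: (len(graph[v] & current) / len(graph[v]), str(v)),
        )
        current.add(nxt)
        phi = conductance(graph, current)
        if phi < best_phi:
            best, best_phi = set(current), phi
    return best, best_phi


if __name__ == "__main__":
    # Two 4-cliques joined by a single bridge edge d-w (toy data).
    edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
             ("c", "d"), ("w", "x"), ("w", "y"), ("w", "z"), ("x", "y"),
             ("x", "z"), ("y", "z"), ("d", "w")]
    community, phi = expand_community(build_graph(edges), {"a", "b"}, max_size=8)
    print(sorted(community), round(phi, 4))
```

On this toy graph the expansion is allowed to run past the bridge into the second clique, and the look-back step is what pulls the boundary back to the first clique. A bootstrapped variant in the spirit of the abstract would repeat this over several random splits of the input set into seed and evaluation groups and keep the vertices that recur across runs.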