Sampling algorithms for pure network topologies: a study on the stability and the separability of metric embeddings

Authors: Edoardo M. Airoldi; Kathleen M. Carley
Affiliations: Carnegie Mellon University, Pittsburgh, PA (both authors)
Venue: ACM SIGKDD Explorations Newsletter
Year: 2005

Abstract

In a time of information glut, observations about complex systems and phenomena of interest are available in several application areas, such as biology and text. As a consequence, scientists have started searching for patterns that involve interactions among the objects of analysis, and research on models and algorithms for network analysis has become a central theme for knowledge discovery and data mining (KDD). The intuitions behind the plethora of approaches rely upon a few basic types of networks, identified by specific local and global topological properties, which we term "pure" topology types.

In this paper, (1) we survey pure topology types along with existing sampling algorithms that generate them; (2) we introduce novel algorithms that enhance the diversity of samples and address the case of cellular topologies; and (3) we perform statistical studies of the stability of the properties of pure types under alternative generative algorithms, together with a joint study of the separability of pure types in terms of their embedding in a space of metrics for network analysis that are widely adopted in the social and physical sciences.

We conclude with a word of caution to practitioners who sample pure topology types to assess the "statistical significance" of their findings: for example, the p-value of the clustering coefficient is sensitive to the sampling algorithm used. We find that different pure types share similar topological properties. Further, real-world networks rarely present the variability profile of a single pure type. We suggest the assumption of "mixtures of types" as an alternative starting point for developing models and algorithms for network analysis.
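
The caution about p-values can be made concrete with a small sketch. The code below is not taken from the paper: it assumes a "uniform random" pure type, uses two textbook samplers for it (independent edges with probability p, and exactly m edges chosen uniformly), and treats a ring lattice as a stand-in for an observed network. All function names, the network size, and the number of replicates are hypothetical choices made for illustration. It shows only the procedure of computing an empirical p-value of the clustering coefficient under each sampler; whether and how much the two p-values differ depends on the samplers and the data.

```python
import random
from itertools import combinations


def sample_gnp(n, p, rng):
    """Uniform random graph: each possible edge is kept independently with probability p."""
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def sample_gnm(n, m, rng):
    """Uniform random graph with exactly m edges, drawn without replacement."""
    adj = {v: set() for v in range(n)}
    for u, v in rng.sample(list(combinations(range(n), 2)), m):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def ring_lattice(n, k):
    """Ring where every node is linked to its k nearest neighbours on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def avg_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj)


def empirical_p_value(observed, sampler, reps, rng):
    """Share of null samples with clustering >= observed (add-one corrected)."""
    hits = sum(avg_clustering(sampler(rng)) >= observed for _ in range(reps))
    return (hits + 1) / (reps + 1)


# Hypothetical setup: a 30-node ring lattice plays the role of the observed network.
n, k, reps = 30, 2, 200
observed_graph = ring_lattice(n, k)
m = sum(len(s) for s in observed_graph.values()) // 2   # number of edges
p = m / (n * (n - 1) / 2)                                # matching edge density
observed = avg_clustering(observed_graph)

rng = random.Random(0)
p_gnp = empirical_p_value(observed, lambda r: sample_gnp(n, p, r), reps, rng)
p_gnm = empirical_p_value(observed, lambda r: sample_gnm(n, m, r), reps, rng)
print("observed clustering:", observed)
print("p-value under G(n,p):", p_gnp)
print("p-value under G(n,m):", p_gnm)
```

The same `empirical_p_value` helper can be pointed at any other sampler that takes a random generator and returns an adjacency dictionary, which is one simple way to check how much a reported significance level depends on the generative algorithm chosen.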