Sampling algorithms for pure network topologies: a study on the stability and the separability of metric embeddings

  • Authors:
  • Edoardo M. Airoldi; Kathleen M. Carley

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • ACM SIGKDD Explorations Newsletter
  • Year:
  • 2005

Abstract

In a time of information glut, observations about complex systems and phenomena of interest are available in several application areas, such as biology and text. As a consequence, scientists have started searching for patterns that involve interactions among the objects of analysis, to the effect that research on models and algorithms for network analysis has become a central theme for knowledge discovery and data mining (KDD). The intuitions behind the plethora of approaches rely upon a few basic types of networks, identified by specific local and global topological properties, which we term "pure" topology types.

In this paper, (1) we survey pure topology types along with existing sampling algorithms that generate them, (2) we introduce novel algorithms that enhance the diversity of samples and address the case of cellular topologies, and (3) we perform statistical studies of the stability of the properties of pure types under alternative generative algorithms, and a joint study of the separability of pure types, in terms of their embedding in a space of metrics for network analysis widely adopted in the social and physical sciences.

We conclude with a word of caution to practitioners who sample pure topology types to assess the "statistical significance" of their findings; for example, the p-value of the clustering coefficient is sensitive to the sampling algorithm used. We find that different pure types share similar topological properties. Further, real-world networks hardly present the variability profile of a single pure type. We suggest the assumption of "mixtures of types" as an alternative starting point for developing models and algorithms for network analysis.