Analysis of on-line social networks represented as graphs --- extraction of an approximation of community structure using sampling

Authors:
Néstor Martínez Arqué; David F. Nettleton
Affiliations:
Dept. Information Technology and Communications, Universitat Pompeu Fabra, Barcelona, Spain; Dept. Information Technology and Communications, Universitat Pompeu Fabra, Barcelona, Spain, and IIIA-CSIC, Bellaterra, Spain
Venue:
MDAI'12 Proceedings of the 9th international conference on Modeling Decisions for Artificial Intelligence
Year:
2012

Abstract

In this paper we benchmark two distinct algorithms for extracting community structure from social networks represented as graphs, and consider how to sample an online social network (OSN) graph representatively while maintaining its community structure. We also evaluate the extraction algorithms' optimum value (modularity) for the number of communities, using five well-known benchmarking datasets, two of which contain real OSN data. In addition, we consider how to assign the filtering and sampling criteria for each dataset. We find that the extraction algorithms work well for finding the major communities in both the original and the sampled datasets. The quality of the results is measured using an NMI (Normalized Mutual Information) type metric, which identifies the degree of correspondence between the communities generated from the original data and those generated from the sampled data. We find that a representative sampling is possible that preserves the key community structures of an OSN graph, significantly reducing computational cost and making the resulting graph structure easier to visualize. Finally, we compare the communities generated by each algorithm and identify their degree of correspondence.
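
The abstract does not say which NMI variant the authors use, so the following Python sketch is only an illustration of the general idea: it assumes the common arithmetic-mean normalisation, 2 I(A;B) / (H(A) + H(B)), and compares two community assignments on the nodes that survive sampling. The function names `normalized_mutual_information` and `compare_on_sampled_nodes`, and the dictionary representation of a partition, are hypothetical choices made for this example and do not come from the paper.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two community assignments over the same ordered nodes.

    Assumption: arithmetic-mean normalisation, 2*I(A;B) / (H(A) + H(B)).
    The paper may use a different NMI-type variant.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both assignments must cover the same nodes")
    n = len(labels_a)
    if n == 0:
        raise ValueError("at least one node is required")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        # Shannon entropy of the community-size distribution.
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    if h_a == 0.0 and h_b == 0.0:
        # Both partitions place every node in one community: identical.
        return 1.0

    mutual_info = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    return 2.0 * mutual_info / (h_a + h_b)


def compare_on_sampled_nodes(original, sampled):
    """Compare two partitions (dict: node -> community label).

    Only nodes present in both the original and the sampled graph are
    compared, since nodes dropped by sampling have no sampled label.
    """
    shared = [node for node in sampled if node in original]
    return normalized_mutual_information(
        [original[node] for node in shared],
        [sampled[node] for node in shared],
    )
```

A score near 1 indicates that the sampled graph reproduces the community assignment of the original graph on the shared nodes, whatever names the labels carry; a score near 0 indicates that the two assignments are statistically independent.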