Revealing Hidden Community Structures and Identifying Bridges in Complex Networks: An Application to Analyzing Contents of Web Pages for Browsing

  • Authors:
  • Faraz Zaidi; Arnaud Sallaberry; Guy Melancon

  • Venue:
  • WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
  • Year:
  • 2009

Abstract

The emergence of scale-free and small-world properties in real-world complex networks has stimulated considerable activity in the field of network analysis. One example of such a network comes from Content Analysis (CA) and Text Mining, where the goal is to analyze the contents of a set of web pages. In this network, nodes represent the words appearing in the web pages, and an edge connects two words if they appear together in a document. In this paper we present a CA system that helps users visually analyze these networks of textual content. Major contributions include a methodology for clustering complex networks based on the duplication of nodes, and the identification of bridges, i.e., words that might be of interest to the user but have a low frequency in the document corpus. We have tested the system on a number of data sets, and users have found it very useful for exploring data. One case study, based on browsing a collection of Wikipedia web pages, is presented in detail.
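
The network construction described in the abstract (words as nodes, an edge when two words share a document) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `build_cooccurrence_network`, the use of document counts as frequencies and edge weights, and the toy corpus are all assumptions made for illustration. The paper's clustering and bridge-identification steps are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(documents):
    """Build a word co-occurrence network from tokenized documents.

    Hypothetical helper, not taken from the paper.
    Returns (word_frequency, edges):
      word_frequency: word -> number of documents containing it
      edges: sorted (word_a, word_b) pair -> number of documents
             in which both words appear
    """
    word_frequency = Counter()
    edges = Counter()
    for tokens in documents:
        # Count each word once per document and fix a pair ordering.
        unique_words = sorted(set(tokens))
        word_frequency.update(unique_words)
        edges.update(combinations(unique_words, 2))
    return word_frequency, edges


# Toy corpus, made up for illustration (not data from the paper).
docs = [
    ["graph", "network", "cluster"],
    ["network", "cluster", "web"],
    ["web", "page", "graph"],
]
freq, edges = build_cooccurrence_network(docs)
```

Here `freq` gives a simple per-word document frequency, which is the kind of quantity a low-frequency filter for candidate bridge words could start from; how the paper actually scores bridges is not stated in the abstract.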