Extracting local web communities using lexical similarity

  • Authors:
  • Xianchao Zhang;Wen Xu;Wenxin Liang

  • Affiliations:
  • School of Software, Dalian University of Technology, China;School of Software, Dalian University of Technology, China;School of Software, Dalian University of Technology, China

  • Venue:
  • DASFAA'10 Proceedings of the 15th international conference on Database systems for advanced applications
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

The World Wide Web contains rich textual contents that are interconnected via complex hyperlinks. Most studies on web community extraction only focus on graph structures. Consequently, web communities are discovered purely in terms of explicit link information without considering textual properties of web pages. This paper proposes an improved algorithm based on Flake's method using the maximum flow algorithm. The improved algorithm considers the differences between edges in terms of importance, and assigns a well-designed capacity to each edge via the lexical similarity of web pages. Given a specific query, it also lends itself to a new and efficient ranking scheme for members in the extracted community. The experimental results indicate that our approach efficiently handles a variety of data sets across a novel optimization strategy of similarity computation.