Extracting local web communities using lexical similarity

Authors:
Xianchao Zhang;Wen Xu;Wenxin Liang
Affiliations:
School of Software, Dalian University of Technology, China;School of Software, Dalian University of Technology, China;School of Software, Dalian University of Technology, China
Venue:
DASFAA'10 Proceedings of the 15th international conference on Database systems for advanced applications
Year:
2010

Cites: 10
Cited by: 0

Using WordNet to disambiguate word senses for text retrieval. SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval.
Efficient identification of Web communities. Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining.
Self-Organization and Identification of Web Communities. Computer.
Relationship-based clustering and cluster ensembles for high-dimensional data mining.
Communities from seed sets. Proceedings of the 15th international conference on World Wide Web.
Building implicit links from content for forum search. SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval.
Graph-based text classification: learn from your neighbors. SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval.
Mining Communities on the Web Using a Max-Flow and a Site-Oriented Framework. IEICE - Transactions on Information and Systems.
Building structured web community portals: a top-down, compositional, and incremental approach. VLDB '07 Proceedings of the 33rd international conference on Very large data bases.
Extracting and ranking viral communities using seeds and content similarity. Proceedings of the nineteenth ACM conference on Hypertext and hypermedia.

Abstract

The World Wide Web contains rich textual content interconnected by complex hyperlinks. Most studies of web community extraction focus only on graph structure, so communities are discovered purely from explicit link information, without considering the textual properties of web pages. This paper proposes an improved version of Flake's maximum-flow-based method. Rather than treating all edges as equally important, the improved algorithm assigns each edge a carefully designed capacity derived from the lexical similarity of the web pages it connects. Given a specific query, the approach also supports a new, efficient scheme for ranking the members of the extracted community. Experimental results indicate that, combined with a novel optimization strategy for similarity computation, our approach efficiently handles a variety of data sets.
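The abstract only outlines the method, so the following is a minimal sketch of the general idea, not the authors' algorithm. It shows a Flake-style seed-based extraction (a virtual source linked to the seeds with infinite capacity, every non-seed page linked to a virtual sink with unit capacity, and the community taken as the source side of the minimum cut) in which each hyperlink's capacity grows with the lexical similarity of the two pages it joins. Everything beyond that construction is an assumption made for illustration: the bag-of-words cosine similarity, the capacity formula `k * (floor + similarity)`, the parameters `k` and `floor`, the helper names, and the query ranking by cosine similarity are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict, deque

EPS = 1e-12  # residual capacities at or below this count as saturated (float round-off)


def tf_vector(text):
    """Bag-of-words term frequencies (hypothetical stand-in for the paper's lexical features)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(n * b[t] for t, n in a.items() if t in b)
    na = math.sqrt(sum(n * n for n in a.values()))
    nb = math.sqrt(sum(n * n for n in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def max_flow_residual(cap, s, t):
    """Edmonds-Karp max flow; returns the residual capacities after the last augmentation."""
    res = defaultdict(lambda: defaultdict(float))
    for u, nbrs in cap.items():
        for v, c in nbrs.items():
            res[u][v] += c
            res[v][u] += 0.0  # make sure the reverse arc exists
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in res[u].items():
                if c > EPS and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return res  # no augmenting path left: the flow is maximal
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)  # bottleneck capacity of the path
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push


def extract_community(pages, links, seeds, k=2.0, floor=0.1):
    """Seed-based community as the source side of a min cut.

    pages: page id -> text; links: (u, v) hyperlinks; seeds: set of page ids.
    k and floor are hypothetical tuning knobs, not values taken from the paper.
    """
    vecs = {p: tf_vector(text) for p, text in pages.items()}
    src, sink = "__source__", "__sink__"
    cap = defaultdict(dict)
    for u, v in links:
        # Hypothetical weighting: lexically similar endpoints get more capacity;
        # `floor` keeps links between dissimilar pages from vanishing entirely.
        w = k * (floor + cosine(vecs[u], vecs[v]))
        cap[u][v] = cap[u].get(v, 0.0) + w
        cap[v][u] = cap[v].get(u, 0.0) + w  # hyperlinks treated as undirected
    for s in seeds:
        cap[src][s] = math.inf
    for p in pages:
        if p not in seeds:
            cap[p][sink] = 1.0  # unit capacity from each non-seed page to the sink
    res = max_flow_residual(cap, src, sink)
    # Pages still reachable from the source in the residual graph form the community.
    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        for v, c in res[u].items():
            if c > EPS and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen - {src}


def rank_members(community, pages, query):
    """Hypothetical ranking: members ordered by cosine similarity to the query text."""
    q = tf_vector(query)
    return sorted(community, key=lambda p: (-cosine(tf_vector(pages[p]), q), p))
```

As a usage example, take pages a, b and c about max flow and web communities and pages d and e about pasta recipes, with links a–b, a–c, b–c, c–d and d–e and the seed set {"a"}. Under the default parameters the cheapest cut (capacity 2.2) crosses the lexically weak c–d link, so the community is {a, b, c}, and ranking it for the query "max flow graph" yields b, a, c. Edmonds-Karp is chosen here only because it is short; a faster max-flow routine would suit larger crawls.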