Finding a Web Community by Maximum Flow Algorithm with HITS Score Based Capacity

Authors:
Noriko Imafuji;Masaru Kitsuregawa
Venue:
DASFAA '03 Proceedings of the Eighth International Conference on Database Systems for Advanced Applications
Year:
2003

Citing 0
Cited 9

Extraction and classification of dense communities in the web
Proceedings of the 16th international conference on World Wide Web

Extraction and classification of dense implicit communities in the Web graph
ACM Transactions on the Web (TWEB)

Extracting Research Communities by Improved Maximum Flow Algorithm
KES '09 Proceedings of the 13th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems: Part II

Boosting concept discovery in collective intelligences
BI'09 Proceedings of the 2009 international conference on Brain informatics

An improved algorithm for extracting research communities from bibliographic data
DASFAA'10 Proceedings of the 15th international conference on Database systems for advanced applications

Detection of web communities from community cores
WISS'10 Proceedings of the 2010 international conference on Web information systems engineering

A topology-driven approach to the design of web meta-search clustering engines
SOFSEM'05 Proceedings of the 31st international conference on Theory and Practice of Computer Science

On clustering techniques for change diagnosis in data streams
WebKDD'05 Proceedings of the 7th international conference on Knowledge Discovery on the Web: advances in Web Mining and Web Usage Analysis

An exploration of link-based knowledge map in academic web space
Scientometrics

Abstract

In this paper, we propose an edge capacity based on hub and authority scores, and examine its effect on the method proposed by G. Flake et al. for extracting web communities using a maximum flow algorithm. A web community is a collection of web pages that share a common (or related) topic. In recent years, various methods for finding web communities have been proposed. The method of G. Flake et al., which is based on a maximum flow algorithm, has a big advantage: "topic drift" does not easily occur. On the other hand, it sets the edge capacity to a fixed value for every edge, which is one of the major causes of failing to obtain a proper web community. Our approach, which uses a HITS score based edge capacity, effectively extracts web pages that are well balanced in both global and local relations to the given seed node. We examined the effects through experiments on 20 randomly selected topics, using web archives in Japan crawled in 2002. The results confirmed that the average precision rose by approximately 20%.
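
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the capacity formula, so the rule used here (capacity proportional to the hub score of the linking page plus the authority score of the linked page) is an assumption, as are the unit capacity on edges to the sink, the treatment of each hyperlink as usable in both directions, and every name in the code (`hits`, `extract_community`, `SOURCE`, `SINK`, `k`). HITS scores are computed by plain power iteration, and the community is taken to be the set of pages still reachable from the seeds once a maximum flow has been found.

```python
from collections import deque

SOURCE, SINK = "__source__", "__sink__"  # hypothetical virtual nodes


def hits(edges, iterations=50):
    """Return (hub, authority) dicts from power iteration, L2-normalised."""
    nodes = {n for e in edges for n in e}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        auth = {n: 0.0 for n in nodes}
        for u, v in edges:          # authority: sum of hubs linking in
            auth[v] += hub[u]
        norm = sum(a * a for a in auth.values()) ** 0.5 or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        hub = {n: 0.0 for n in nodes}
        for u, v in edges:          # hub: sum of authorities linked to
            hub[u] += auth[v]
        norm = sum(h * h for h in hub.values()) ** 0.5 or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth


def source_side_of_min_cut(capacity, source, sink):
    """Run Edmonds-Karp; return nodes reachable from source at the end."""
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0.0) + c
        residual[v].setdefault(u, 0.0)
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue:                # breadth-first search in residual graph
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:      # no augmenting path left: flow is maximal
            return set(parent)
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[a][b] for a, b in path)
        for a, b in path:
            residual[a][b] -= push
            residual[b][a] += push


def extract_community(edges, seeds, k=2.0):
    """Community = source side of the minimum cut around the seed pages."""
    hub, auth = hits(edges)
    nodes = {n for e in edges for n in e}
    top = max(hub[u] + auth[v] for u, v in edges)
    capacity = {}
    for u, v in edges:
        # ASSUMPTION: capacity grows with hub(u) + authority(v), scaled to
        # at most k. A fixed-capacity baseline would use c = k instead.
        c = k * (hub[u] + auth[v]) / top
        capacity[(u, v)] = capacity.get((u, v), 0.0) + c
        capacity[(v, u)] = capacity.get((v, u), 0.0) + c
    for s in seeds:
        capacity[(SOURCE, s)] = float("inf")
    for n in nodes:
        capacity[(n, SINK)] = 1.0   # ASSUMPTION: unit capacity to the sink
    return source_side_of_min_cut(capacity, SOURCE, SINK) - {SOURCE}


if __name__ == "__main__":
    links = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"),
             ("c", "d"), ("d", "e"), ("e", "f")]
    print(sorted(extract_community(links, seeds=["a"])))
```

Replacing the score-based capacity with the constant `k` gives a fixed-capacity baseline for comparison, which is the setting the abstract identifies as a major cause of poorly formed communities.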