In this paper, we propose an edge capacity based on hub and authority scores, and examine its effect on the method for extracting web communities using the maximum flow algorithm proposed by G. Flake et al. A web community is a collection of web pages that share a common (or related) topic. In recent years, various methods for finding web communities have been proposed. G. Flake et al.'s method, which is based on the maximum flow algorithm, has a major advantage: "topic drift" does not easily occur. On the other hand, it sets the edge capacity to a fixed value for every edge, which is one of the major causes of failing to obtain a proper web community. Our approach, which uses a HITS-score-based edge capacity, effectively extracts web pages that are well balanced in both their global and local relations to the given seed node. We examined the effect in experiments on 20 randomly selected topics using web archives in Japan crawled in 2002. The results confirmed that the average precision rose by approximately 20%.
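The idea of replacing a fixed edge capacity with one derived from HITS scores can be illustrated with a small sketch. The abstract does not give the actual capacity formula, so the one used here (hub score of the source page plus authority score of the target page, rescaled so that the mean capacity equals the number of seeds) is purely an assumption for illustration. The function names, the virtual source and sink labels, and the unit capacity on sink edges are likewise hypothetical simplifications of a max-flow community extraction, not the authors' implementation.

```python
from collections import defaultdict, deque
import math


def hits_scores(edges, iterations=50):
    """Plain HITS power iteration; hub and authority scores each sum to 1."""
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            auth[v] += hub[u]          # good authorities are linked from good hubs
        hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            hub[u] += auth[v]          # good hubs link to good authorities
        a_sum = sum(auth.values()) or 1.0
        h_sum = sum(hub.values()) or 1.0
        auth = {n: s / a_sum for n, s in auth.items()}
        hub = {n: s / h_sum for n, s in hub.items()}
    return hub, auth


def source_side_of_min_cut(cap, source, sink):
    """Edmonds-Karp max flow on residual capacities `cap` (mutated in place).

    Returns the nodes still reachable from `source` once no augmenting
    path remains, i.e. the source side of a minimum cut.
    """
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return set(parent)
        # Find the bottleneck on the augmenting path, then push flow along it.
        bottleneck = math.inf
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u


def extract_community(edges, seeds):
    """Sketch of max-flow community extraction with HITS-weighted capacities."""
    hub, auth = hits_scores(edges)
    nodes = {n for edge in edges for n in edge}
    k = float(len(seeds))
    # ASSUMPTION: raw weight = hub(u) + authority(v); not taken from the paper.
    raw = {(u, v): hub[u] + auth[v] for u, v in edges}
    mean_raw = sum(raw.values()) / len(raw)
    cap = defaultdict(lambda: defaultdict(float))
    for (u, v), w in raw.items():
        c = k * w / mean_raw           # mean capacity equals the fixed value k
        cap[u][v] += c                 # links are treated as bidirectional
        cap[v][u] += c
    source, sink = "__source__", "__sink__"   # hypothetical virtual nodes
    for s in seeds:
        cap[source][s] = math.inf
    for n in nodes - set(seeds):
        cap[n][sink] = 1.0
    return source_side_of_min_cut(cap, source, sink) - {source}


if __name__ == "__main__":
    links = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "b"), ("c", "d"), ("d", "e")]
    print(sorted(extract_community(links, seeds={"a"})))
```

Setting every capacity to `k` instead of `k * w / mean_raw` gives a fixed-capacity baseline of the kind the abstract criticizes, so the two variants can be compared on the same link data.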