An exhaustive and edge-removal algorithm to find cores in implicit communities

Authors:
Nan Yang;Songxiang Lin;Qiang Gao
Affiliations:
The School of Information, Renmin University of China, Beijing, China;The School of Information, Renmin University of China, Beijing, China;The School of Information, Renmin University of China, Beijing, China
Venue:
APWeb/WAIM'07 Proceedings of the joint 9th Asia-Pacific web and 8th international conference on web-age information management conference on Advances in data and web management
Year:
2007

References:

Inferring Web communities from link topology

Proceedings of the ninth ACM conference on Hypertext and hypermedia : links, objects, time and space---structure in hypermedia systems
Syntactic clustering of the Web

Selected papers from the sixth international conference on World Wide Web
Automatic resource compilation by analyzing hyperlink structure and associated text

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Finding related pages in the World Wide Web

WWW '99 Proceedings of the eighth international conference on World Wide Web
Trawling the Web for emerging cyber-communities

WWW '99 Proceedings of the eighth international conference on World Wide Web
Graph structure in the Web

Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications networking
Efficient identification of Web communities

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Extracting Large-Scale Knowledge Bases from the Web

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases

Abstract

Web communities are intensively studied in web resource discovery. Much of the literature uses a core as the signature of a community. A core is a complete bipartite graph, denoted Ci,j. Discovering all possible Ci,j on the web, however, is a challenging task. The problem has been investigated through trawling [1][2]. Trawling repeats an elimination/generation procedure until the graph is pruned to a satisfactory state and then enumerates all possible Ci,j. We propose a new method based on exhaustive search and edge removal. Our algorithm avoids scanning the dataset many times. We also improve the crawling method by recording only potential fans, which saves disk space. The experimental results show that the new algorithm works properly and that our method finds many new Ci,j.
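
To make the definition of a core concrete, the following is a minimal Python sketch. It is not the algorithm proposed in the paper. It assumes the usual reading of Ci,j as i fan pages that each link to all of the same j center pages, applies degree-based pruning in the spirit of trawling (a fan needs out-degree at least j, a center needs in-degree at least i), and then enumerates cores by brute force. The function names `prune` and `enumerate_cores` and the toy edge list are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def prune(edges, i, j):
    """Repeatedly drop edges whose source has out-degree < j
    or whose target has in-degree < i, until nothing changes."""
    edges = set(edges)
    while True:
        out_deg, in_deg = {}, {}
        for u, v in edges:
            out_deg[u] = out_deg.get(u, 0) + 1
            in_deg[v] = in_deg.get(v, 0) + 1
        kept = {(u, v) for u, v in edges
                if out_deg[u] >= j and in_deg[v] >= i}
        if kept == edges:
            return kept
        edges = kept


def enumerate_cores(edges, i, j):
    """Brute force: list every (fans, centers) pair that forms a C(i, j)."""
    edges = prune(edges, i, j)
    out_links = {}
    for u, v in edges:
        out_links.setdefault(u, set()).add(v)
    cores = []
    for fans in combinations(sorted(out_links), i):
        # Centers must be linked to by every chosen fan.
        common = set.intersection(*(out_links[f] for f in fans))
        for centers in combinations(sorted(common), j):
            cores.append((fans, centers))
    return cores


# Hypothetical toy link graph: (source page, target page).
edges = [
    ("f1", "c1"), ("f1", "c2"), ("f1", "c3"),
    ("f2", "c1"), ("f2", "c2"), ("f2", "c3"),
    ("f3", "c1"), ("f3", "c2"),
    ("f4", "c4"),
]
cores_2_3 = enumerate_cores(edges, 2, 3)
cores_3_2 = enumerate_cores(edges, 3, 2)
```

On this toy graph the pruning step discards `f4` and `c4` for either setting, since neither can belong to a core of the requested size. The brute-force enumeration is exponential in general, which is why pruning the graph first matters on web-scale data.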