The World Wide Web (WWW) is rapidly becoming important for society as a medium for sharing data, information, and services, and there is growing interest in tools for understanding collective behavior and emerging phenomena in the WWW. In this article we focus on the problem of searching for and classifying communities in the Web. Loosely speaking, a community is a group of pages related to a common interest. More formally, the computer science literature associates communities with the existence of a locally dense subgraph of the Web graph (where Web pages are nodes and hyperlinks are arcs). The core of our contribution is a new scalable algorithm for finding relatively dense subgraphs in massive graphs. We apply our algorithm to Web graphs built from three publicly available large crawls of the Web (with raw sizes up to 120M nodes and 1G arcs). We demonstrate the effectiveness of the algorithm in finding dense subgraphs experimentally, by embedding artificial communities in the Web graph and counting how many of them are found blindly. Effectiveness increases with the size and density of the communities: it is close to 100% for communities of thirty nodes or more (even at low density), and it remains about 80% for communities of twenty nodes with density over 50% of the arcs present. At the lower extreme, the algorithm still catches 35% of dense communities made of ten nodes. We also develop some sufficient conditions for the detection of a community under some local graph models and not-too-restrictive hypotheses. We complete our Community Watch system by clustering the communities found in the Web graph into homogeneous groups by topic and labeling each group with representative keywords.
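To make the evaluation setup concrete, here is a minimal Python sketch of the two ingredients the abstract mentions: measuring the density of a node set and embedding an artificial community in a graph. It is an illustration under stated assumptions, not the authors' implementation. In particular, density is assumed here to mean the fraction of possible directed arcs (excluding self-loops) that are present, the graph is assumed to be a set of `(source, target)` pairs, and the names `density` and `plant_community` are hypothetical.

```python
import random
from itertools import permutations


def density(nodes, arcs):
    """Fraction of possible directed arcs among `nodes` that are present.

    Assumption: `arcs` is a set of (source, target) pairs, and self-loops
    do not count toward the density.
    """
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    present = sum(1 for (u, v) in arcs if u in nodes and v in nodes and u != v)
    return present / (n * (n - 1))


def plant_community(arcs, nodes, target_density, rng):
    """Return a copy of `arcs` with an artificial community embedded.

    Each ordered pair of distinct nodes in `nodes` receives an arc with
    probability `target_density`; arcs already in the graph are kept.
    """
    planted = set(arcs)
    for u, v in permutations(nodes, 2):
        if rng.random() < target_density:
            planted.add((u, v))
    return planted


# Hypothetical toy graph: a directed 3-cycle plus one arc leaving it.
toy_arcs = {(0, 1), (1, 2), (2, 0), (2, 3)}
cycle_density = density({0, 1, 2}, toy_arcs)

# Embed a fully connected artificial community on four fresh nodes.
rng = random.Random(0)
planted_arcs = plant_community(toy_arcs, [10, 11, 12, 13], 1.0, rng)
planted_density = density({10, 11, 12, 13}, planted_arcs)
```

In an experiment of the kind described above, one would run the detection algorithm on the graph with planted communities and count how many planted node sets are recovered; that detection step is not sketched here.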