The World Wide Web (WWW) is rapidly becoming important for society as a medium for sharing data, information and services, and there is growing interest in tools for understanding collective behaviors and emerging phenomena in the WWW. In this paper we focus on the problem of searching for and classifying communities in the web. Loosely speaking, a community is a group of pages related to a common interest. More formally, communities have been associated in the computer science literature with the existence of a locally dense subgraph of the web-graph (where web pages are nodes and hyperlinks are arcs of the web-graph). The core of our contribution is a new scalable algorithm for finding relatively dense subgraphs in massive graphs. We apply our algorithm to web-graphs built from three publicly available large crawls of the web (with raw sizes up to 120M nodes and 1G arcs). The effectiveness of our algorithm in finding dense subgraphs is demonstrated experimentally by embedding artificial communities in the web-graph and counting how many of these are blindly found. Effectiveness increases with the size and density of the communities: it is close to 100% for communities of thirty nodes or more (even at low density), and it is still about 80% for communities of twenty nodes with density over 50% of the arcs present. At the lower extreme, the algorithm catches 35% of dense communities made of ten nodes. We complete our Community Watch system by clustering the communities found in the web-graph into homogeneous groups by topic and labelling each group with representative keywords.
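As an illustration of the evaluation idea described above (embedding artificial communities and counting how many are recovered), the following is a minimal Python sketch. It is not the paper's algorithm or its experimental code. The function names, the density definition (fraction of possible directed arcs present, self-loops excluded), and the `min_overlap` criterion for counting a planted community as "found" are all assumptions made for this example.

```python
import random
from itertools import permutations


def density(arcs, nodes):
    """Fraction of possible directed arcs (no self-loops) present among `nodes`.

    Assumed definition for illustration; `arcs` is a set of (source, target) pairs.
    """
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    present = sum(1 for u, v in permutations(nodes, 2) if (u, v) in arcs)
    return present / (n * (n - 1))


def plant_community(arcs, members, target_density, rng):
    """Embed an artificial community: add each possible arc among `members`
    independently with probability `target_density` (hypothetical procedure)."""
    for u, v in permutations(members, 2):
        if rng.random() < target_density:
            arcs.add((u, v))


def recall(planted, found, min_overlap=0.8):
    """Share of planted communities that are 'found'.

    A planted community counts as found when some detected node set contains
    at least `min_overlap` of its members (an assumed matching criterion).
    """
    if not planted:
        return 0.0
    hits = 0
    for p in planted:
        p = set(p)
        if any(len(p & set(f)) >= min_overlap * len(p) for f in found):
            hits += 1
    return hits / len(planted)


# Small usage example: plant one fully dense 20-node community.
rng = random.Random(0)
arcs = set()
community = list(range(20))
plant_community(arcs, community, 1.0, rng)
```

In an experiment of this shape, `found` would be the output of whatever dense-subgraph detector is under test, run without knowledge of the planted node sets, and `recall` would be reported per community size and density.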