We present a new algorithm for finding large, dense subgraphs in massive graphs. Our algorithm is based on a recursive application of fingerprinting via shingles, and is extremely efficient, capable of handling graphs with tens of billions of edges on a single machine with modest resources.

We apply our algorithm to characterize the large, dense subgraphs of a graph showing connections between hosts on the World Wide Web; this graph contains over 50M hosts and 11B edges, gathered from 2.1B web pages. We measure the distribution of these dense subgraphs and their evolution over time. We show that more than half of these hosts participate in some dense subgraph found by the analysis. There are several hundred giant dense subgraphs of at least ten thousand hosts, two thousand dense subgraphs of at least a thousand hosts, and almost 64K dense subgraphs of at least a hundred hosts.

Upon examination, many of the dense subgraphs output by our algorithm are link spam, i.e., websites that attempt to manipulate search engine rankings through aggressive interlinking to simulate popular content. We therefore propose dense subgraph extraction as a useful primitive for spam detection, and discuss its incorporation into the workflow of web search engines.
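To make "recursive application of fingerprinting via shingles" concrete, here is a minimal in-memory sketch of a simplified two-level variant; it is an illustration under stated assumptions, not the authors' implementation. Assumptions: vertex ids are hashable and sortable, a seeded BLAKE2b hash stands in for a random min-wise permutation, and the function names and parameters (`shingles`, `dense_vertex_groups`, `s1`, `c1`, `s2`, `c2`) are hypothetical. Each vertex's out-neighbor set is reduced to `c1` shingles (each a fingerprint of the `s1` smallest neighbors under one seeded ordering). The vertex set behind each shingle is then shingled again, and first-level shingles that share a second-level shingle are merged with union-find. The paper's actual procedure (in particular how it scales to billions of edges on one machine and how output subgraphs are reported) may differ from this sketch.

```python
import hashlib
from collections import defaultdict


def _fingerprint(seed, value):
    """Seeded 64-bit hash; an assumed stand-in for a random min-wise permutation."""
    data = f"{seed}|{value!r}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def shingles(items, s, c, seed_base):
    """Return c shingles of `items`: for each of c seeded orderings, take the s
    smallest elements and fingerprint that s-subset. Sets with fewer than s
    elements yield no shingles."""
    items = sorted(set(items))
    if len(items) < s:
        return []
    out = []
    for j in range(c):
        seed = seed_base + j
        smallest = sorted(items, key=lambda x: _fingerprint(seed, x))[:s]
        out.append(_fingerprint(seed, tuple(sorted(smallest))))
    return out


def dense_vertex_groups(adj, s1=2, c1=10, s2=2, c2=10):
    """Two-level shingling of {vertex: iterable of out-neighbors} (hypothetical
    helper). Returns groups of vertices whose out-neighborhoods overlap heavily."""
    # Level 1: shingle each vertex's out-neighbor set.
    vertices_of = defaultdict(set)  # first-level shingle -> vertices producing it
    for v, nbrs in adj.items():
        for sh in shingles(nbrs, s1, c1, seed_base=0):
            vertices_of[sh].add(v)

    # Level 2 (the recursive step): shingle each first-level shingle's vertex set.
    shingles_of = defaultdict(set)  # second-level shingle -> first-level shingles
    for sh, verts in vertices_of.items():
        for meta in shingles(verts, s2, c2, seed_base=10_000):
            shingles_of[meta].add(sh)

    # Union-find: merge first-level shingles that share a second-level shingle.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for group in shingles_of.values():
        first, *rest = group
        for other in rest:
            parent[find(other)] = find(first)

    # Each component of first-level shingles maps back to the vertices behind it.
    components = defaultdict(set)
    for sh in parent:
        components[find(sh)] |= vertices_of[sh]
    return list(components.values())
```

As a usage example, ten hypothetical hosts that all link to the same ten targets form one group, while five hosts linking to disjoint targets are filtered out. Their first-level shingles are each produced by a single vertex, so they generate no second-level shingles. The design choice is that shingling replaces expensive pairwise set comparisons with exact matching on short fingerprints, which is what makes sort- or hash-based grouping over very large graphs feasible.

```python
clique = {v: range(100, 110) for v in range(10)}
noise = {20 + i: range(200 + 5 * i, 205 + 5 * i) for i in range(5)}
print(dense_vertex_groups({**clique, **noise}))
```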