The web as a graph: measurements, models, and methods

Authors:
Jon M. Kleinberg; Ravi Kumar; Prabhakar Raghavan; Sridhar Rajagopalan; Andrew S. Tomkins
Affiliations:
Department of Computer Science, Cornell University, Ithaca, NY; IBM Almaden Research Center, San Jose, CA; IBM Almaden Research Center, San Jose, CA; IBM Almaden Research Center, San Jose, CA; IBM Almaden Research Center, San Jose, CA
Venue:
COCOON'99 Proceedings of the 5th annual international conference on Computing and combinatorics
Year:
1999

Abstract

The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons -- mathematical, sociological, and commercial -- for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new sub-field of the study of random graphs, and raise questions about the analysis of graph algorithms on the Web.
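
As a minimal sketch of the graph view described in the abstract, the Python below stores pages as nodes and hyperlinks as directed edges, then counts in-links per page as one simple example of a measurement. The function names (`build_web_graph`, `in_degrees`) and the toy link list are hypothetical illustrations, not taken from the paper, and in-degree is chosen here only as an assumed example of a graph property.

```python
def build_web_graph(links):
    """Build adjacency sets from (source_page, target_page) hyperlink pairs."""
    graph = {}
    for src, dst in links:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())  # a page with no out-links is still a node
    return graph


def in_degrees(graph):
    """Count, for every page, how many distinct pages link to it."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for dst in targets:
            counts[dst] += 1
    return counts


# Toy data (hypothetical): four pages and four hyperlinks.
links = [
    ("a.html", "b.html"),
    ("a.html", "c.html"),
    ("b.html", "c.html"),
    ("d.html", "c.html"),
]
graph = build_web_graph(links)
indeg = in_degrees(graph)
```

An adjacency-set representation keeps the sketch short. A graph with hundreds of millions of nodes, as the abstract describes, would need a far more compact encoding.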