In this article we present an experimental study of the properties of webgraphs. We study a large crawl from 2001 of 200M pages and about 1.4 billion edges, made available by the WebBase project at Stanford, as well as several synthetic graphs generated according to various recently proposed models. We investigate several topological properties of such graphs, including the number of bipartite cores and strongly connected components, the distributions of degrees and PageRank values, and some correlations; we present a comparative study of the models against these measures.

Our findings are that (i) the WebBase sample differs slightly from the (older) samples studied in the literature, and (ii) although these models do not capture all of its properties, they do exhibit some peculiar behaviors not found, for example, in the models from classical random graph theory.

Moreover, we developed a software library able to generate and measure massive graphs in secondary memory; this library is publicly available under the GPL licence. We discuss its implementation and some computational issues related to secondary-memory graph algorithms.