Reprint of: The anatomy of a large-scale hypertextual web search engine

Authors:
Sergey Brin; Lawrence Page
Affiliations:
Computer Science Department, Stanford University, Stanford, CA 94305, USA; Computer Science Department, Stanford University, Stanford, CA 94305, USA
Venue:
Computer Networks: The International Journal of Computer and Telecommunications Networking
Year:
2012

Abstract

In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype, with a full-text and hyperlink database of at least 24 million pages, is available at http://google.stanford.edu/.

To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advances in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine, the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses the question of how to build a practical large-scale system which can exploit the additional information present in hypertext. We also look at the problem of how to deal effectively with uncontrolled hypertext collections, where anyone can publish anything they want.