In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype, with a full text and hyperlink database of at least 24 million pages, is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advances in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine, the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses the question of how to build a practical large-scale system which can exploit the additional information present in hypertext. We also look at the problem of how to deal effectively with uncontrolled hypertext collections, where anyone can publish anything they want.