Challenges in web search engines

Authors: Monika R. Henzinger; Rajeev Motwani; Craig Silverstein
Affiliations: Google Inc., Mountain View, CA; Department of Computer Science, Stanford University, Stanford, CA; Google Inc., Mountain View, CA
Venue: IJCAI'03: Proceedings of the 18th International Joint Conference on Artificial Intelligence
Year: 2003

Abstract

This article presents a high-level discussion of some problems that are unique to web search engines. The goal is to raise awareness and stimulate research in these areas.