Copy detection mechanisms for digital documents
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Extracting schema from semistructured data
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Trawling the Web for emerging cyber-communities
WWW '99 Proceedings of the eighth international conference on World Wide Web
Authoritative sources in a hyperlinked environment
Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
Analysis of a very large web search engine query log
ACM SIGIR Forum
Finding replicated Web collections
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
A comparison of techniques to find mirrored hosts on the WWW
Journal of the American Society for Information Science
Proceedings of the 10th international conference on World Wide Web
Enhanced topic distillation using text, markup tags, and hyperlinks
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Effective site finding using link anchor information
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
CHI '01 Extended Abstracts on Human Factors in Computing Systems
Challenges in web search engines
ACM SIGIR Forum
On the Resemblance and Containment of Documents
SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
Discovering large dense subgraphs in massive graphs
VLDB '05 Proceedings of the 31st international conference on Very large data bases
HTML2RSS: automatic generation of RSS feed based on structure analysis of HTML document
Proceedings of the 15th international conference on World Wide Web
AggregateRank: bringing order to web sites
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Improving web spam classification using rank-time features
AIRWeb '07 Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
Relational clustering by symmetric convex coding
Proceedings of the 24th international conference on Machine learning
Identifying web spam with user behavior analysis
AIRWeb '08 Proceedings of the 4th international workshop on Adversarial information retrieval on the web
Robust PageRank and locally computable spam detection features
AIRWeb '08 Proceedings of the 4th international workshop on Adversarial information retrieval on the web
An attentive self-organizing neural model for text mining
Expert Systems with Applications: An International Journal
Link spam target detection using page farms
ACM Transactions on Knowledge Discovery from Data (TKDD)
Web Spam Detection by Exploring Densely Connected Subgraphs
WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
Enhancing duplicate collection detection through replica boundary discovery
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Topic-independent web high-quality page selection based on k-means clustering
AIRS'05 Proceedings of the Second Asia conference on Asia Information Retrieval Technology
Identifying Web Spam with the Wisdom of the Crowds
ACM Transactions on the Web (TWEB)
Thwarting the nigritude ultramarine: learning to identify link spam
ECML'05 Proceedings of the 16th European conference on Machine Learning
NCDawareRank: a novel ranking method that exploits the decomposable structure of the web
Proceedings of the sixth ACM international conference on Web search and data mining
This article presents a high-level discussion of some problems that are unique to web search engines. The goal is to raise awareness and stimulate research in these areas.