Effects of spam removal on search engine efficiency and effectiveness

Authors:
Matt Crane; Andrew Trotman
Affiliations:
University of Otago, Dunedin, New Zealand; University of Otago, Dunedin, New Zealand
Venue:
Proceedings of the Seventeenth Australasian Document Computing Symposium
Year:
2012

Abstract

Spam has long been identified as a problem that web search engines must address. Large collection sizes are also an increasing issue for institutions that lack the resources to process such collections in their entirety. In this paper we investigate how withholding documents identified as spam affects the resources required to process large collections. We also investigate the resulting search effectiveness and efficiency when different amounts of spam are withheld. We find that removing spam at indexing time decreases the index size without affecting indexing throughput, and improves search precision for some spam thresholds.
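The central mechanism described in the abstract is withholding documents identified as spam before they reach the indexer, at a chosen threshold. The toy sketch below illustrates that idea under stated assumptions. It is not the authors' implementation. Each document is assumed to carry a hypothetical spam score from 0 to 99, where lower means more spam-like. Index size is measured as a postings count. Documents are ranked by a simple sum of query-term frequencies rather than the ranking function used in the paper. The documents, scores, relevance judgements, and function names (`build_index`, `postings_count`, `search`, `precision_at_k`) are all invented for illustration.

```python
from collections import defaultdict

# Toy collection; all documents, scores and judgements are invented for illustration.
DOCS = {
    "d1": "cheap pills cheap pills buy now",
    "d2": "search engine index compression",
    "d3": "spam filtering for web search",
    "d4": "buy cheap search engine ranking now",
}
# Hypothetical spam scores in 0..99: lower means more spam-like.
SPAM_SCORES = {"d1": 5, "d2": 90, "d3": 80, "d4": 20}
RELEVANT = {"d2", "d3"}  # hypothetical relevance judgements for the query below


def build_index(docs, spam_scores, threshold):
    """Build an inverted index, withholding documents scored below `threshold`.

    Returns (index, number_of_documents_indexed). Documents with no spam score
    are treated as non-spam (an assumption of this sketch).
    """
    index = defaultdict(list)  # term -> list of (doc_id, term_frequency)
    indexed = 0
    for doc_id, text in docs.items():
        if spam_scores.get(doc_id, 99) < threshold:
            continue  # withheld as spam: the document never reaches the index
        indexed += 1
        counts = defaultdict(int)
        for term in text.lower().split():
            counts[term] += 1
        for term, tf in counts.items():
            index[term].append((doc_id, tf))
    return index, indexed


def postings_count(index):
    """Index size, measured here as the total number of postings."""
    return sum(len(postings) for postings in index.values())


def search(index, query, k):
    """Rank documents by summed query-term frequency (a toy scoring function)."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id, tf in index.get(term, []):
            scores[doc_id] += tf
    # Highest score first; ties broken by document id for a deterministic order.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:k]


def precision_at_k(ranking, relevant, k):
    """Fraction of the top k results that are relevant."""
    return sum(1 for doc_id in ranking[:k] if doc_id in relevant) / k


if __name__ == "__main__":
    for threshold in (0, 50):
        index, n_indexed = build_index(DOCS, SPAM_SCORES, threshold)
        ranking = search(index, "search engine", k=2)
        print(threshold, n_indexed, postings_count(index), ranking,
              precision_at_k(ranking, RELEVANT, 2))
```

In this invented collection, raising the threshold from 0 to 50 withholds the two spam-like documents. The index shrinks from 19 to 9 postings, and precision at 2 for the query "search engine" rises from 0.5 to 1.0. These figures come only from the toy data and say nothing about the paper's measured results. Filtering happens before tokenisation, so withheld documents add no postings and need no further indexing work.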