Effects of spam removal on search engine efficiency and effectiveness

  • Authors:
  • Matt Crane; Andrew Trotman

  • Affiliations:
  • University of Otago, Dunedin, New Zealand; University of Otago, Dunedin, New Zealand

  • Venue:
  • Proceedings of the Seventeenth Australasian Document Computing Symposium
  • Year:
  • 2012

Abstract

Spam has long been recognized as a problem that web search engines must deal with. Large collection sizes are also an increasing issue for institutions that lack the resources to process such collections in their entirety. In this paper we investigate how withholding documents identified as spam affects the resources required to process large collections. We also investigate the resulting search effectiveness and efficiency when different amounts of spam are withheld. We find that removing spam at indexing time decreases the index size without affecting indexing throughput, and can improve search precision at some thresholds.
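To illustrate what "withholding spam at indexing time" means, here is a minimal sketch, not the authors' implementation. It assumes each document carries a precomputed spam score on a hypothetical 0 to 99 percentile scale where lower means more spam-like. Documents scoring below a chosen threshold are skipped before they reach a toy inverted index, so raising the threshold shrinks the index. The names `build_index` and `postings_count`, the score scale, and the sample documents are all illustrative assumptions.

```python
from collections import defaultdict


def build_index(docs, spam_scores, threshold):
    """Build a toy inverted index, withholding documents judged to be spam.

    Assumption (hypothetical): spam_scores maps doc id -> percentile in [0, 99],
    where lower values mean "more likely spam". Documents without a score are
    treated as 99 and therefore always kept.
    """
    index = defaultdict(list)  # term -> postings list of doc ids
    for doc_id, text in docs.items():
        if spam_scores.get(doc_id, 99) < threshold:
            continue  # withheld at indexing time: the document never enters the index
        for term in set(text.lower().split()):
            index[term].append(doc_id)
    return index


def postings_count(index):
    """Total number of postings, a rough proxy for index size."""
    return sum(len(postings) for postings in index.values())


# Hypothetical sample collection and spam scores.
docs = {
    "d1": "cheap pills buy now",
    "d2": "search engine efficiency",
    "d3": "spam removal search",
}
scores = {"d1": 5, "d2": 80, "d3": 60}

for t in (0, 10, 70):
    print(t, postings_count(build_index(docs, scores, t)))
# prints:
# 0 10
# 10 6
# 70 3
```

Because filtering happens before any postings are written, withheld documents cost nothing in the final index. This sketch only shows the effect on index size; it says nothing about throughput or precision, which the paper measures empirically.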