A reference collection for web spam

  • Authors:
  • Carlos Castillo; Debora Donato; Luca Becchetti; Paolo Boldi; Stefano Leonardi; Massimo Santini; Sebastiano Vigna

  • Affiliations:
  • Università di Roma, Rome, Italy and Yahoo! Research, Barcelona, Catalunya, Spain; Università di Roma, Rome, Italy and Yahoo! Research, Barcelona, Catalunya, Spain; Università di Roma, Rome, Italy; Università degli Studi, Milan, Italy; Università di Roma, Rome, Italy; Università degli Studi, Milan, Italy; Università degli Studi, Milan, Italy

  • Venue:
  • ACM SIGIR Forum
  • Year:
  • 2006

Abstract

We describe the WEBSPAM-UK2006 collection, a large set of Web pages that have been manually annotated with labels indicating whether or not their hosts include Web spam aspects. This is the first publicly available Web spam collection that includes page contents and links, and that has been labelled by a large and diverse set of judges.