Annotated web as corpus

  • Authors:
  • Paul Rayson; James Walkerdine; William H. Fletcher; Adam Kilgarriff

  • Affiliations:
  • Lancaster University, UK; Lancaster University, UK; United States Naval Academy; Lexical Computing Ltd., UK

  • Venue:
  • WAC '06 Proceedings of the 2nd International Workshop on Web as Corpus
  • Year:
  • 2006

Abstract

This paper presents a proposal to facilitate the use of the annotated web as corpus by alleviating the annotation bottleneck for corpus data drawn from the web. We describe a framework for large-scale distributed corpus annotation using peer-to-peer (P2P) technology to meet this need. We also propose to annotate a large reference corpus in order to evaluate this framework. This will allow us to investigate the affordances offered by distributed techniques to ensure replicability of linguistic research based on web-derived corpora.