Annotating 200 million words: the Bank of English project

  • Authors:
  • Timo Järvinen

  • Affiliations:
  • University of Helsinki

  • Venue:
  • COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 1
  • Year:
  • 1994

Abstract

The Bank of English is an international English language project sponsored by Harper-Collins Publishers, Glasgow, and conducted by the COBUILD team at the University of Birmingham, UK. The text bank will comprise some 200 million words of both written and spoken English. The whole 200 million word corpus is being annotated morphologically and syntactically during 1993-94 at the Research Unit for Computational Linguistics (RUCL), University of Helsinki, using the English morphological analyser (ENGTWOL) and the English Constraint Grammar (ENGCG) parser. The first half of the texts (103 million words) was processed in 1993. The project is led by Prof. John Sinclair in Birmingham and Prof. Fred Karlsson in Helsinki. The present author is responsible for conducting the annotation.