Google Books n-gram corpus used as a grammar checker

  • Authors:
  • Rogelio Nazar; Irene Renau

  • Affiliations:
  • University Institute of Applied Linguistics, Universitat Pompeu Fabra, Barcelona, Spain; University Institute of Applied Linguistics, Universitat Pompeu Fabra, Barcelona, Spain

  • Venue:
  • EACL 2012 Proceedings of the Second Workshop on Computational Linguistics and Writing (CLW 2012): Linguistic and Cognitive Aspects of Document Creation and Document Engineering
  • Year:
  • 2012

Abstract

In this research we explore the possibility of using a large n-gram corpus (Google Books) to derive lexical transition probabilities from the frequency of word n-grams, and then using those probabilities to check a target text and suggest corrections without the need for grammar rules. We conduct several experiments in Spanish, although our conclusions also apply to other languages, since the procedure is corpus-driven. The paper reports on experiments involving different types of grammar errors, conducted to test different grammar-checking procedures: spotting possible errors, deciding between different lexical possibilities, and filling in the blanks in a text.
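
The three procedures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes bigrams only, a plain maximum-likelihood estimate of the transition probability, P(w2 | w1) = count(w1 w2) / count(w1 followed by anything), and a fixed threshold for flagging. The counts in `BIGRAMS` are invented for illustration and are not taken from Google Books, and the function names (`transition_prob`, `spot_errors`, `choose`, `fill_blank`) are hypothetical.

```python
from collections import Counter

# Toy bigram counts, invented for illustration (NOT real Google Books data).
BIGRAMS = Counter({
    ("la", "casa"): 900,
    ("el", "casa"): 2,
    ("casa", "es"): 400,
    ("es", "grande"): 300,
    ("el", "libro"): 800,
    ("la", "libro"): 1,
})

# Total count of bigrams that start with each word (denominator of the estimate).
FIRST_WORD_TOTALS = Counter()
for (w1, _w2), n in BIGRAMS.items():
    FIRST_WORD_TOTALS[w1] += n

# Every word seen in the toy data, sorted so results are deterministic.
VOCAB = sorted({w for pair in BIGRAMS for w in pair})


def transition_prob(w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1); 0.0 if w1 was never seen."""
    total = FIRST_WORD_TOTALS[w1]
    return BIGRAMS[(w1, w2)] / total if total else 0.0


def spot_errors(tokens, threshold=0.01):
    """Return adjacent word pairs whose transition probability is below threshold."""
    return [(a, b) for a, b in zip(tokens, tokens[1:])
            if transition_prob(a, b) < threshold]


def choose(candidates, next_word):
    """Pick the candidate most likely to be followed by next_word."""
    return max(candidates, key=lambda c: transition_prob(c, next_word))


def fill_blank(left, right):
    """Pick the vocabulary word w that maximizes P(w | left) * P(right | w)."""
    return max(VOCAB,
               key=lambda w: transition_prob(left, w) * transition_prob(w, right))


if __name__ == "__main__":
    print(spot_errors(["el", "casa", "es", "grande"]))  # error spotting
    print(choose(["el", "la"], "libro"))                # lexical choice
    print(fill_blank("la", "es"))                       # fill in the blank
```

In this toy setting the article-noun gender mismatch "el casa" is flagged only because the invented counts make it rare. A real system would need smoothing for unseen n-grams and longer contexts than bigrams, neither of which this sketch attempts.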