Linguistic profiling of texts for the purpose of language verification

  • Authors:
  • Hans Van Halteren; Nelleke Oostdijk

  • Affiliations:
  • Univ. of Nijmegen, The Netherlands; Univ. of Nijmegen, The Netherlands

  • Venue:
  • COLING '04: Proceedings of the 20th International Conference on Computational Linguistics
  • Year:
  • 2004

Abstract

To control the quality of internet-based language corpora, we developed a method for automatically verifying that texts are of (near-)native quality. On the LOCNESS and ICLE corpora, the method separates native from non-native learner texts rather successfully, with an Equal Error Rate of about 10%. For other domains, such as internet texts, however, separate classifiers have to be trained on suitable seed corpora.
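The abstract reports performance as an Equal Error Rate (EER): the operating point at which the false acceptance rate (non-native texts accepted as native) equals the false rejection rate (native texts rejected). The following is a minimal sketch of how an EER is commonly computed from verification scores. It is not the authors' procedure. The function name `equal_error_rate`, the convention that a higher score means "more native-like", and the example scores are all hypothetical.

```python
def equal_error_rate(native_scores, nonnative_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: higher scores mean a text looks more native-like, and a text
    is accepted as native when its score is >= the threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    if not native_scores or not nonnative_scores:
        raise ValueError("both score lists must be non-empty")
    # Candidate thresholds: every observed score, plus one that rejects everything.
    thresholds = sorted(set(native_scores) | set(nonnative_scores))
    thresholds.append(float("inf"))
    best = None
    for t in thresholds:
        # False rejection: a native text scored below the threshold.
        frr = sum(s < t for s in native_scores) / len(native_scores)
        # False acceptance: a non-native text scored at or above the threshold.
        far = sum(s >= t for s in nonnative_scores) / len(nonnative_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores for four native and four non-native texts.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(eer, threshold)  # 0.25 0.6
```

With these made-up scores, one native text (0.4) falls below the threshold and one non-native text (0.6) reaches it, so both error rates are 25%. An EER of about 10%, as reported in the abstract, would correspond to roughly one text in ten being misclassified in each direction at the balanced threshold.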