Towards advanced collocation error correction in Spanish learner corpora

  • Authors: Gabriela Ferraro; Rogelio Nazar; Margarita Alonso Ramos; Leo Wanner

  • Affiliations: Department of Information and Communication Technologies, Pompeu Fabra University, Barcelona, Spain; Institute for Applied Linguistics, Pompeu Fabra University, Barcelona, Spain; Faculty of Philology, University of La Coruña, La Coruña, Spain; Department of Information and Communication Technologies, Catalan Institute for Research and Advanced Studies (ICREA), Pompeu Fabra University, Barcelona, Spain

  • Venue: Language Resources and Evaluation
  • Year: 2014

Abstract

Collocations in the sense of idiosyncratic binary lexical co-occurrences are one of the biggest challenges for any language learner. Even advanced learners make collocation mistakes: they translate collocation elements literally from their native tongue, create new words as collocation elements, choose a wrong subcategorization for one of the elements, etc. Automatic collocation error detection and correction is therefore increasingly in demand. However, while state-of-the-art models predict with reasonable accuracy whether a given co-occurrence is a valid collocation, only a few of them manage to suggest appropriate corrections with an acceptable hit rate. Most often, they offer a ranked list of correction options from which the learner then has to choose. This is clearly unsatisfactory. Our proposal focuses on this critical part of the problem in the context of the acquisition of Spanish as a second language. For collocation error detection, we use a frequency-based technique. To improve collocation error correction, we discuss three different metrics with respect to their ability to select the most appropriate correction for the miscollocations found in our learner corpus.
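The abstract names neither the exact detection procedure nor the three correction metrics, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that detection means flagging a verb-noun pair whose frequency in a reference corpus of native Spanish falls below a threshold, and that correction means ranking the verbs attested with the same noun by some association metric. The toy pair list, the threshold `min_freq`, and the two scoring functions (raw co-occurrence frequency and pointwise mutual information) are hypothetical stand-ins chosen for illustration.

```python
from collections import Counter
from math import log2

# Hypothetical toy reference corpus of (verb, noun) co-occurrences, standing in
# for the counts one would extract from a large corpus of native Spanish.
REFERENCE_PAIRS = (
    [("tomar", "decisión")] * 3
    + [("adoptar", "decisión")]
    + [("hacer", "pregunta")] * 2
    + [("tomar", "café")]
    + [("hacer", "deporte")]
)

pair_freq = Counter(REFERENCE_PAIRS)
verb_freq = Counter(verb for verb, _ in REFERENCE_PAIRS)
noun_freq = Counter(noun for _, noun in REFERENCE_PAIRS)
total = len(REFERENCE_PAIRS)


def is_miscollocation(verb, noun, min_freq=1):
    """Frequency-based detection (assumed): flag the pair if it occurs fewer than min_freq times."""
    return pair_freq[(verb, noun)] < min_freq


def frequency_score(verb, noun):
    """Hypothetical metric 1: raw co-occurrence count of the candidate pair."""
    return pair_freq[(verb, noun)]


def pmi_score(verb, noun):
    """Hypothetical metric 2: pointwise mutual information (pair must be attested)."""
    return log2(pair_freq[(verb, noun)] * total / (verb_freq[verb] * noun_freq[noun]))


def suggest_corrections(noun, score):
    """Rank the verbs attested with the same noun by the given metric, best first."""
    candidates = {verb for verb, n in pair_freq if n == noun}
    return sorted(candidates, key=lambda verb: score(verb, noun), reverse=True)


if __name__ == "__main__":
    # "hacer una decisión" is a typical calque of English "make a decision".
    print(is_miscollocation("hacer", "decisión"))            # True
    print(suggest_corrections("decisión", frequency_score))  # ['tomar', 'adoptar']
    print(suggest_corrections("decisión", pmi_score))        # ['adoptar', 'tomar']
```

On this toy data the two hypothetical metrics disagree on the top correction: raw frequency prefers *tomar*, while PMI, which is known to favour rare pairs, prefers *adoptar*. This only illustrates why the choice of metric matters when a single correction must be selected rather than a ranked list; it says nothing about which metrics the paper actually compares or how they perform.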