Towards advanced collocation error correction in Spanish learner corpora

Authors:
Gabriela Ferraro;Rogelio Nazar;Margarita Alonso Ramos;Leo Wanner
Affiliations:
Department of Information and Communication Technologies, Pompeu Fabra University, Barcelona, Spain;Institute for Applied Linguistics, Pompeu Fabra University, Barcelona, Spain;Faculty of Philology, University of La Coruña, La Coruña, Spain;Department of Information and Communication Technologies, Catalan Institute for Research and Advanced Studies (ICREA), Pompeu Fabra University, Barcelona, Spain
Venue:
Language Resources and Evaluation
Year:
2014

Abstract

Collocations, in the sense of idiosyncratic binary lexical co-occurrences, are among the biggest challenges for any language learner. Even advanced learners make collocation mistakes: they literally translate collocation elements from their native tongue, create new words as collocation elements, choose the wrong subcategorization for one of the elements, etc. Therefore, automatic collocation error detection and correction is increasingly in demand. However, while state-of-the-art models predict with reasonable accuracy whether a given co-occurrence is a valid collocation, only a few of them manage to suggest appropriate corrections with an acceptable hit rate. Most often, a ranked list of correction options is offered, from which the learner then has to choose. This is clearly unsatisfactory. Our proposal focuses on this critical part of the problem in the context of the acquisition of Spanish as a second language. For collocation error detection, we use a frequency-based technique. To improve collocation error correction, we discuss three different metrics with respect to their capability to select the most appropriate correction of miscollocations found in our learner corpus.
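
The abstract does not spell out the frequency-based detection technique or the three correction metrics, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a co-occurrence is represented as a (base, collocate) pair, that a pair rarely seen in a reference corpus is suspect, and that raw reference frequency can stand in for a correction metric. The function names, the `min_freq` threshold, and the toy data are all hypothetical.

```python
from collections import Counter


def count_pairs(reference_pairs):
    """Count how often each (base, collocate) pair occurs in a reference corpus."""
    return Counter(reference_pairs)


def is_suspect(pair, pair_counts, min_freq=2):
    """Flag a pair as a possible miscollocation if it is rare in the reference data.

    min_freq is an arbitrary illustrative threshold, not a value from the paper.
    """
    return pair_counts[pair] < min_freq


def rank_corrections(base, candidates, pair_counts):
    """Order candidate collocates for a base by reference frequency, highest first.

    Raw frequency is used here only as a placeholder for a real association metric.
    """
    return sorted(candidates, key=lambda c: pair_counts[(base, c)], reverse=True)


# Toy reference data, invented for illustration (not taken from any real corpus).
reference = (
    [("decisión", "tomar")] * 5
    + [("decisión", "adoptar")] * 2
    + [("paseo", "dar")] * 3
)
counts = count_pairs(reference)

# "hacer una decisión" is a typical literal translation of "make a decision".
print(is_suspect(("decisión", "hacer"), counts))
print(rank_corrections("decisión", ["hacer", "adoptar", "tomar"], counts))
```

With this toy data, the unseen pair is flagged and "tomar" is ranked first among the candidates. A system aiming to return a single correction rather than a ranked list would take only the top candidate, which is where the choice of metric becomes decisive.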