Exploring adaptor grammars for native language identification

  • Authors:
  • Sze-Meng Jojo Wong;Mark Dras;Mark Johnson

  • Affiliations:
  • Macquarie University, Sydney, NSW, Australia;Macquarie University, Sydney, NSW, Australia;Macquarie University, Sydney, NSW, Australia

  • Venue:
  • EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

The task of inferring the native language of an author based on texts written in a second language has generally been tackled as a classification problem, typically using as features a mix of n-grams over characters and part of speech tags (for small and fixed n) and unigram function words. To capture arbitrarily long n-grams that syntax-based approaches have suggested are useful, adaptor grammars have some promise. In this work we investigate their extension to identifying n-gram collocations of arbitrary length over a mix of PoS tags and words, using both maxent and induced syntactic language model approaches to classification. After presenting a new, simple baseline, we show that learned collocations used as features in a maxent model perform better still, but that the story is more mixed for the syntactic language model.