The task of inferring an author's native language from texts written in a second language has generally been tackled as a classification problem, typically using as features a mix of n-grams over characters and part-of-speech tags (for small, fixed n) and unigram function words. Adaptor grammars show some promise for capturing the arbitrarily long n-grams that syntax-based approaches suggest are useful. In this work we investigate their extension to identifying n-gram collocations of arbitrary length over a mix of PoS tags and words, using both maximum-entropy (MaxEnt) and induced syntactic language model approaches to classification. After presenting a new, simple baseline, we show that learned collocations used as features in a MaxEnt model perform better still, but that the results are more mixed for the syntactic language model.
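To make the classification setup concrete, the following is a minimal sketch (not the paper's implementation) of native language identification framed as text classification: a maximum-entropy model, here scikit-learn's LogisticRegression, over fixed-n character n-gram counts. The texts and native-language labels are invented for illustration; in the approach described above, learned collocations of arbitrary length over PoS tags and words would replace or augment these fixed-n features.

```python
# Illustrative sketch only: MaxEnt (logistic regression) classification of
# L2 texts by the writer's hypothetical L1, using character n-gram features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus: invented L2-English sentences with invented L1 labels.
texts = [
    "I am agree with the proposal that was made yesterday.",
    "He is knowing the answer since a long time.",
    "The informations we received were very useful for us.",
    "She made a photo of the beautiful landscape there.",
]
labels = ["es", "hi", "fr", "de"]  # hypothetical native-language labels

# Character n-grams for n = 1..3 stand in for the "small, fixed n"
# features the abstract describes.
model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)
print(model.predict(["I am agree with him."])[0])
```

A real system would train on a corpus such as ICLE with cross-validation; the point here is only the shape of the pipeline: n-gram feature extraction feeding a MaxEnt classifier.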