The task of inferring an author's native language from texts written in a second language has generally been tackled as a classification problem, typically using as features a mix of n-grams over characters and part-of-speech tags (for small, fixed n) and unigram function words. Adaptor grammars show some promise for capturing the arbitrarily long n-grams that syntax-based approaches suggest are useful. In this work we investigate their extension to identifying n-gram collocations of arbitrary length over a mix of PoS tags and words, using both maximum-entropy (MaxEnt) and induced syntactic language model approaches to classification. After presenting a new, simple baseline, we show that learned collocations used as features in a MaxEnt model perform better still, but that the results are more mixed for the syntactic language model.
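To make the classification setup concrete, the following is a minimal sketch (not the paper's implementation) of native language identification framed as text classification: a maximum-entropy model, here scikit-learn's LogisticRegression, over fixed-n character n-gram counts. The texts and native-language labels are invented for illustration; in the approach described above, learned collocations of arbitrary length over PoS tags and words would replace or augment these fixed-n features.

```python
# Illustrative sketch only: MaxEnt (logistic regression) classification of
# L2 texts by the writer's hypothetical L1, using character n-gram features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus: invented L2-English sentences with invented L1 labels.
texts = [
    "I am agree with the proposal that was made yesterday.",
    "He is knowing the answer since a long time.",
    "The informations we received were very useful for us.",
    "She made a photo of the beautiful landscape there.",
]
labels = ["es", "hi", "fr", "de"]  # hypothetical native-language labels

# Character n-grams for n = 1..3 stand in for the "small, fixed n"
# features the abstract describes.
model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)
print(model.predict(["I am agree with him."])[0])
```

A real system would train on a corpus such as ICLE with cross-validation; the point here is only the shape of the pipeline: n-gram feature extraction feeding a MaxEnt classifier.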