Using classifier features for studying the effect of native language on the choice of written second language words

Authors:
Oren Tsur; Ari Rappoport
Affiliations:
The Hebrew University, Jerusalem, Israel; The Hebrew University, Jerusalem, Israel
Venue:
CACLA '07 Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition
Year:
2007

Abstract

We apply machine learning techniques to study language transfer, a major topic in the theory of Second Language Acquisition (SLA). Using an SVM for the problem of native language classification, we show that a careful analysis of the effects of various features can lead to scientific insights. In particular, we demonstrate that character bigrams alone allow a classification accuracy of about 66% for a 5-class task, even when content and function word differences are accounted for. This suggests that native language has a strong effect on the word choice of people writing in a second language.
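
As an illustration of the kind of feature the abstract refers to, the following minimal sketch turns a text into a vector of character-bigram relative frequencies. It is not the authors' implementation: the function names, the lowercasing step, the choice of a top-k vocabulary, and the tie-breaking rule are all assumptions made for this example. Vectors of this form could then be passed to any SVM implementation for the classification step.

```python
from collections import Counter


def char_bigrams(text):
    """Return the overlapping character bigrams of `text` (lowercased; an assumption)."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


def top_bigrams(texts, k):
    """Pick the k most frequent bigrams over a collection of texts.

    Hypothetical vocabulary choice for this sketch; the value of k is up to the caller.
    """
    counts = Counter()
    for t in texts:
        counts.update(char_bigrams(t))
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [bigram for bigram, _ in ranked[:k]]


def bigram_vector(text, vocabulary):
    """Map a text to the relative frequencies of the vocabulary bigrams."""
    counts = Counter(char_bigrams(text))
    total = sum(counts.values())
    if total == 0:
        # Texts shorter than two characters have no bigrams.
        return [0.0] * len(vocabulary)
    return [counts[bigram] / total for bigram in vocabulary]
```

Normalizing by the total number of bigrams in the text keeps documents of different lengths comparable, which matters when essays in a learner corpus vary in size.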