Learning to predict code-switching points

Authors:
Thamar Solorio;Yang Liu
Affiliations:
The University of Texas at Dallas, Richardson, TX;The University of Texas at Dallas, Richardson, TX
Venue:
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2008

Abstract

Predicting possible code-switching points can help develop more accurate methods for automatically processing mixed-language text, such as multilingual language models for speech recognition systems and syntactic analyzers. In this paper, we present exploratory results on learning to predict potential code-switching points in Spanish-English discourse. We trained different learning algorithms using a transcription of code-switched discourse. To evaluate the performance of the classifiers, we used two different criteria: 1) measuring precision, recall, and F-measure of the predictions against the reference in the transcription, and 2) rating the naturalness of artificially generated code-switched sentences. Average scores for the code-switched sentences generated by our machine learning approach were close to the scores of those generated by humans.
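
As an illustration of the first evaluation criterion, the following is a minimal sketch of how precision, recall, and F-measure could be computed for predicted switch points against a reference transcription. It assumes a representation the abstract does not specify: each token carries a binary label (1 if a code-switch occurs at that token, 0 otherwise). The function name `switch_point_prf` and the toy label sequences are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def switch_point_prf(reference: Sequence[int],
                     predicted: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for binary switch-point labels.

    Assumption: label 1 marks a code-switch point, label 0 marks no switch.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # correctly predicted switches
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # predicted switch, none in reference
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # reference switch that was missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    return precision, recall, f_measure


# Toy example (made-up labels, not data from the paper).
reference = [0, 1, 0, 0, 1, 0, 1, 0]
predicted = [0, 1, 0, 1, 1, 0, 0, 0]
precision, recall, f_measure = switch_point_prf(reference, predicted)
print(f"P={precision:.3f} R={recall:.3f} F={f_measure:.3f}")
```

In this toy case two of the three predicted switch points match the reference and one reference switch point is missed, so precision and recall are both 2/3. Scoring against a single transcription penalizes predictions at points where a switch would have been acceptable but the speaker did not switch, which is one reason a second criterion such as naturalness ratings is useful.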