For millions of people in less-resourced regions of the world, text messages (SMS) provide the only regular contact with their doctor. Classifying messages by medical labels supports rapid responses to emergencies, the early identification of epidemics, and everyday administration, but challenges include text brevity, rich morphology, phonological variation, and limited training data. We present a novel system that addresses these challenges, working with a clinic in rural Malawi and texts in the Chichewa language. We show that modeling morphological and phonological variation leads to a substantial average gain of F=0.206 and an error reduction of up to 63.8% for specific labels, relative to a baseline system optimized over word sequences. By comparison, there is no significant gain when applying the same system to the English translations of the same texts and labels, emphasizing the need for subword modeling in many languages. Language-independent morphological models perform as accurately as language-specific models, indicating broad deployment potential.
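
To make the contrast between word-sequence and subword features concrete, the following is a minimal sketch, not the system described above. It assumes character trigrams as a simple stand-in for subword modeling (the abstract does not specify the morphological or phonological models used), and the two spelling variants are invented English strings chosen purely for illustration. The function names `char_ngrams` and `jaccard` are hypothetical.

```python
def char_ngrams(token, n=3):
    """Return the set of character n-grams of a token, with '#' boundary marks."""
    padded = "#" + token.lower() + "#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap between two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Invented spelling variants of one word (illustrative only, not corpus data).
variant_a = "medicine"
variant_b = "medicin"

# Word-level view: each token is a single opaque feature, so the variants
# share nothing.
word_overlap = jaccard({variant_a}, {variant_b})

# Subword view: most character trigrams are shared across the two spellings.
subword_overlap = jaccard(char_ngrams(variant_a), char_ngrams(variant_b))

print(f"word-level overlap: {word_overlap:.3f}")
print(f"subword overlap:    {subword_overlap:.3f}")
```

With word features the two spellings look unrelated to a classifier, while the trigram sets overlap heavily. This is one intuition for why subword features can help when training data is limited and spelling varies.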