Short message communications: users, topics, and in-language processing

Authors: Robert Munro; Christopher D. Manning
Affiliations: Stanford University, Stanford, CA (both authors)
Venue: Proceedings of the 2nd ACM Symposium on Computing for Development
Year: 2012

Abstract

This paper investigates three dimensions of cross-domain analysis for humanitarian information processing: citizen reporting vs organizational reporting; Twitter vs SMS; and English vs non-English communications. Short messages sent during the response to the recent earthquake in Haiti and floods in Pakistan are analyzed. It is clear that SMS and Twitter were used very differently at the time, by different groups of people. SMS was primarily used by individuals on the ground while Twitter was primarily used by the international community. Turning to semi-automated strategies that employ natural language processing, it is found that English-optimal strategies do not carry over to Urdu or Kreyol, especially with regard to subword variation. Looking at machine-learning models that attempt to combine both Twitter and SMS, it is found that the cross-domain prediction accuracy is very poor, but some loss in accuracy can be overcome by learning prior distributions over the sources. It is concluded that there is only limited utility in treating SMS and Twitter as equivalent information sources -- perhaps much less than the relatively large number of recent Twitter-focused papers would indicate.
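
The abstract does not say which model was used, so the following is only an illustrative sketch of what "learning prior distributions over the sources" could look like, not the authors' method. It assumes a small multinomial naive Bayes classifier in which word likelihoods are pooled across SMS and Twitter while the class prior is estimated separately for each source. The function names (`train`, `predict`), the labels, and the toy messages are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(examples, alpha=1.0):
    """Fit pooled word counts and per-source label counts.

    examples: iterable of (source, tokens, label) triples.
    alpha: add-alpha smoothing constant (an assumption of this sketch).
    """
    word_counts = defaultdict(Counter)   # label -> word counts, pooled over sources
    label_counts = defaultdict(Counter)  # source -> label counts (source-specific prior)
    vocab, labels = set(), set()
    for source, tokens, label in examples:
        word_counts[label].update(tokens)
        label_counts[source][label] += 1
        vocab.update(tokens)
        labels.add(label)
    return {
        "word_counts": word_counts,
        "label_counts": label_counts,
        "vocab": vocab,
        "labels": sorted(labels),
        "alpha": alpha,
    }


def predict(model, source, tokens):
    """Return the label maximizing log P(label | source) + sum log P(word | label)."""
    alpha = model["alpha"]
    labels = model["labels"]
    vocab_size = len(model["vocab"])
    # An unseen source has no counts, so its smoothed prior is uniform.
    source_counts = model["label_counts"].get(source, Counter())
    source_total = sum(source_counts.values())
    scores = {}
    for label in labels:
        prior = (source_counts[label] + alpha) / (source_total + alpha * len(labels))
        score = math.log(prior)
        counts = model["word_counts"][label]
        total = sum(counts.values())
        for tok in tokens:
            if tok in model["vocab"]:  # out-of-vocabulary tokens are skipped
                score += math.log((counts[tok] + alpha) / (total + alpha * vocab_size))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data (hypothetical): "water" appears in requests over SMS
# and in donation chatter on Twitter.
examples = (
    [("sms", ["need", "water"], "actionable")] * 3
    + [("sms", ["hello"], "other")]
    + [("twitter", ["donate", "water"], "other")] * 3
    + [("twitter", ["need", "help"], "actionable")]
)
model = train(examples)
print(predict(model, "sms", ["water"]))      # actionable
print(predict(model, "twitter", ["water"]))  # other
```

In this toy setup the same token is labeled differently depending on its source, because the only thing that differs between the two predictions is the source-specific prior. Real data would need far more than a prior to bridge the two domains, which is consistent with the abstract's finding that only some of the cross-domain loss is recovered.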