This paper investigates three dimensions of cross-domain analysis for humanitarian information processing: citizen reporting vs. organizational reporting; Twitter vs. SMS; and English vs. non-English communications. Short messages sent during the response to the recent earthquake in Haiti and floods in Pakistan are analyzed. SMS and Twitter were clearly used very differently at the time, and by different groups of people: SMS was primarily used by individuals on the ground, while Twitter was primarily used by the international community. Turning to semi-automated strategies that employ natural language processing, it is found that English-optimal strategies do not carry over to Urdu or Kreyol, especially with regard to subword variation. Looking at machine-learning models that attempt to combine both Twitter and SMS, it is found that cross-domain prediction accuracy is very poor, but some of the loss in accuracy can be overcome by learning prior distributions over the sources. It is concluded that there is only limited utility in treating SMS and Twitter as equivalent information sources, perhaps much less than the relatively large number of recent Twitter-focused papers would indicate.
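To make two of the ideas above concrete, here is a minimal sketch, assuming a multinomial naive Bayes classifier. It is not the model used in the paper. Character n-grams stand in for "subword" features that can tolerate spelling variation in languages like Kreyol. Word likelihoods are shared across sources, but the class prior P(label | source) is learned separately for each source (e.g. "sms" vs. "twitter") as one simple reading of "learning prior distributions over the sources". The names `char_ngrams` and `SourcePriorNB`, the smoothing constant, and the toy messages and labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=2, n_max=4):
    """Hypothetical subword features: character n-grams of each lowercased,
    '#'-padded token, so spelling variants still share most features."""
    feats = []
    for token in text.lower().split():
        padded = f"#{token}#"
        for n in range(n_min, n_max + 1):
            for i in range(len(padded) - n + 1):
                feats.append(padded[i:i + n])
    return feats


class SourcePriorNB:
    """Hypothetical multinomial naive Bayes: likelihoods P(feature | label) are
    pooled over all sources, class priors P(label | source) are per source."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, texts, labels, sources):
        self.labels = sorted(set(labels))
        self.feat_counts = {y: Counter() for y in self.labels}
        self.prior_counts = defaultdict(Counter)  # source -> label counts
        for text, y, s in zip(texts, labels, sources):
            self.feat_counts[y].update(char_ngrams(text))
            self.prior_counts[s][y] += 1
        self.vocab = set()
        for counts in self.feat_counts.values():
            self.vocab.update(counts)
        self.totals = {y: sum(c.values()) for y, c in self.feat_counts.items()}
        return self

    def log_prior(self, y, source):
        # An unseen source has no counts, so this falls back to a uniform prior.
        counts = self.prior_counts.get(source, Counter())
        k = len(self.labels)
        return math.log((counts[y] + self.alpha) / (sum(counts.values()) + self.alpha * k))

    def predict(self, text, source):
        feats = char_ngrams(text)
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for y in self.labels:
            score = self.log_prior(y, source)
            denom = self.totals[y] + self.alpha * v
            for f in feats:
                score += math.log((self.feat_counts[y][f] + self.alpha) / denom)
            if score > best_score:
                best, best_score = y, score
        return best
```

Usage with invented toy data: after `model = SourcePriorNB().fit(["nou bezwen dlo", "nou bezwen manje", "pray for haiti", "thoughts with haiti"], ["actionable", "actionable", "other", "other"], ["sms", "sms", "twitter", "twitter"])`, a message with no known features such as `"zzz"` is labeled `"actionable"` when it arrives by SMS and `"other"` when it arrives via Twitter. Only the source-specific prior changes between the two calls. A per-source prior is the cheapest way to let one pooled model reflect that the two channels carry different mixes of messages. It does not address the deeper vocabulary mismatch between domains.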