Revisiting the predictability of language: response completion in social media

Authors:
Bo Pang;Sujith Ravi
Affiliations:
Yahoo! Research, Santa Clara, CA;Yahoo! Research, Santa Clara, CA
Venue:
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Year:
2012

Abstract

The question "how predictable is English?" has long fascinated researchers. While prior work has focused on the formal English typically used in news articles, we turn to more informal text generated by users in online settings. We are motivated by a novel application scenario: given the difficulty of typing on mobile devices, can we reduce typing effort with message completion, especially in conversational settings? We propose a method for automatic response completion. Our approach models both the language used in responses and the specific context provided by the original message. Our experimental results on a large-scale dataset show that both components help reduce typing effort. We also perform an information-theoretic study in this setting and examine the entropy of user-generated content, especially in conversational scenarios, to better understand the predictability of user-generated English.
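
To make the two-component idea concrete, the following is a minimal sketch and not the authors' method. It assumes smoothed unigram models, a linear mixture with a hand-picked weight `lam`, and invented toy sentences. It mixes a general model of responses with a model built from the original message, then measures per-token cross-entropy in bits on a reply. All names here (`unigram_model`, `interpolate`, `cross_entropy_bits`) are hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(p_general, p_context, lam):
    """Linear mixture: lam * context model + (1 - lam) * general model."""
    return {w: lam * p_context[w] + (1 - lam) * p_general[w] for w in p_general}


def cross_entropy_bits(model, tokens):
    """Average negative log2 probability per token under the model."""
    return -sum(math.log2(model[w]) for w in tokens) / len(tokens)


# Toy data, invented purely for illustration (not from the paper's dataset).
responses = "i love that movie too . thanks , see you soon .".split()
stimulus = "are you going to the game tonight ?".split()  # original message
reply = "see you at the game".split()                    # response to score

# Shared vocabulary so both models assign nonzero mass to every token.
vocab = set(responses) | set(stimulus) | set(reply)

p_general = unigram_model(responses, vocab)   # language used in responses
p_context = unigram_model(stimulus, vocab)    # context from the original message
p_mixed = interpolate(p_general, p_context, lam=0.3)  # lam is an assumed weight

h_general = cross_entropy_bits(p_general, reply)
h_mixed = cross_entropy_bits(p_mixed, reply)
print(f"general only: {h_general:.3f} bits/token")
print(f"with context: {h_mixed:.3f} bits/token")
```

On this toy input the mixed model assigns the reply a lower cross-entropy than the general model alone, because the reply reuses words from the original message. A lower cross-entropy means each next token is easier to predict, which is the property a completion system relies on to save keystrokes. The numbers carry no empirical weight.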