Modeling latent biographic attributes in conversational genres

Authors:
Nikesh Garera;David Yarowsky
Affiliations:
Johns Hopkins University, Human Language Technology Center of Excellence, Baltimore MD;Johns Hopkins University, Human Language Technology Center of Excellence, Baltimore MD
Venue:
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
Year:
2009

Citing 1
Cited 9

A quantitative analysis of lexical differences between genders in telephone conversations

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics

Towards conversation entailment: an empirical investigation

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Classifying latent user attributes in twitter

SMUC '10 Proceedings of the 2nd international workshop on Search and mining user-generated contents
Which clustering do you want? inducing your ideal clustering with minimal feedback

Journal of Artificial Intelligence Research
Personalised rating prediction for new users using latent factor models

Proceedings of the 22nd ACM conference on Hypertext and hypermedia
Author age prediction from text using linear regression

LaTeCH '11 Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
Streaming analysis of discourse participants

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
User demographics and language in an implicit social network

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
"TweetGenie: automatic age prediction from tweets" by D. Nguyen, R. Gravel, D. Trieschnigg, and T. Meder; with Ching-man Au Yeung as coordinator

ACM SIGWEB Newsletter
Recognition of understanding level and language skill using measurements of reading behavior

Proceedings of the 19th international conference on Intelligent User Interfaces

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper presents and evaluates several original techniques for the latent classification of biographic attributes such as gender, age and native language, in diverse genres (conversation transcripts, email) and languages (Arabic, English). First, we present a novel partner-sensitive model for extracting biographic attributes in conversations, given the differences in lexical usage and discourse style such as observed between same-gender and mixed-gender conversations. Then, we explore a rich variety of novel sociolinguistic and discourse-based features, including mean utterance length, passive/active usage, percentage domination of the conversation, speaking rate and filler word usage. Cumulatively up to 20% error reduction is achieved relative to the standard Boulis and Ostendorf (2005) algorithm for classifying individual conversations on Switchboard, and accuracy for gender detection on the Switchboard corpus (aggregate) and Gulf Arabic corpus exceeds 95%.