Modeling of stylistic variation in social media with stretchy patterns

Authors:
Philip Gianfortoni;David Adamson;Carolyn P. Rosé
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
DIALECTS '11 Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties
Year:
2011

Abstract

In this paper we describe a novel feature discovery technique for modeling stylistic variation in sociolects. Structural features offer more expressive power than the simpler features more commonly used in machine learning approaches to modeling linguistic variation, but they often greatly expand the size of the feature space. We propose a new form of structural feature, referred to as "stretchy patterns," that strikes a balance between expressive power and compactness, making it possible to model stylistic variation with reasonably small datasets. As a case study, we focus on modeling gender-related variation in personal blogs. Our evaluation shows a significant improvement over standard baselines.
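The abstract does not define stretchy patterns precisely. As a rough illustration only, the sketch below assumes that a stretchy pattern is a sequence of literal tokens or word classes interleaved with gaps, and that each gap may stretch over zero to `max_gap` arbitrary tokens. The names `GAP`, `matches`, `stretchy_features`, and the `INTENSIFIERS` word class are hypothetical. This is not the authors' implementation or their exact pattern definition.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, Optional, Sequence, Tuple, Union

# Hypothetical pattern element (an assumption, not the paper's definition):
# a literal token, a word class (frozenset of tokens), or GAP, which may
# stretch over 0..max_gap arbitrary tokens.
GAP = None
Element = Optional[Union[str, FrozenSet[str]]]
Pattern = Tuple[Element, ...]


def _element_matches(element: Union[str, FrozenSet[str]], token: str) -> bool:
    """A literal matches itself; a word class matches any of its members."""
    if isinstance(element, frozenset):
        return token in element
    return token == element


def matches(tokens: Sequence[str], pattern: Pattern, max_gap: int = 2) -> bool:
    """Return True if pattern occurs anywhere in tokens (contiguous except at GAPs)."""

    @lru_cache(maxsize=None)
    def match_from(p: int, t: int) -> bool:
        if p == len(pattern):
            return True  # every pattern element has been consumed
        element = pattern[p]
        if element is GAP:
            # Let the gap absorb 0..max_gap tokens, bounded by the end of input.
            return any(
                match_from(p + 1, t + k)
                for k in range(max_gap + 1)
                if t + k <= len(tokens)
            )
        if t < len(tokens) and _element_matches(element, tokens[t]):
            return match_from(p + 1, t + 1)
        return False

    return any(match_from(0, start) for start in range(len(tokens) + 1))


def stretchy_features(
    tokens: Sequence[str], patterns: Dict[str, Pattern], max_gap: int = 2
) -> Dict[str, int]:
    """Binary feature vector: 1 if the named pattern occurs in the document."""
    return {name: int(matches(tokens, pat, max_gap)) for name, pat in patterns.items()}


if __name__ == "__main__":
    INTENSIFIERS = frozenset({"so", "very", "really"})  # hypothetical word class
    patterns = {
        "INTENS_GAP_happy": (INTENSIFIERS, GAP, "happy"),
        "i_GAP_happy": ("i", GAP, "happy"),
    }
    doc = "i am so very happy".split()
    print(stretchy_features(doc, patterns))  # {'INTENS_GAP_happy': 1, 'i_GAP_happy': 0}
```

In this sketch a single pattern such as `(INTENSIFIERS, GAP, "happy")` covers many contiguous n-grams ("so happy", "so very happy", "really so happy", ...). It therefore yields one feature where an n-gram representation would need several. This illustrates the kind of expressiveness-versus-compactness trade-off the abstract describes. The bounded `max_gap` keeps the patterns from matching arbitrarily distant tokens.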