In this paper we describe a novel feature discovery technique for modeling stylistic variation in sociolects. Structural features offer more expressive power than the simpler features more commonly used in machine learning approaches to modeling linguistic variation, but they frequently come at an excessive cost in feature space size. We propose a novel form of structural feature, referred to as "stretchy patterns", that strikes a balance between expressive power and compactness, enabling stylistic variation to be modeled with reasonably small datasets. As an example, we focus on the problem of modeling gender-related variation in personal blogs. Our evaluation demonstrates a significant improvement over standard baselines.