Automatic acquisition of lexical formality

Authors: Julian Brooke; Tong Wang; Graeme Hirst
Affiliations: University of Toronto; University of Toronto; University of Toronto
Venue: COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Year: 2010

Abstract

There has been relatively little work on determining the formality level of individual lexical items. This study draws on information from large mixed-genre corpora and shows that significant improvement over simple word-length metrics is possible, particularly when multiple sources of information (word length, word counts, and word association) are integrated. Our best hybrid system reaches 86% accuracy on an English near-synonym formality identification task, and near-perfect accuracy when comparing words with extreme formality differences. We also test our word association method in Chinese, a language in which word length is not an appropriate metric for formality.
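To make the comparison against a word-length baseline concrete, the sketch below shows one way a pairwise near-synonym formality task could be scored. It assumes each test item is a pair whose first word is known to be the more formal one, and counts a pair as correct only when a scorer ranks that word strictly higher. The names `word_length_score` and `pairwise_accuracy`, and the toy pairs, are hypothetical illustrations, not the authors' code or data; count-based or association-based scorers could be plugged in through the same `FormalityScorer` interface.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface: maps a word to a formality score (higher = more formal).
FormalityScorer = Callable[[str], float]


def word_length_score(word: str) -> float:
    """Simple baseline: treat longer words (more characters) as more formal."""
    return float(len(word))


def pairwise_accuracy(pairs: Sequence[Tuple[str, str]], score: FormalityScorer) -> float:
    """Each pair is (more_formal, less_formal). A pair counts as correct only
    when the scorer ranks the first word strictly above the second, so ties
    count as errors."""
    if not pairs:
        raise ValueError("need at least one pair")
    correct = sum(1 for formal, informal in pairs if score(formal) > score(informal))
    return correct / len(pairs)


# Toy pairs invented for illustration only; they are not the paper's test set.
toy_pairs = [
    ("residence", "home"),
    ("purchase", "buy"),
    ("commence", "start"),
    ("assist", "help"),
    ("tired", "knackered"),  # the informal word is longer, so length gets this wrong
]

if __name__ == "__main__":
    print(pairwise_accuracy(toy_pairs, word_length_score))  # 0.8
```

The last pair illustrates why length alone is a weak signal: an informal word can easily be longer than its formal near-synonym, which is where corpus-derived information is expected to help.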