The study of informality as a framework for evaluating the normalisation of web 2.0 texts

Authors:
Alejandro Mosquera;Paloma Moreda
Affiliations:
DLSI, University of Alicante, Alicante, Spain;DLSI, University of Alicante, Alicante, Spain
Venue:
NLDB'12 Proceedings of the 17th international conference on Applications of Natural Language Processing and Information Systems
Year:
2012

Abstract

The language used in Web 2.0 applications such as blogging platforms, real-time chats, social networks or collaborative encyclopaedias differs markedly from that of traditional texts. The presence of informal features such as emoticons, spelling errors or Internet-specific slang can lower the performance of Natural Language Processing applications. To overcome this problem, text normalisation approaches can provide a clean word or sentence by transforming all non-standard lexical or syntactic variations into their canonical forms. Nevertheless, because each normalisation approach has its own characteristics, different performance metrics and evaluation procedures exist. We hypothesise that the analysis of informality levels can be used to evaluate text normalisation techniques. Thus, in this study we propose a text normalisation evaluation framework based on informality levels and apply it to Web 2.0 texts.
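
The idea of scoring a text's informality before and after normalisation can be illustrated with a minimal Python sketch. This is not the authors' framework: the slang dictionary, the vocabulary, the emoticon pattern and the scoring rule (the share of tokens that are emoticons or out-of-vocabulary words) are all hypothetical simplifications chosen for illustration.

```python
import re

# Hypothetical toy resources; the abstract does not specify any of these.
SLANG = {"u": "you", "2moro": "tomorrow", "thx": "thanks", "gr8": "great"}
VOCAB = {"see", "you", "tomorrow", "thanks", "great", "that", "is"}
EMOTICON = re.compile(r"[:;=][-']?[)(dp]")


def tokenize(text):
    """Split lowercased text into emoticons and alphanumeric tokens."""
    return re.findall(r"[:;=][-']?[)(dp]|[a-z0-9]+", text.lower())


def informality(tokens):
    """Assumed score: fraction of tokens that are emoticons or not in VOCAB."""
    if not tokens:
        return 0.0
    informal = [t for t in tokens if EMOTICON.fullmatch(t) or t not in VOCAB]
    return len(informal) / len(tokens)


def normalise(tokens):
    """Dictionary-based normalisation: drop emoticons, map slang to canonical forms."""
    return [SLANG.get(t, t) for t in tokens if not EMOTICON.fullmatch(t)]


raw = tokenize("see u 2moro :) thx")
clean = normalise(raw)
print(clean, informality(raw), informality(clean))
```

Under these assumptions, a normaliser that works well should reduce the informality score of its output relative to its input, which is the kind of comparison an informality-based evaluation would rely on.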