Studying the effects of noisy text on text mining applications

  • Authors: Lipika Dey; S. K. Mirajul Haque

  • Affiliations: TCS Innovation Lab, Delhi, India; TCS Innovation Lab, Delhi, India

  • Venue: Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data
  • Year: 2009

Abstract

Text mining aims to derive high-quality information from text in an automated way. Text mining applications rely on Natural Language Processing (NLP) tools such as taggers and parsers to locate and retrieve relevant information in an application-specific manner. Most of these NLP tools, however, have been designed to work on clean, grammatically correct text. Many organizations are now interested in deriving information from informally written text generated through human communication, such as emails, blog posts, and web-based reviews. These texts are highly noisy and often contain a mixture of languages. In this study we analyze how noise introduced by incorrect English affects the performance of some of these NLP tools and, in turn, the text mining applications built on them. The text mining application that we focus on is opinion mining. Opinion mining is the most significant text mining application that has to deal with noisy text generated in an unregulated fashion by users.