A survey of types of text noise and techniques to handle noisy text

  • Authors:
  • L. Venkata Subramaniam; Shourya Roy; Tanveer A. Faruquie; Sumit Negi

  • Affiliations:
  • IBM India Research Lab, New Delhi, India; Xerox India Innovation Hub, Chennai, India; IBM India Research Lab, New Delhi, India; IBM India Research Lab, New Delhi, India

  • Venue:
  • Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data
  • Year:
  • 2009

Abstract

In the real world, noise is ubiquitous in text communications. Text produced by processing signals intended for human use is often noisy for automated computer processing: automatic speech recognition, optical character recognition and machine translation all introduce processing noise. Digital text produced in informal settings such as online chat, SMS, emails, message boards, newsgroups, blogs, wikis and web pages also contains considerable noise. In this paper, we present a survey of the existing measures for noise in text. We also cover application areas that ingest this noisy text for tasks such as Information Retrieval and Information Extraction.
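As an illustration of what a measure of text noise can look like, one widely used measure for speech recognition and OCR output is word error rate (WER). WER is the word-level edit distance between a clean reference text and the noisy text, divided by the number of reference words. Character error rate (CER) is the same computation at the character level. The sketch below assumes whitespace tokenization and unit cost for every substitution, insertion and deletion. The function names are hypothetical, and the sketch is not presented as the paper's own definition of any measure.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences.

    Assumption: substitutions, insertions and deletions all cost 1.
    """
    m, n = len(ref), len(hyp)
    prev = list(range(n + 1))  # distances from an empty ref prefix
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (ref token missing in hyp)
                curr[j - 1] + 1,     # insertion (extra token in hyp)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[n]


def word_error_rate(reference, hypothesis):
    """Hypothetical helper: word edits divided by reference length.

    Assumes plain whitespace tokenization.
    """
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hypothesis.split()) / len(ref)


def char_error_rate(reference, hypothesis):
    """Hypothetical helper: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, `word_error_rate("see you at the meeting tomorrow", "c u at the meeting tmrw")` returns 0.5, because three of the six reference words are replaced by SMS-style forms. Because it is normalized by reference length, WER can exceed 1.0 when the noisy text contains many inserted words.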