Integrated scoring for spelling error correction, abbreviation expansion and case restoration in dirty text

  • Authors:
  • Wilson Wong; Wei Liu; Mohammed Bennamoun

  • Affiliations:
  • University of Western Australia, Crawley WA; University of Western Australia, Crawley WA; University of Western Australia, Crawley WA

  • Venue:
  • AusDM '06 Proceedings of the fifth Australasian conference on Data mining and analytics - Volume 61
  • Year:
  • 2006

Abstract

A growing number of language and speech applications take text from online sources as input. Despite this growth, little work has addressed integrated approaches to cleaning dirty text from such sources. This paper presents ISSAC, a mechanism of Integrated Scoring for Spelling error correction, Abbreviation expansion and Case restoration. ISSAC was first conceived as part of the text preprocessing phase of an ontology engineering project. An evaluation of ISSAC on 400 chat records shows an accuracy of 96.5%, up from the 74.4% obtained using Aspell alone.