Boosting-based ensemble learning with penalty profiles for automatic Thai unknown word recognition

  • Authors:
  • Jakkrit TeCho; Cholwich Nattee; Thanaruk Theeramunkong

  • Venue:
  • Computers & Mathematics with Applications
  • Year:
  • 2012

Quantified Score

Hi-index 0.09

Abstract

Boosting-based ensemble learning can improve classification accuracy by using multiple classification models, each constructed to cope with the errors made in the preceding steps. This paper proposes a method to improve boosting-based ensemble learning with penalty profiles, applied to automatic unknown word recognition in the Thai language. Because a sequential problem is treated as a non-sequential one, unknown word recognition must include a process that ranks the set of candidates generated for each potential unknown word position. To strengthen the recognition process with ensemble classification, penalty profiles are defined so that each succeeding classification model can be constructed more efficiently and tends to re-rank the ranked candidates into a suitable order. For evaluation, a number of alternative penalty profiles are introduced and their performance is compared on the task of extracting unknown words from a large Thai medical text. Using Naive Bayes as the base classifier for ensemble learning, the proposed method with the best setting achieves an accuracy of 90.19%, a gain of 12.88, 10.59, and 6.05 over conventional Naive Bayes, the non-ensemble version, and the flat-penalty profile, respectively.
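
The abstract does not define the penalty profiles themselves, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each training instance is one potential unknown word position with a scored candidate list, that a penalty depends on the rank at which the correct candidate was placed, and that instance weights for the next boosting round grow exponentially with that penalty. The function names, the linear profile, the `beta` parameter, and the toy scores are all hypothetical illustrations.

```python
import math

def rank_of_true(scores, true_index):
    """1-based rank of the correct candidate when sorted by descending score."""
    return 1 + sum(1 for s in scores if s > scores[true_index])

def flat_penalty(rank, n_candidates):
    """Hypothetical flat profile: every misranked position is penalized equally."""
    return 0.0 if rank == 1 else 1.0

def linear_penalty(rank, n_candidates):
    """Hypothetical graded profile: penalty grows with the rank of the correct candidate."""
    if n_candidates <= 1:
        return 0.0
    return (rank - 1) / (n_candidates - 1)

def reweight(weights, penalties, beta=1.0):
    """Boosting-style update (assumed form): scale by exp(beta * penalty), then normalize."""
    raw = [w * math.exp(beta * p) for w, p in zip(weights, penalties)]
    total = sum(raw)
    return [r / total for r in raw]

# Toy data: three unknown word positions, three candidates each.
# Scores would come from the base classifier (Naive Bayes in the paper).
candidate_scores = [
    [0.9, 0.1, 0.0],  # correct candidate (index 0) ranked first
    [0.2, 0.5, 0.3],  # correct candidate (index 0) ranked third
    [0.1, 0.3, 0.6],  # correct candidate (index 1) ranked second
]
true_indices = [0, 0, 1]
weights = [1.0 / 3] * 3

ranks = [rank_of_true(s, t) for s, t in zip(candidate_scores, true_indices)]

flat_weights = reweight(
    weights, [flat_penalty(r, len(s)) for r, s in zip(ranks, candidate_scores)]
)
linear_weights = reweight(
    weights, [linear_penalty(r, len(s)) for r, s in zip(ranks, candidate_scores)]
)
```

Under the flat profile, the two misranked positions receive the same increased weight. Under the graded profile, the position whose correct candidate was ranked last receives more weight than the one ranked second, so the next model in the ensemble focuses more on the worst ranking errors.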