Boosting-based ensemble learning with penalty profiles for automatic Thai unknown word recognition

Authors:
Jakkrit TeCho; Cholwich Nattee; Thanaruk Theeramunkong
Venue:
Computers & Mathematics with Applications
Year:
2012

Abstract

Boosting-based ensemble learning can improve classification accuracy by combining multiple classification models, each constructed to correct the errors made in the preceding steps. This paper proposes a method that improves boosting-based ensemble learning with penalty profiles and applies it to automatic unknown word recognition in Thai. Because the sequential recognition problem is treated as a non-sequential one, unknown word recognition must include a step that ranks the set of candidates generated for each potential unknown word position. To strengthen this recognition process with ensemble classification, penalty profiles are defined so that each succeeding classification model can be constructed more efficiently to re-rank the already ranked candidates into a more suitable order. For evaluation, several alternative penalty profiles are introduced and their performance is compared on the task of extracting unknown words from a large Thai medical text. Using Naive Bayes as the base classifier for ensemble learning, the proposed method with the best setting achieves an accuracy of 90.19%, an improvement of 12.88, 10.59, and 6.05 percentage points over conventional Naive Bayes, the non-ensemble version, and the flat-penalty profile, respectively.
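The abstract does not spell out how the penalty profiles are defined, so the following is only a minimal sketch of one possible reading. It assumes that each boosting round reweights every potential unknown word position by a penalty that depends on the rank the current model gave to the correct candidate, and that the final ensemble re-ranks candidates by a weighted sum of per-round scores. The names `flat_profile`, `linear_profile`, `rank_of_correct`, `reweight`, and `combine_and_rank`, the penalty values, and the model weights are all hypothetical illustrations, not the authors' definitions. The base classifier (Naive Bayes in the paper) is abstracted away as precomputed candidate scores.

```python
from typing import Callable, List, Sequence

# Hypothetical: a penalty profile maps the 1-based rank that the current model
# gave to the correct candidate at one unknown-word position to a multiplier
# applied to that position's training weight in the next boosting round.
PenaltyProfile = Callable[[int], float]


def flat_profile(rank: int, penalty: float = 2.0) -> float:
    """Assumed 'flat' profile: the same penalty for any misranked position."""
    return 1.0 if rank == 1 else penalty


def linear_profile(rank: int, step: float = 0.5) -> float:
    """Hypothetical alternative: the penalty grows with how far down
    the correct candidate was ranked."""
    return 1.0 + step * (rank - 1)


def rank_of_correct(scores: Sequence[float], correct_index: int) -> int:
    """1-based rank of the correct candidate when candidates are sorted by
    descending score; ties are resolved in favour of the correct candidate."""
    correct_score = scores[correct_index]
    return 1 + sum(1 for s in scores if s > correct_score)


def reweight(
    weights: Sequence[float],
    candidate_scores: Sequence[Sequence[float]],
    correct_indices: Sequence[int],
    profile: PenaltyProfile,
) -> List[float]:
    """Multiply each position's weight by its penalty, then renormalize to sum to 1."""
    raw = [
        w * profile(rank_of_correct(scores, c))
        for w, scores, c in zip(weights, candidate_scores, correct_indices)
    ]
    total = sum(raw)
    return [r / total for r in raw]


def combine_and_rank(
    round_scores: Sequence[Sequence[float]], model_weights: Sequence[float]
) -> List[int]:
    """Re-rank one position's candidates by a weighted sum of the scores
    produced by the models from each boosting round (hypothetical weights)."""
    n = len(round_scores[0])
    combined = [
        sum(a * scores[k] for a, scores in zip(model_weights, round_scores))
        for k in range(n)
    ]
    return sorted(range(n), key=lambda k: combined[k], reverse=True)


if __name__ == "__main__":
    # Toy scores for three positions (standing in for base-classifier output).
    scores = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.4, 0.6]]
    correct = [0, 2, 0]  # correct candidate ranked 1st, 3rd and 2nd
    uniform = [1 / 3] * 3
    print(reweight(uniform, scores, correct, flat_profile))    # ≈ [0.2, 0.4, 0.4]
    print(reweight(uniform, scores, correct, linear_profile))  # ≈ [0.222, 0.444, 0.333]
    print(combine_and_rank([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]], [0.4, 0.6]))  # [2, 0, 1]
```

With these toy inputs the assumed flat profile doubles the weight of both misranked positions equally, whereas the linear profile gives more weight to the position whose correct candidate fell to third place than to the one that fell to second. Which profiles the paper actually compares, and how they differ, is not stated in the abstract.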