Boosting-based ensemble learning with penalty profiles for automatic Thai unknown word recognition
Computers & Mathematics with Applications
This paper presents a corpus-based approach to automatic unknown word recognition in Thai. The approach applies an ensemble learning technique to generate a model that classifies unknown word candidates using features obtained from a corpus. We propose a technique called "group-based evaluation by ranking". It clusters unknown word candidates into groups based on the locations where they occur. The candidate with the highest accuracy is then identified as an unknown word. In this task, the number of positive instances is much smaller than the number of negative instances, which makes the data set unbalanced. To improve prediction accuracy, we apply a boosting technique with "voting under group-based evaluation by ranking". We conducted experiments on real-world data to evaluate the performance of the proposed approach. The experiments compared the accuracy of our technique with that of an ordinary naïve Bayes technique. Our technique achieves an accuracy of 90.93±0.50% when only the first rank is selected and 97.90±0.26% when candidates up to the tenth rank are considered. This is an improvement of 6.79% to 8.45%.
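The idea of grouping candidates by location and judging them by rank can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how candidates are scored or grouped, so the field names (`location`, `score`, `is_unknown_word`), the use of a single classifier score for ranking, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def rank_within_groups(candidates):
    """Group candidates by location and sort each group by descending score.

    The 'location' and 'score' fields are hypothetical; the score stands in
    for whatever confidence the trained model assigns to a candidate.
    """
    groups = defaultdict(list)
    for cand in candidates:
        groups[cand["location"]].append(cand)
    return {
        loc: sorted(members, key=lambda c: c["score"], reverse=True)
        for loc, members in groups.items()
    }


def accuracy_at_rank(ranked_groups, k):
    """Fraction of groups whose true unknown word appears in the top k."""
    hits = sum(
        any(c["is_unknown_word"] for c in members[:k])
        for members in ranked_groups.values()
    )
    return hits / len(ranked_groups)


# Toy data (invented): two locations, each with two competing candidates.
candidates = [
    {"location": 0, "text": "ab", "score": 0.9, "is_unknown_word": True},
    {"location": 0, "text": "abc", "score": 0.4, "is_unknown_word": False},
    {"location": 1, "text": "xy", "score": 0.7, "is_unknown_word": False},
    {"location": 1, "text": "xyz", "score": 0.6, "is_unknown_word": True},
]

ranked = rank_within_groups(candidates)
print(accuracy_at_rank(ranked, 1))  # only the first rank counts
print(accuracy_at_rank(ranked, 2))  # candidates up to the second rank count
```

Evaluating per group rather than per candidate mirrors the way the abstract reports results, with one figure for the first rank and another for candidates up to the tenth rank.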