Supervised and unsupervised learning algorithms for Thai web pages identification

  • Authors:
  • Boonserm Kijsirikul; Puay Sasiphongpairoege; Nuanwan Soonthornphisaj; Surapant Meknavin

  • Affiliations:
  • Department of Computer Engineering, Chulalongkorn University, Bangkok, Thailand; Department of Computer Engineering, Chulalongkorn University, Bangkok, Thailand; Department of Computer Engineering, Chulalongkorn University, Bangkok, Thailand; Siamguru Co., Ltd., Bangkok, Thailand

  • Venue:
  • PRICAI'00 Proceedings of the 6th Pacific Rim international conference on Artificial intelligence
  • Year:
  • 2000

Abstract

The paper presents a learning method, called iterative cross-training (ICT), for identifying Thai Web pages. Our method combines two classifiers, a word segmentation classifier and a naive Bayes classifier, that use unlabeled examples to train each other. We compare ICT against other supervised and unsupervised learning methods: a supervised word segmentation classifier (S-Word), a supervised naive Bayes classifier (S-Bayes), an unsupervised naive Bayes classifier using the EM algorithm (U-Bayes-EM), and a co-training-style classifier (CoTraining). The experimental results show that ICT gives the best performance, followed by S-Bayes, CoTraining, U-Bayes-EM, and S-Word.
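The idea of two classifiers training each other on unlabeled examples can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give its details, so everything below is an assumption made for illustration. A hypothetical `LexiconClassifier` (a whitespace-token lexicon lookup) stands in for the word segmentation classifier, `NaiveBayes` is a small multinomial naive Bayes over character bigrams, the loop `cross_train` simply alternates which classifier labels the pool for the other, and the toy documents are romanized placeholders rather than real Thai pages.

```python
import math
from collections import Counter


def ngrams(text, n=2):
    """Return the list of character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class LexiconClassifier:
    """Hypothetical stand-in for a word segmentation classifier:
    labels a document 1 if enough of its tokens are in a lexicon."""

    def __init__(self, seed_words, threshold=0.3):
        self.lexicon = set(seed_words)
        self.threshold = threshold

    def fit(self, docs, labels):
        # Add tokens seen more often in documents labeled 1 than in those labeled 0.
        pos, neg = Counter(), Counter()
        for doc, y in zip(docs, labels):
            (pos if y == 1 else neg).update(doc.split())
        self.lexicon |= {w for w in pos if pos[w] > neg[w]}
        return self

    def predict(self, doc):
        tokens = doc.split()
        if not tokens:
            return 0
        hits = sum(t in self.lexicon for t in tokens)
        return 1 if hits / len(tokens) >= self.threshold else 0


class NaiveBayes:
    """Multinomial naive Bayes over character bigrams, add-one smoothing."""

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs_per_class = Counter(labels)
        for doc, y in zip(docs, labels):
            self.counts[y].update(ngrams(doc))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def predict(self, doc):
        n_docs = sum(self.docs_per_class.values())
        scores = {}
        for y in (0, 1):
            total = sum(self.counts[y].values())
            score = math.log((self.docs_per_class[y] + 1) / (n_docs + 2))
            for g in ngrams(doc):
                score += math.log((self.counts[y][g] + 1) / (total + len(self.vocab) + 1))
            scores[y] = score
        return 1 if scores[1] > scores[0] else 0


def cross_train(unlabeled, lex_clf, nb_clf, rounds=3):
    """Alternate: each classifier labels the unlabeled pool to train the other."""
    for _ in range(rounds):
        nb_clf.fit(unlabeled, [lex_clf.predict(d) for d in unlabeled])
        lex_clf.fit(unlabeled, [nb_clf.predict(d) for d in unlabeled])
    return lex_clf, nb_clf


# Toy, made-up pool: no labels are supplied, only a small seed lexicon.
unlabeled = [
    "sawasdee krub khun", "khun sabai dee mai",
    "sabai dee krub", "sawasdee khun mai",
    "hello good morning", "good morning friend",
    "hello friend today", "morning today good",
]
seed_words = ["sawasdee", "krub"]
lex, nb = cross_train(unlabeled, LexiconClassifier(seed_words), NaiveBayes())
print([nb.predict(d) for d in unlabeled])
```

In this sketch the only prior knowledge is the seed lexicon; each round, the labels produced by one classifier become the training signal for the other. How ICT initializes its classifiers, selects examples, or decides when to stop is not stated in the abstract and is not modeled here.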