A pattern based two-stage text classifier

  • Authors:
  • Moch Arif Bijaksana;Yuefeng Li;Abdulmohsen Algarni

  • Affiliations:
  • School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, QLD, Australia, Informatics Faculty, Telkom Institute of Technology, Bandung, Indonesia; School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, QLD, Australia; College of Computer Science, King Khalid University, Abha, Saudi Arabia

  • Venue:
  • MLDM'13 Proceedings of the 9th international conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2013

Abstract

In a classification problem, we typically face two challenging issues: negative documents have diverse characteristics, and in some cases a large number of negative documents are close to positive documents. It is therefore hard for a single classifier to cleanly separate incoming documents into classes. This paper proposes a novel gradual problem-solving approach that builds a two-stage classifier. The first stage identifies reliable negatives (negative documents with weak positive characteristics). It concentrates on minimizing the number of false negative documents (recall-oriented). We use Rocchio, an existing recall-based classifier, for this stage. The second stage is a precision-oriented "fine tuning" step that concentrates on minimizing the number of false positive documents by applying pattern (statistical phrase) mining techniques. In this stage, pattern-based scoring is followed by threshold setting (thresholding). Experimental results show that our statistical-phrase-based two-stage classifier is promising.
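The abstract describes the pipeline only at a high level, so the following is a minimal, hypothetical sketch of the two-stage idea, not the authors' implementation. It assumes pre-tokenised documents. Stage 1 is a Rocchio prototype (positive centroid minus a down-weighted negative centroid) with a deliberately permissive cut-off, and documents scoring at or below the cut-off are treated as reliable negatives. As a crude stand-in for the paper's statistical-phrase patterns, stage 2 mines unordered term pairs that co-occur in at least half of the positive training documents and weights each pair by its support. It then picks the score threshold that maximises F1 on the training documents that survive stage 1. All function names (`rocchio_prototype`, `mine_pair_patterns`, `train`, `classify`, ...) and parameters (`alpha`, `beta`, `min_support`, `recall_cutoff`) are illustrative assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical sketch: documents are lists of lowercase tokens.

def unit_tf(doc):
    """L2-normalised term-frequency vector of one tokenised document."""
    tf = defaultdict(float)
    for term in doc:
        tf[term] += 1.0
    norm = math.sqrt(sum(w * w for w in tf.values())) or 1.0
    return {t: w / norm for t, w in tf.items()}

def centroid(docs):
    """Mean of the unit TF vectors of a non-empty list of documents."""
    acc = defaultdict(float)
    for doc in docs:
        for t, w in unit_tf(doc).items():
            acc[t] += w / len(docs)
    return acc

def rocchio_prototype(pos, neg, alpha=1.0, beta=0.25):
    """Rocchio prototype: alpha * positive centroid - beta * negative centroid."""
    proto = defaultdict(float)
    for t, w in centroid(pos).items():
        proto[t] += alpha * w
    for t, w in centroid(neg).items():
        proto[t] -= beta * w
    return dict(proto)

def rocchio_score(proto, doc):
    """Dot product between the document vector and the Rocchio prototype."""
    return sum(w * proto.get(t, 0.0) for t, w in unit_tf(doc).items())

def mine_pair_patterns(pos, min_support=0.5):
    """Stand-in for phrase mining: term pairs found in >= min_support of the
    positive documents, each weighted by its support."""
    counts = defaultdict(int)
    for doc in pos:
        for pair in combinations(sorted(set(doc)), 2):
            counts[pair] += 1
    return {p: c / len(pos) for p, c in counts.items() if c / len(pos) >= min_support}

def pattern_score(patterns, doc):
    """Sum of the weights of all mined pairs whose two terms occur in the document."""
    terms = set(doc)
    return sum(w for (a, b), w in patterns.items() if a in terms and b in terms)

def best_f1_threshold(scored):
    """Cut-off maximising F1 over (score, is_positive) pairs; ties go to the
    higher (more precise) cut-off."""
    total_pos = sum(1 for _, y in scored if y)
    best_t, best_f1 = float("inf"), -1.0
    for t in sorted({s for s, _ in scored}):
        tp = sum(1 for s, y in scored if s >= t and y)
        fp = sum(1 for s, y in scored if s >= t and not y)
        f1 = 2 * tp / (2 * tp + fp + (total_pos - tp)) if total_pos else 0.0
        if f1 >= best_f1:
            best_t, best_f1 = t, f1
    return best_t

def train(pos, neg, recall_cutoff=0.0):
    proto = rocchio_prototype(pos, neg)
    patterns = mine_pair_patterns(pos)
    # The stage-2 threshold is fitted only on training documents that pass stage 1.
    survivors = [(pattern_score(patterns, d), y)
                 for docs, y in ((pos, True), (neg, False))
                 for d in docs if rocchio_score(proto, d) > recall_cutoff]
    return proto, recall_cutoff, patterns, best_f1_threshold(survivors)

def classify(model, doc):
    proto, recall_cutoff, patterns, threshold = model
    # Stage 1 (recall-oriented): a low cut-off rejects only clear negatives,
    # which are treated as reliable negatives.
    if rocchio_score(proto, doc) <= recall_cutoff:
        return False
    # Stage 2 (precision-oriented): pattern-based score against the fitted threshold.
    return pattern_score(patterns, doc) >= threshold

pos = [d.split() for d in ["stock market shares rise",
                           "market shares fall on stock news",
                           "stock shares trading market"]]
neg = [d.split() for d in ["football match result",
                           "market for fresh fish",
                           "election results announced"]]
model = train(pos, neg)
print(classify(model, "stock shares slump as market reacts".split()))  # True
print(classify(model, "fresh fish market prices".split()))  # False (passes stage 1, rejected by stage 2)
print(classify(model, "football match tonight".split()))  # False (rejected by stage 1)
```

In this toy run, the fish-market document shares the word "market" with the positive class and slips through the recall-oriented first stage. It contains none of the mined pairs, however, so the precision-oriented second stage rejects it. This mirrors the division of labour the abstract describes, with a simple pair-mining heuristic standing in for the paper's pattern mining.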