A pattern based two-stage text classifier

Authors:
Moch Arif Bijaksana;Yuefeng Li;Abdulmohsen Algarni
Affiliations:
School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, QLD, Australia,Informatics Faculty, Telkom Institute of Technology, Bandung, Indonesia;School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, QLD, Australia;College of Computer Science, King Khalid University, Abha, Saudi Arabia
Venue:
MLDM'13 Proceedings of the 9th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2013

Citing 20
Cited 0

An evaluation of phrasal and clustered representations on a text categorization task

SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
A comparison of classifiers and document representations for the routing problem

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Boosting and Rocchio applied to text filtering

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating evaluation measure stability

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
An Evaluation of Statistical Approaches to Text Categorization

Information Retrieval
A study of thresholding strategies for text categorization

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Maximum likelihood estimation for filtering thresholds

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Machine learning in automated text categorization

ACM Computing Surveys (CSUR)
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Building a filtering test collection for TREC 2002

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Mining Ontology for Automatically Acquiring Web User Information Needs

IEEE Transactions on Knowledge and Data Engineering
Text Classification Improved through Automatically Extracted Sequences

ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Deploying Approaches for Pattern Refinement in Text Mining

ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Introduction to Information Retrieval

Introduction to Information Retrieval
An Extensive Empirical Study of Feature Selection for Text Categorization

ICIS '08 Proceedings of the Seventh IEEE/ACIS International Conference on Computer and Information Science (icis 2008)
A two-stage text mining model for information filtering

Proceedings of the 17th ACM conference on Information and knowledge management
Search Engines: Information Retrieval in Practice

Search Engines: Information Retrieval in Practice
Multilabel classification with meta-level features

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Mining positive and negative patterns for relevance feature discovery

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
A two-stage decision model for information filtering

Decision Support Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

In a classification problem typically we face two challenging issues, the diverse characteristic of negative documents and sometimes a lot of negative documents that are closed to positive documents. Therefore, it is hard for a single classifier to clearly classify incoming documents into classes. This paper proposes a novel gradual problem solving to create a two-stage classifier. The first stage identifies reliable negatives (negative documents with weak positive characteristics). It concentrates on minimizing the number of false negative documents (recall-oriented). We use Rocchio, an existing recall based classifier, for this stage. The second stage is a precision-oriented "fine tuning", concentrates on minimizing the number of false positive documents by applying pattern (a statistical phrase) mining techniques. In this stage a pattern-based scoring is followed by threshold setting (thresholding). Experiment shows that our statistical phrase based two-stage classifier is promising.