Text filtering by boosting naive Bayes classifiers

Authors:
Yu-Hwan Kim;Shang-Yoon Hahn;Byoung-Tak Zhang
Affiliations:
Artificial Intelligence Lab (SCAI), School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Korea;Artificial Intelligence Lab (SCAI), School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Korea;Artificial Intelligence Lab (SCAI), School of Computer Science and Engineering, Seoul National University, Seoul 151-742, Korea
Venue:
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2000



Abstract

Several machine learning algorithms have recently been used for text categorization and filtering. In particular, boosting methods such as AdaBoost have shown good performance when applied to real text data. However, most existing boosting algorithms are based on classifiers that use binary-valued features, so they do not make full use of the weight information provided by standard term weighting methods. In this paper, we present a boosting-based learning method for text filtering that uses naive Bayes classifiers as weak learners. The use of naive Bayes allows the boosting algorithm to exploit term frequency information while maintaining a probabilistically accurate confidence ratio. Applied to TREC-7 and TREC-8 filtering track documents, the proposed method obtained a significant improvement in the LF1, LF2, F1 and F3 measures compared to the best results submitted by other TREC entries.
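
The general idea of boosting with a term-frequency-aware naive Bayes weak learner can be illustrated with a minimal sketch. The abstract does not give the authors' update rule, so the code below is not their method: it assumes a generic confidence-rated AdaBoost-style update in which each weak learner outputs half the naive Bayes log-odds. All names (`train_nb`, `confidence`, `boost_naive_bayes`, `ensemble_score`), the clipping constant `CAP`, the Laplace smoothing, and the toy documents are hypothetical choices made for illustration only.

```python
import math
from collections import Counter

CAP = 4.0  # assumed clip on weak-learner confidence, for numerical stability only


def train_nb(docs, labels, weights, vocab):
    """Fit a weighted multinomial naive Bayes model on term-frequency counts."""
    scale = len(docs) / sum(weights)  # keep weighted counts on the scale of raw counts
    prior = {+1: 0.0, -1: 0.0}
    counts = {+1: Counter(), -1: Counter()}
    for doc, y, w in zip(docs, labels, weights):
        prior[y] += w * scale
        for term, tf in doc.items():
            counts[y][term] += w * scale * tf  # term frequency scaled by boosting weight
    model = {}
    for y in (+1, -1):
        total = sum(counts[y].values())
        # Laplace smoothing (an assumption) keeps every log-probability finite.
        log_like = {t: math.log((counts[y][t] + 1.0) / (total + len(vocab))) for t in vocab}
        model[y] = (math.log(prior[y] + 1e-12), log_like)
    return model


def confidence(model, doc):
    """Half the naive Bayes log-odds of the positive class, clipped to [-CAP, CAP]."""
    score = {}
    for y in (+1, -1):
        log_prior, log_like = model[y]
        # Terms outside the training vocabulary are ignored.
        score[y] = log_prior + sum(tf * log_like[t] for t, tf in doc.items() if t in log_like)
    return max(-CAP, min(CAP, 0.5 * (score[+1] - score[-1])))


def boost_naive_bayes(docs, labels, rounds=5):
    """Train `rounds` naive Bayes weak learners on successively reweighted documents."""
    vocab = sorted({t for d in docs for t in d})
    weights = [1.0 / len(docs)] * len(docs)
    ensemble = []
    for _ in range(rounds):
        model = train_nb(docs, labels, weights, vocab)
        ensemble.append(model)
        # Documents with a small or negative margin y * h(x) gain relative weight.
        weights = [w * math.exp(-y * confidence(model, d))
                   for w, d, y in zip(weights, docs, labels)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return ensemble


def ensemble_score(ensemble, doc):
    """Sum of weak-learner confidences; a positive sum means 'relevant'."""
    return sum(confidence(model, doc) for model in ensemble)


# Toy documents as {term: term frequency}; +1 = relevant, -1 = not relevant.
DOCS = [
    {"boost": 3, "bayes": 2}, {"bayes": 3, "text": 1}, {"boost": 2, "filter": 1},
    {"stock": 3, "market": 2}, {"market": 2, "text": 1}, {"stock": 1, "price": 3},
]
LABELS = [+1, +1, +1, -1, -1, -1]

ensemble = boost_naive_bayes(DOCS, LABELS, rounds=3)
relevant_score = ensemble_score(ensemble, {"bayes": 2, "boost": 1})
irrelevant_score = ensemble_score(ensemble, {"stock": 2, "price": 1})
print(relevant_score > 0, irrelevant_score < 0)
```

Because the weak learner works on term counts rather than binary term presence, a term that occurs three times in a document contributes three times as much to the log-odds as a term that occurs once, which is the kind of frequency information the abstract says binary-feature boosting discards.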