BayesTH-MCRDR algorithm for automatic classification of web document

Authors:
Woo-Chul Cho;Debbie Richards
Affiliations:
Department of Computing, Macquarie University, Sydney, NSW, Australia;Department of Computing, Macquarie University, Sydney, NSW, Australia
Venue:
AI'04 Proceedings of the 17th Australian joint conference on Advances in Artificial Intelligence
Year:
2004

Abstract

Automated Web document classification is now considered an important method for managing and processing the enormous and constantly growing volume of Web documents in digital form. Recently, document classification has been addressed with various classification techniques such as naïve Bayesian, TFIDF (Term Frequency Inverse Document Frequency), FCA (Formal Concept Analysis) and MCRDR (Multiple Classification Ripple Down Rules). In this paper we propose the BayesTH-MCRDR algorithm for Web document classification. It is a composite algorithm that combines a naïve Bayesian algorithm using a threshold with the MCRDR algorithm. The prominent feature of the BayesTH-MCRDR algorithm is optimisation of the initial relationship between keywords before final assignment to a category, in order to achieve higher document classification accuracy. We also present the system we have developed to demonstrate and compare a number of classification techniques.
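
The abstract does not say how the threshold is applied, so the following is only a minimal sketch of one plausible reading, not the authors' BayesTH-MCRDR implementation. It assumes a multinomial naïve Bayesian classifier with Laplace smoothing that assigns a category only when the top posterior probability reaches a cutoff, and otherwise returns None so that a later rule-based stage (such as MCRDR) could decide. The function names `train`, `posteriors` and `classify`, and the default cutoff of 0.7, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Build count tables from (tokens, label) pairs."""
    class_counts = Counter()            # number of documents per category
    word_counts = defaultdict(Counter)  # token frequencies per category
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def posteriors(model, tokens):
    """Return normalised posterior probabilities for each category."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in tokens:
            if token in vocab:  # tokens never seen in training are ignored
                # Laplace (add-one) smoothed likelihood
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    unnorm = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(unnorm.values())
    return {label: value / z for label, value in unnorm.items()}


def classify(model, tokens, threshold=0.7):
    """Assumed thresholding rule: accept the best category only if its
    posterior is at least `threshold`; otherwise return None."""
    post = posteriors(model, tokens)
    label, prob = max(post.items(), key=lambda item: item[1])
    return label if prob >= threshold else None
```

Under this reading, a document that returns None is one the Bayesian stage cannot place confidently, which is where a knowledge-based stage such as MCRDR rules would take over.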