An automatic approach to classify web documents using a domain ontology

Authors:
Mu-Hee Song;Soo-Yeon Lim;Seong-Bae Park;Dong-Jin Kang;Sang-Jo Lee
Affiliations:
Dept. of Computer Engineering, Information Technology Services, Kyungpook National University, Daegu, Korea (all authors)
Venue:
PReMI'05: Proceedings of the First International Conference on Pattern Recognition and Machine Intelligence
Year:
2005

Abstract

This paper proposes an automated method for classifying documents using an ontology, which expresses the terminology and vocabulary contained in Web documents as a hierarchical structure. Ontology-based document classification involves determining the document features that represent Web documents most accurately, analyzing the document contents, and assigning each document to the most appropriate of at least two pre-defined categories according to those features. In this paper, Web documents are classified in real time, not with experimental data or a learning process, but by calculating the similarity between the terminology information extracted from Web texts and the ontology categories. This results in more accurate document classification, since the meanings and relationships unique to each document are determined.
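
The idea of classifying without a training phase, by scoring a document's extracted terms against ontology categories, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify the paper's ontology, term extraction, or similarity measure. The toy `ONTOLOGY`, the regex tokenizer, the inheritance of ancestor terms, and the use of cosine similarity are hypothetical stand-ins.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical toy ontology (not from the paper): each category has a
# parent in the hierarchy and a set of characteristic terms.
ONTOLOGY = {
    "economy": {"parent": None, "terms": {"market", "trade", "price"}},
    "stock": {"parent": "economy",
              "terms": {"stock", "share", "investor", "dividend"}},
    "bank": {"parent": "economy",
             "terms": {"bank", "loan", "interest", "deposit"}},
}


def extract_terms(text):
    """Lowercase the text and count alphabetic tokens (a crude extractor)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def category_terms(name):
    """Collect a category's terms plus those of all its ancestors."""
    terms = set()
    while name is not None:
        node = ONTOLOGY[name]
        terms |= node["terms"]
        name = node["parent"]
    return terms


def similarity(doc_terms, cat_terms):
    """Cosine similarity between term counts and a binary category vector."""
    dot = sum(doc_terms[t] for t in cat_terms)
    norm_doc = sqrt(sum(c * c for c in doc_terms.values()))
    norm_cat = sqrt(len(cat_terms))
    if norm_doc == 0 or norm_cat == 0:
        return 0.0
    return dot / (norm_doc * norm_cat)


def classify(text):
    """Return the best-scoring category and the score of every category."""
    doc_terms = extract_terms(text)
    scores = {c: similarity(doc_terms, category_terms(c)) for c in ONTOLOGY}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    label, scores = classify(
        "The bank raised the interest on every loan and deposit")
    print(label, scores)
```

No model is fitted in this sketch: a new document is scored directly against the category term sets, which is what makes classification at request time feasible. A real system would likely need proper term extraction and weighting in place of the raw token counts used here.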