Design and implementation of an ontology algorithm for web documents classification

Authors:
Guiyi Wei;Jun Yu;Yun Ling;Jun Liu
Affiliations:
Zhejiang Gongshang University, Hangzhou, P. R. China;Zhejiang Gongshang University, Hangzhou, P. R. China;Zhejiang Gongshang University, Hangzhou, P. R. China;Zhejiang Gongshang University, Hangzhou, P. R. China
Venue:
ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part IV
Year:
2006

Citing 8
Cited 0

Text categorization for multiple users based on semantic features from a machine-readable dictionary. ACM Transactions on Information Systems (TOIS)
Support-Vector Networks. Machine Learning
Embedding knowledge in Web documents. WWW '99 Proceedings of the eighth international conference on World Wide Web
Ontology-focused crawling of Web documents. Proceedings of the 2003 ACM symposium on Applied computing
Web page classification without the web page. Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters
An adaptive k-nearest neighbor text categorization strategy. ACM Transactions on Asian Language Information Processing (TALIP)
Generating page clippings from web search results using a dynamically terminated genetic algorithm. Information Systems
A comprehensive framework for building multilingual domain ontologies: creating a prototype biosecurity ontology. DCMI '02 Proceedings of the 2002 international conference on Dublin core and metadata applications: Metadata for e-communities: supporting diversity and convergence

Abstract

Traditional document classification methods require feature extraction and classifier training. Collecting the text terms needed for training is laborious and time-consuming, and extracting features from Chinese documents is particularly difficult. To address these problems, this paper proposes an ontology-based approach that improves the efficiency and effectiveness of web document classification and retrieval. First, the approach establishes an ontology model based on the Hownet [6] knowledge base and its method. It then creates an ontology for each subclass of the classification system, using RDFS to convert Hownet into ontologies and to define the relations among them. Web documents are then classified automatically by an ontology relevance calculation algorithm. Compared with KNN [2], our experimental results indicate that the accuracy of the ontology-based approach is close to that of KNN, that its algorithm is more robust, and that its recall is better.
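
The abstract does not give the relevance formula, so the following is only a minimal sketch of what relevance-based classification might look like, not the authors' algorithm. It assumes each class ontology can be flattened to a map from concept terms to weights, and that the document has already been segmented into tokens (a separate step for Chinese text). The names `ONTOLOGIES`, `relevance`, and `classify`, and all terms and weights, are hypothetical and are not taken from Hownet.

```python
from collections import Counter

# Hypothetical toy ontologies: each class maps concept terms to weights.
# Terms and weights are illustrative only, not derived from Hownet.
ONTOLOGIES = {
    "sports": {"football": 1.0, "match": 0.6, "team": 0.5},
    "finance": {"stock": 1.0, "bank": 0.8, "market": 0.6},
}


def relevance(tokens, ontology):
    """Weighted count of ontology concepts in the document, per token."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    matched = sum(weight * counts[term] for term, weight in ontology.items())
    return matched / len(tokens)


def classify(tokens, ontologies=ONTOLOGIES):
    """Return (best_class, scores); best_class is None if no concept matches."""
    scores = {name: relevance(tokens, onto) for name, onto in ontologies.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


if __name__ == "__main__":
    doc = "the team won the football match".split()
    label, scores = classify(doc)
    print(label, scores)
```

Under these assumptions no classifier training is needed: adding a class means adding an ontology, which is the property the abstract contrasts with KNN.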