Topical categorization of search results based on a domain ontology
ICTIR'11 Proceedings of the Third international conference on Advances in information retrieval theory
The process of web document classification involves calculating similarities between documents and categories by using the information extracted from them. In recent years, ontology-based web document classification methods have been introduced to address two weaknesses of traditional machine learning algorithms: the need for classifier training and the failure to consider semantic relations between words. However, previous work on ontology-based web document classification overlooks two important issues: automatic ontology construction and the ranking of classified documents. To address these problems, this paper proposes an ontology-based web document classification and ranking method. First, weighted term sets are extracted from web documents, and an ontology is built using an effective ontology construction method that clarifies and augments an existing ontology. Then, the similarity score between documents and the ontology is computed based on WordNet using the Earth Mover's Distance (EMD) method. Finally, web documents are assigned to categories according to the similarity score, and a simple ranking method is used to sort the documents within each category. The experimental results show that our classification algorithm achieves better precision and recall than the adaptive KNN method and is competitive with the SVM method; the ranking method also performs well.
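The three steps in the abstract (weighted term sets, EMD-based comparison, assignment and ranking) can be illustrated with a minimal sketch. It is not the paper's implementation, and several parts are assumptions made only for illustration: term weights are taken to be integer frequencies, the WordNet-based ground distance is replaced by a hypothetical hand-made table (`GROUP` and `toy_distance`), each category is represented by a small hypothetical term set, and a document is assigned to the category with the smallest EMD and ranked by that same value. The abstract does not specify how EMD is turned into a similarity score or how ranking is done, so those choices are placeholders.

```python
INF = float("inf")

def emd(p, q, dist):
    """Earth Mover's Distance between two term sets with integer weights.

    p, q: dict mapping term -> positive integer weight (assumed frequencies).
    dist: function (term_a, term_b) -> non-negative ground distance.
    Solved as a min-cost flow by successive shortest paths (Bellman-Ford);
    the result is the total cost divided by the total mass moved.
    """
    ps, qs = list(p), list(q)
    n, m = len(ps), len(qs)
    nodes = n + m + 2
    src, snk = n + m, n + m + 1
    edges = []                          # each edge: [to, residual capacity, cost]
    adj = [[] for _ in range(nodes)]

    def add_edge(u, v, cap, cost):
        # A forward edge gets an even index and its reverse the next odd
        # index, so edge e and edge e ^ 1 are always partners.
        adj[u].append(len(edges))
        edges.append([v, cap, cost])
        adj[v].append(len(edges))
        edges.append([u, 0, -cost])

    for i, a in enumerate(ps):
        add_edge(src, i, p[a], 0.0)
        for j, b in enumerate(qs):
            add_edge(i, n + j, min(p[a], q[b]), dist(a, b))
    for j, b in enumerate(qs):
        add_edge(n + j, snk, q[b], 0.0)

    flow, cost = 0, 0.0
    while True:
        # Bellman-Ford shortest path from src in the residual graph.
        best = [INF] * nodes
        prev = [-1] * nodes
        best[src] = 0.0
        for _ in range(nodes - 1):
            changed = False
            for u in range(nodes):
                if best[u] == INF:
                    continue
                for e in adj[u]:
                    v, cap, c = edges[e]
                    if cap > 0 and best[u] + c < best[v] - 1e-12:
                        best[v] = best[u] + c
                        prev[v] = e
                        changed = True
            if not changed:
                break
        if prev[snk] == -1:
            break                       # no augmenting path left
        # Find the bottleneck along the path, then push that much flow.
        push, v = INF, snk
        while v != src:
            push = min(push, edges[prev[v]][1])
            v = edges[prev[v] ^ 1][0]
        v = snk
        while v != src:
            e = prev[v]
            edges[e][1] -= push
            edges[e ^ 1][1] += push
            v = edges[e ^ 1][0]
        flow += push
        cost += push * best[snk]
    return cost / flow if flow else 0.0

# Hypothetical stand-in for a WordNet-based ground distance.
GROUP = {"goal": "sport", "team": "sport", "match": "sport",
         "stock": "finance", "bank": "finance", "market": "finance"}

def toy_distance(a, b):
    if a == b:
        return 0.0
    ga, gb = GROUP.get(a), GROUP.get(b)
    return 0.5 if ga is not None and ga == gb else 1.0

def classify_and_rank(docs, categories, dist):
    """Assign each document to its lowest-EMD category, then sort each
    category's documents by ascending EMD (an assumed ranking rule)."""
    ranked = {name: [] for name in categories}
    for doc_id, terms in docs.items():
        scores = {name: emd(terms, cat, dist) for name, cat in categories.items()}
        winner = min(scores, key=scores.get)
        ranked[winner].append((doc_id, scores[winner]))
    for members in ranked.values():
        members.sort(key=lambda pair: pair[1])
    return ranked

# Hypothetical category term sets and documents (term -> frequency).
CATEGORIES = {"sports": {"team": 2, "match": 1},
              "finance": {"bank": 2, "market": 1}}
DOCS = {"d1": {"goal": 1, "team": 2},
        "d2": {"team": 1, "match": 1},
        "d3": {"stock": 3}}

ranked = classify_and_rank(DOCS, CATEGORIES, toy_distance)
print(ranked)
```

On this toy data, d2 and d1 are assigned to sports (d2 ranked first, since its terms match the category exactly) and d3 to finance. In a setting closer to the paper, `toy_distance` would be replaced by a distance derived from WordNet, and the category term sets would come from the constructed ontology.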