Classification of web documents using concept extraction from ontologies

Authors:
Marina Litvak;Mark Last;Slava Kisilevich
Affiliations:
Department of Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel;Department of Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel;Department of Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
Venue:
AIS-ADM'07 Proceedings of the 2nd international conference on Autonomous intelligent systems: agents and data mining
Year:
2007

Abstract

In this paper, we deal with the problem of analyzing and classifying web documents in a given domain by information filtering agents. We present an ontology-based web content mining methodology with the following main stages: creating an ontology for the specified domain, collecting a training set of labeled documents, building a classification model in this domain using the constructed ontology and a classification algorithm, and classifying new documents by information agents via the induced model. We evaluated the proposed methodology in two specific domains: the chemical domain (web pages containing information about the production of certain chemicals) and a Yahoo! collection of web news documents divided into several categories. Our system receives as input a domain-specific ontology and a set of categorized web documents, and then performs concept generalization on these documents. We use a key-phrase extractor with an integrated ontology parser to create a database from the input documents, which serves as the training set for the classification algorithm. The classification accuracy of the system is estimated using various levels of the ontology.
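
The concept generalization step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy ontology, the function names (`path_from_root`, `generalize`, `concept_features`), and the reading of "ontology level" as depth below the root are all assumptions made for illustration. Each key phrase that appears in the ontology is replaced by its ancestor at a chosen level, and the generalized concepts are counted to form one training record.

```python
from collections import Counter

# Hypothetical toy ontology: each concept maps to its parent (None marks the root).
PARENT = {
    "chemical": None,
    "acid": "chemical",
    "base": "chemical",
    "sulfuric acid": "acid",
    "nitric acid": "acid",
    "sodium hydroxide": "base",
}


def path_from_root(concept):
    """Return the list of concepts from the root down to `concept`."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path[::-1]


def generalize(concept, level):
    """Map a concept to its ancestor at depth `level` (the root is depth 0).

    Concepts already at or above that depth are returned unchanged.
    """
    path = path_from_root(concept)
    return path[min(level, len(path) - 1)]


def concept_features(key_phrases, level):
    """Count generalized concepts; phrases missing from the ontology are skipped."""
    return Counter(generalize(p, level) for p in key_phrases if p in PARENT)


# Key phrases as they might come out of a key-phrase extractor (assumed input).
phrases = ["sulfuric acid", "nitric acid", "sodium hydroxide", "price"]
features_level1 = concept_features(phrases, level=1)  # coarse concepts
features_level2 = concept_features(phrases, level=2)  # finer concepts
```

Under these assumptions, a lower level merges specific terms into broader concepts (both acids count toward "acid" at level 1), while a higher level keeps them apart. Records built this way at different levels could be passed to any classification algorithm to compare accuracy across ontology levels.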