In this paper, we address the problem of analyzing and classifying web documents in a given domain using information filtering agents. We present an ontology-based web content mining methodology with four main stages: creating an ontology for the specified domain, collecting a training set of labeled documents, building a classification model for the domain from the constructed ontology and a classification algorithm, and classifying new documents by information agents using the induced model. We evaluated the methodology in two domains: a chemical domain (web pages containing information about the production of certain chemicals) and the Yahoo! collection of web news documents divided into several categories. Our system takes as input the domain-specific ontology and a set of categorized web documents, and then performs concept generalization on these documents. We use a key-phrase extractor with an integrated ontology parser to build a database from the input documents, and this database serves as the training set for the classification algorithm. We estimate the system's classification accuracy at various levels of the ontology.
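The abstract names concept generalization over an ontology, a key-phrase extractor, and a classification algorithm, but does not spell out any of them. What follows is a minimal sketch of how such a pipeline can fit together, under stated assumptions. The ontology is a hypothetical parent map (`ONTOLOGY_PARENT`). The key-phrase extractor is naive string matching (`extract_concepts`). Multinomial naive Bayes stands in for the unspecified classifier. The `level` parameter imitates evaluating accuracy at different ontology levels. All names and data here are hypothetical, and the code is not the authors' implementation.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy ontology: each concept maps to its parent (None marks a root).
ONTOLOGY_PARENT = {
    "chemical": None,
    "acid": "chemical",
    "sulfuric acid": "acid",
    "nitric acid": "acid",
    "hydrochloric acid": "acid",
    "news": None,
    "sports": "news",
    "football": "sports",
    "tennis": "sports",
}

def depth(concept):
    """Number of parent links between a concept and its root (roots have depth 0)."""
    d = 0
    while ONTOLOGY_PARENT[concept] is not None:
        concept = ONTOLOGY_PARENT[concept]
        d += 1
    return d

def generalize(concept, level):
    """Climb the ontology until the concept is no deeper than `level`."""
    while depth(concept) > level:
        concept = ONTOLOGY_PARENT[concept]
    return concept

def extract_concepts(text):
    """Naive stand-in for a key-phrase extractor: match concept names, longest first."""
    text = text.lower()
    found = []
    for name in sorted(ONTOLOGY_PARENT, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        found.extend([name] * len(re.findall(pattern, text)))
        text = re.sub(pattern, " ", text)  # stop "acid" re-matching inside "nitric acid"
    return found

def features(text, level):
    """Bag of concepts after generalizing every extracted concept to `level`."""
    return Counter(generalize(c, level) for c in extract_concepts(text))

def train(docs, labels, level):
    """Collect multinomial naive Bayes counts over generalized concepts."""
    class_counts = Counter(labels)
    concept_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        feats = features(doc, level)
        concept_counts[label].update(feats)
        vocab.update(feats)
    return class_counts, concept_counts, vocab

def predict(model, doc, level):
    """Return the label with the highest Laplace-smoothed log posterior."""
    class_counts, concept_counts, vocab = model
    n_docs = sum(class_counts.values())
    feats = features(doc, level)
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / n_docs)
        denom = sum(concept_counts[label].values()) + len(vocab)
        for concept, n in feats.items():
            if concept in vocab:  # concepts never seen in training are ignored
                score += n * math.log((concept_counts[label][concept] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, docs, labels, level):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(predict(model, d, level) == y for d, y in zip(docs, labels))
    return hits / len(docs)

train_docs = [
    "Plant expands sulfuric acid production capacity",
    "New process for nitric acid synthesis",
    "Football club wins the league",
    "Tennis star reaches the final",
    "Football transfer window opens",
]
train_labels = ["chemical", "chemical", "sports", "sports", "sports"]
test_docs = ["Hydrochloric acid shipment delayed", "Tennis and football fans celebrate"]
test_labels = ["chemical", "sports"]

for level in (0, 1, 2):
    model = train(train_docs, train_labels, level)
    print(level, accuracy(model, test_docs, test_labels, level))
```

On this toy data the loop prints accuracies of 1.0, 1.0, and 0.5 for levels 0, 1, and 2. At the leaf level (2), "hydrochloric acid" never occurs in training, so the classifier falls back on the class prior. At levels 0 and 1 the same phrase generalizes to "chemical" or "acid", which the training documents do contain. This is only meant to show why accuracy can depend on the ontology level; it says nothing about the results reported in the paper.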