Crawling the web with OntoDir

Authors:
Antonio Picariello;Antonio M. Rinaldi
Affiliations:
Università di Napoli Federico II, Dipartimento di Informatica e Sistemistica, Napoli, Italy; Università di Napoli Federico II, Dipartimento di Informatica e Sistemistica, Napoli, Italy
Venue:
DEXA'07 Proceedings of the 18th international conference on Database and Expert Systems Applications
Year:
2007

Abstract

Managing the large amount of information on the Internet requires more efficient and effective methods and techniques for mining and representing information. The use of ontologies for knowledge representation has grown rapidly in recent years: a common, formal representation of knowledge allows a more accurate analysis of document content in several contexts. One challenging application is the Web: the World Wide Web now poses requirements that are hard to satisfy, especially in a complex scenario such as the Semantic Web. In this paper we present a methodology for automatic topic annotation of Web pages. We describe an algorithm for word disambiguation that uses a dedicated metric for measuring semantic relatedness, and we show a technique that detects the topic of the analyzed document by means of ontologies extracted from a knowledge base. The strategy is implemented in a system where this information is used to build a topic hierarchy that is created automatically rather than defined a priori. Experimental results are presented and discussed in order to measure the effectiveness of our approach.
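
The abstract does not give the relatedness metric or the topic-detection procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical toy concept graph (`EDGES`), a hypothetical sense inventory (`SENSES`), a simple path-based relatedness score of 1 / (1 + shortest path length), and a topic chosen as the most central of the selected senses. All names and data are illustrative assumptions.

```python
from collections import deque

# Hypothetical toy knowledge base: undirected links between concept ids.
EDGES = [
    ("bank#finance", "money"),
    ("money", "loan"),
    ("loan", "interest#finance"),
    ("bank#river", "river"),
    ("river", "water"),
    ("interest#hobby", "hobby"),
]

# Hypothetical sense inventory: each word maps to its candidate senses.
SENSES = {
    "bank": ["bank#finance", "bank#river"],
    "interest": ["interest#finance", "interest#hobby"],
    "loan": ["loan"],
}


def build_graph(edges):
    """Build an undirected adjacency map from a list of edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_length(graph, start, goal):
    """Shortest path length in edges (BFS), or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def relatedness(graph, a, b):
    """Assumed metric: 1 / (1 + path length); 0.0 if the concepts are disconnected."""
    dist = path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def disambiguate(graph, words, senses):
    """For each word, pick the sense most related to the other words' senses."""
    chosen = {}
    for word in words:
        def score(sense, word=word):
            return sum(
                max(relatedness(graph, sense, other) for other in senses[w])
                for w in words
                if w != word
            )
        chosen[word] = max(senses[word], key=score)
    return chosen


def detect_topic(graph, chosen):
    """Pick the chosen sense with the highest total relatedness to all chosen senses."""
    picked = list(chosen.values())
    return max(picked, key=lambda s: sum(relatedness(graph, s, t) for t in picked))


graph = build_graph(EDGES)
chosen = disambiguate(graph, ["bank", "interest", "loan"], SENSES)
print(chosen)
print(detect_topic(graph, chosen))
```

In this toy setting the financial senses of "bank" and "interest" are selected because only they are connected to "loan" in the graph. A real system would replace the toy graph with a lexical knowledge base such as WordNet and the path-based score with whatever metric the paper defines.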