Crawling the web with OntoDir

  • Authors:
  • Antonio Picariello;Antonio M. Rinaldi

  • Affiliations:
  • Universitá di Napoli Federico II, Dipartimento di Informatica e Sistemistica, Napoli, Italy;Universitá di Napoli Federico II, Dipartimento di Informatica e Sistemistica, Napoli, Italy

  • Venue:
  • DEXA'07 Proceedings of the 18th international conference on Database and Expert Systems Applications
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Managing large amount of information on the internet needs more efficient and effective methods and techniques for mining and representing information. The use of ontologies for knowledge representation has had a fast increase in the last years: in fact the use of a common and formal representation of knowledge allows a more accurate analysis of a number of documents content, in several contexts. One of these challenging applications is the Web: the World Wide Web, in fact, has nowadays those kinds of requirements which are hard to satisfy, especially when one considers a complex scenario as the Semantic Web. In this paper we present a methodology for automatic topic annotation of Web pages. We describe an algorithm for words disambiguation using an apposite metric for measuring the semantic relatedness and we show a technique which allows to detect the topic of the analyzed document by means of ontologies extracted from a knowledge base. The strategy is implemented in a system where these information are taken into account to build a topic hierarchy automatically created and not a priori defined. Experimental results are presented and discussed in order to measure the effectiveness of our approach.