A conceptual density-based approach for the disambiguation of toponyms

  • Authors:
  • Davide Buscaldi; Paulo Rosso

  • Affiliations:
  • Universidad Politécnica de Valencia, 46022 Valencia, Spain;Universidad Politécnica de Valencia, 46022 Valencia, Spain

  • Venue:
  • International Journal of Geographical Information Science
  • Year:
  • 2008

Abstract

Nowadays, a huge quantity of information is stored in digital format. A large portion of this information consists of unstructured textual documents, in which geographical references are usually given by means of place names. A common problem in textual information retrieval is polysemy, that is, words that can have more than one sense. This problem is also present in the geographical domain: a place name may refer to different locations in the world. In this paper we investigate the use of our word sense disambiguation technique in the geographical domain, with the aim of resolving ambiguous place names. Our technique is based on WordNet conceptual density. Due to the lack of a reference corpus tagged with WordNet senses, we carried out the experiments over a set of 1,210 place names extracted from the SemCor corpus; we named this set GeoSemCor and made it publicly available. We compared our method with the most-frequent-sense baseline and the enhanced Lesk method, which had not previously been tested in large contexts. The results show that better precision can be achieved by using a small context (phrase level), whereas greater coverage can be obtained by using large contexts (document level). The proposed method should be tested with other corpora, because our experiments revealed an excessive bias towards the most frequent sense in GeoSemCor.
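
The trade-off between precision and coverage reported in the abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in word sense disambiguation evaluation (precision = correct answers / attempted answers; coverage = attempted answers / all instances); the abstract itself does not spell these out. The function name `precision_and_coverage` and the toy sense labels are hypothetical and are not taken from the paper or from GeoSemCor.

```python
from typing import Optional, Sequence, Tuple


def precision_and_coverage(
    predicted: Sequence[Optional[str]],
    gold: Sequence[str],
) -> Tuple[float, float]:
    """Return (precision, coverage) for a disambiguation run.

    Assumed definitions (standard in WSD evaluation, not quoted from the paper):
      precision = correct / attempted
      coverage  = attempted / total
    A prediction of None means the system abstained on that place name.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    # Keep only the instances for which the system produced an answer.
    attempted = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    correct = sum(1 for p, g in attempted if p == g)

    precision = correct / len(attempted) if attempted else 0.0
    coverage = len(attempted) / len(gold) if gold else 0.0
    return precision, coverage


# Toy, made-up labels for four ambiguous place names (illustration only).
gold_senses = ["cambridge_uk", "cambridge_us", "paris_fr", "paris_us"]

# A cautious system that abstains when the context gives too little evidence.
small_context = ["cambridge_uk", None, "paris_fr", None]

# A system that always answers, here always with the same sense per name.
large_context = ["cambridge_uk", "cambridge_uk", "paris_fr", "paris_fr"]

small_scores = precision_and_coverage(small_context, gold_senses)
large_scores = precision_and_coverage(large_context, gold_senses)
```

On this toy input the abstaining run answers fewer instances but gets every attempted one right, while the always-answering run covers every instance at lower precision. The numbers are purely illustrative and say nothing about the paper's actual results.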