Semi-supervised learning of geographical gazetteers from the internet

  • Authors:
  • Olga Uryupina

  • Affiliations:
  • Saarland University, Saarbrücken, Germany

  • Venue:
  • HLT-NAACL-GEOREF '03 Proceedings of the HLT-NAACL 2003 workshop on Analysis of geographic references - Volume 1
  • Year:
  • 2003

Abstract

In this paper we present an approach to the acquisition of geographical gazetteers. Instead of creating these resources manually, we propose to extract gazetteers from the World Wide Web using data mining techniques. The bootstrapping approach investigated in our study allows us to create new gazetteers from only a small seed dataset (1260 words). In addition to gazetteers, the system produces classifiers that can be used online to determine the class (CITY, ISLAND, RIVER, MOUNTAIN, REGION, COUNTRY) of any geographical name. Our classifiers achieve an average accuracy of 86.5%.
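
The abstract does not spell out the bootstrapping procedure, so the following is only a generic sketch of seed-based bootstrapping, not the author's system. It assumes a tiny in-memory corpus in place of web search results, single-token names, and a deliberately naive pattern (the word immediately preceding a known name). All function names, the corpus, and the seed sets are hypothetical.

```python
import re
from collections import Counter


def tokens(sentence):
    # Naive tokenizer: runs of word characters (assumption, not from the paper).
    return re.findall(r"\w+", sentence)


def learn_patterns(gazetteer, corpus, min_count=2):
    """For each class, keep words that precede known names at least min_count times."""
    patterns = {}
    for cls, names in gazetteer.items():
        counts = Counter()
        for sentence in corpus:
            toks = tokens(sentence)
            for prev, tok in zip(toks, toks[1:]):
                if tok in names:
                    counts[prev.lower()] += 1
        patterns[cls] = {w for w, n in counts.items() if n >= min_count}
    return patterns


def extract_candidates(patterns, corpus):
    """Collect capitalized tokens that directly follow a learned cue word."""
    found = {cls: set() for cls in patterns}
    for sentence in corpus:
        toks = tokens(sentence)
        for prev, tok in zip(toks, toks[1:]):
            for cls, cues in patterns.items():
                if prev.lower() in cues and tok[0].isupper():
                    found[cls].add(tok)
    return found


def bootstrap(seeds, corpus, max_rounds=5):
    """Alternate pattern learning and extraction until the gazetteer stops growing."""
    gazetteer = {cls: set(names) for cls, names in seeds.items()}
    for _ in range(max_rounds):
        patterns = learn_patterns(gazetteer, corpus)
        candidates = extract_candidates(patterns, corpus)
        grew = False
        for cls, names in candidates.items():
            new = names - gazetteer[cls]
            if new:
                gazetteer[cls] |= new
                grew = True
        if not grew:
            break
    return gazetteer


# Hypothetical toy data standing in for web text and the seed dataset.
CORPUS = [
    "The river Rhine flows north.",
    "The river Danube is long.",
    "The river Elbe reaches the sea.",
    "The island Rügen is popular.",
    "The island Sylt is windy.",
    "The island Usedom lies east.",
]
SEEDS = {"RIVER": {"Rhine", "Danube"}, "ISLAND": {"Rügen", "Sylt"}}

result = bootstrap(SEEDS, CORPUS)
```

In this toy run the seeds yield the cue words "river" and "island", which in turn pull in the unseen names Elbe and Usedom. A realistic system would need richer patterns, multi-word names, and some filtering of noisy candidates.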