Extraction of Address Data from Unstructured Text using Free Knowledge Resources

  • Authors:
  • Sebastian Schmidt; Simon Manschitz; Christoph Rensing; Ralf Steinmetz

  • Affiliations:
  • Multimedia Communications Lab, Technische Universität Darmstadt, Germany; Multimedia Communications Lab, Technische Universität Darmstadt, Germany; Multimedia Communications Lab, Technische Universität Darmstadt, Germany; Multimedia Communications Lab, Technische Universität Darmstadt, Germany

  • Venue:
  • Proceedings of the 13th International Conference on Knowledge Management and Knowledge Technologies
  • Year:
  • 2013

Abstract

The Web contains many sites whose information is available only as unstructured text. These sites are a source of knowledge for a wide range of applications. Because semantic annotations are rarely used on Web sites, this knowledge cannot be harvested automatically without additional effort, so elaborate approaches to information extraction are required. In our work we address the challenge of identifying business address data on Web sites, since we see a need for this data in various applications. To this end, we have developed a hybrid approach that combines patterns with gazetteers obtained from freely available knowledge resources such as OpenStreetMap. Experimental evaluation on a corpus of heterogeneous Web sites shows high recall and precision. The approach can be adapted to identify addresses in the differing formats used in various countries.
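
The abstract gives no implementation details, so the following is only a minimal sketch of what combining patterns with gazetteers could look like, not the authors' method. It assumes a German-style layout ("street, house number, five-digit postal code, city"). The names `STREETS`, `CITIES`, `ADDRESS_PATTERN`, and `extract_addresses` are hypothetical, and the gazetteer entries are illustrative values typed in by hand rather than data actually exported from OpenStreetMap.

```python
import re

# Hypothetical mini-gazetteers with illustrative entries. In a real system
# these sets could be filled from a free resource such as OpenStreetMap.
STREETS = {"rundeturmstraße", "hauptstraße", "bahnhofstraße"}
CITIES = {"darmstadt", "berlin", "hamburg"}

# Assumed layout: "<street> <house number>, <5-digit postal code> <city>".
# The pattern only proposes candidates; it does not know real place names.
ADDRESS_PATTERN = re.compile(
    r"(?P<street>[A-ZÄÖÜ][a-zäöüß]+(?:straße|weg|platz|allee))\s+"
    r"(?P<number>\d+[a-z]?),?\s+"
    r"(?P<postcode>\d{5})\s+"
    r"(?P<city>[A-ZÄÖÜ][a-zäöüß]+)"
)


def extract_addresses(text):
    """Return pattern matches whose street and city both occur in the gazetteers."""
    results = []
    for match in ADDRESS_PATTERN.finditer(text):
        street = match.group("street")
        city = match.group("city")
        # Gazetteer check: discard candidates that merely look like an address.
        if street.lower() in STREETS and city.lower() in CITIES:
            results.append(match.groupdict())
    return results


if __name__ == "__main__":
    sample = (
        "Kontakt: Rundeturmstraße 10, 64283 Darmstadt. "
        "Lager: Musterweg 5, 12345 Nirgendwo."
    )
    for address in extract_addresses(sample):
        print(address)
```

In this sketch the pattern supplies the structure and the gazetteers filter out strings that only resemble an address, such as the second one in the sample text. Requiring both the street and the city to be known favors precision over recall; accepting a candidate when only one of them is found would shift that balance. Supporting another country's format would mean swapping the pattern and the gazetteers.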