Extraction of Address Data from Unstructured Text using Free Knowledge Resources

Authors:
Sebastian Schmidt;Simon Manschitz;Christoph Rensing;Ralf Steinmetz
Affiliations:
Multimedia Communications Lab, Technische Universität Darmstadt, Germany;Multimedia Communications Lab, Technische Universität Darmstadt, Germany;Multimedia Communications Lab, Technische Universität Darmstadt, Germany;Multimedia Communications Lab, Technische Universität Darmstadt, Germany
Venue:
Proceedings of the 13th International Conference on Knowledge Management and Knowledge Technologies
Year:
2013

Abstract

The Web contains many sites that hold unstructured textual information, and these sites are a source of knowledge for a wide range of interests. Because semantic annotations are rarely used on Web sites, this knowledge cannot be harvested automatically without additional effort. Thus, elaborate approaches to information extraction are required. In our work, we address the challenge of identifying business address data on Web sites, since we see a need for this data in various applications. To this end, we have developed a hybrid approach that combines patterns with gazetteers obtained from freely available knowledge resources such as OpenStreetMap. Experimental evaluation on a corpus of heterogeneous Web sites shows high recall and precision. The approach can be adapted to identify addresses in the different formats used in various countries.
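
The abstract does not describe how patterns and gazetteers are combined, so the following is only a minimal sketch of one plausible reading: a regular expression proposes address candidates, and gazetteer lookups confirm them. It is not the authors' implementation. The names `STREET_GAZETTEER`, `CITY_GAZETTEER`, `ADDRESS_PATTERN`, and `extract_addresses` are hypothetical, the gazetteer entries are illustrative placeholders rather than data taken from OpenStreetMap, and the German-style layout (street, house number, five-digit postcode, city) is an assumption.

```python
import re

# Hypothetical mini-gazetteers (lowercase names). In a real system these
# might be derived from a free resource such as OpenStreetMap; the entries
# here are illustrative placeholders only.
STREET_GAZETTEER = {"hauptstrasse", "bahnhofstrasse"}
CITY_GAZETTEER = {"darmstadt", "frankfurt"}

# Assumed layout: "<street> <house number>, <5-digit postcode> <city>".
# Street and city are restricted to single tokens to keep the sketch short.
ADDRESS_PATTERN = re.compile(
    r"\b(?P<street>[^\W\d_][\w.-]*)\s+"   # street name starting with a letter
    r"(?P<number>\d+[a-zA-Z]?)\s*,?\s+"   # house number, optional letter suffix
    r"(?P<postcode>\d{5})\s+"             # five-digit postcode
    r"(?P<city>[^\W\d_][\w-]*)"           # city name starting with a letter
)


def extract_addresses(text):
    """Return pattern matches whose street and city both occur in the gazetteers."""
    results = []
    for match in ADDRESS_PATTERN.finditer(text):
        street_known = match.group("street").lower() in STREET_GAZETTEER
        city_known = match.group("city").lower() in CITY_GAZETTEER
        # The pattern alone over-generates; the gazetteers filter out
        # candidates that merely look like addresses.
        if street_known and city_known:
            results.append(match.groupdict())
    return results


if __name__ == "__main__":
    sample = (
        "Visit us at Hauptstrasse 1, 64283 Darmstadt "
        "or order Widget 12, 99999 Nowhere."
    )
    for address in extract_addresses(sample):
        print(address)
```

In this sketch the pattern matches both candidates in the sample text, but only the first survives the gazetteer check. Adapting it to another country's address format would, under these assumptions, mean swapping the pattern and the gazetteer contents.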