The Web contains many sites with unstructured textual information, which are a source of knowledge for a wide range of purposes. Because semantic annotations are rarely used on Web sites, this knowledge cannot be harvested automatically without additional effort, and elaborate information extraction approaches are required. In this work we address the challenge of identifying business address data on Web sites, since this data is needed in various applications. To this end, we have developed a hybrid approach that combines patterns with gazetteers obtained from freely available knowledge resources such as OpenStreetMap. Experimental evaluation on a corpus of heterogeneous Web sites shows high recall and precision. The approach can be adapted to identify addresses in the different formats used in various countries.
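
To illustrate the general idea of combining patterns with gazetteers, the following is a minimal sketch and not the implementation evaluated here. It assumes a German-style layout ("street number, postal code city"), and the names `STREETS`, `CITIES`, `ADDRESS_PATTERN`, and `extract_addresses` are hypothetical. The tiny in-memory sets stand in for gazetteers that would in practice be derived from a resource such as OpenStreetMap.

```python
import re

# Hypothetical gazetteers (illustrative entries only); in practice these
# would be built from a freely available resource such as OpenStreetMap.
STREETS = {"hauptstraße", "bahnhofstraße", "lindenweg"}
CITIES = {"berlin", "chemnitz", "dresden"}

# Assumed layout: "<street> <house number>, <5-digit postal code> <city>".
# Other countries would need a different pattern.
ADDRESS_PATTERN = re.compile(
    r"(?P<street>[A-ZÄÖÜ][\wäöüß.\-]*(?: [A-ZÄÖÜ][\wäöüß.\-]*)*)\s+"
    r"(?P<number>\d+[a-zA-Z]?)\s*,?\s+"
    r"(?P<postcode>\d{5})\s+"
    r"(?P<city>[A-ZÄÖÜ][\wäöüß\-]*)"
)


def extract_addresses(text):
    """Return pattern matches whose street and city both occur in the gazetteers."""
    results = []
    for match in ADDRESS_PATTERN.finditer(text):
        street = match.group("street")
        city = match.group("city")
        # The pattern proposes candidates; the gazetteer lookup filters out
        # strings that merely look like an address.
        if street.lower() in STREETS and city.lower() in CITIES:
            results.append(
                {
                    "street": street,
                    "number": match.group("number"),
                    "postcode": match.group("postcode"),
                    "city": city,
                }
            )
    return results


if __name__ == "__main__":
    sample = "Our office: Hauptstraße 12, 10115 Berlin. Storage: Room 12, 10115 Items."
    for address in extract_addresses(sample):
        print(address)
```

In this sketch the pattern alone would also accept "Room 12, 10115 Items"; the gazetteer lookup is what rejects it. Adapting the sketch to another country would mean swapping in a different pattern and different gazetteers.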