Postal Address Detection from Web Documents

Authors:
Lin Can;Zhang Qian;Meng Xiaofeng;Liu Wenyin
Affiliations:
School of Information, Renmin University Beijing, PRC;School of Information, Renmin University Beijing, PRC;School of Information, Renmin University Beijing, PRC;Department of Computer Science, City University of Hong Kong Tat Chee Avenue, Hong Kong SAR, PRC
Venue:
WIRI '05 Proceedings of the International Workshop on Challenges in Web Information Retrieval and Integration
Year:
2005

Citing: 0
Cited by: 5

Pattern-based extraction of addresses from web page content
APWeb'08 Proceedings of the 10th Asia-Pacific web conference on Progress in WWW research and development

Automatic web page annotation with Google rich snippets
OTM'10 Proceedings of the 2010 international conference on On the move to meaningful internet systems: Part II

Hybrid method for automated news content extraction from the web
WISE'06 Proceedings of the 7th international conference on Web Information Systems

Automated extraction of hit numbers from search result pages
WAIM '06 Proceedings of the 7th international conference on Advances in Web-Age Information Management

A reverse engineering approach for automatic annotation of Web pages
Multimedia Tools and Applications

Abstract

An approach to detecting postal addresses in webpages is proposed. Each webpage is first segmented into text blocks based on visual similarity. The text in each block is then passed to a recognition process that employs a syntactic approach, for which grammars covering almost all possible postal address patterns are built. Preliminary experiments on 44 webpages containing 56 true addresses show that the approach detects postal addresses with high precision (89.3%) and a low false alarm rate (3.8%).
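
The recognition step described in the abstract can be illustrated with a minimal sketch. The paper's actual grammars and its visual segmentation method are not given in this record, so everything below is an assumption: the text blocks are taken as already segmented, the "grammar" is a single toy regular expression for street-style addresses, and the names STREET_TYPE, ADDRESS, and detect_addresses are hypothetical.

```python
import re

# Toy grammar (an assumption, not the paper's grammar):
#   address := number  name{1,3}  street_type  [", " city]  [", " STATE ZIP]
STREET_TYPE = r"(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd)"
ADDRESS = re.compile(
    rf"\b\d{{1,5}}\s+(?:[A-Z][a-z]+\s+){{1,3}}{STREET_TYPE}\b\.?"
    rf"(?:,\s*[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)?"  # optional city
    rf"(?:,\s*[A-Z]{{2}}\s+\d{{5}})?"            # optional state code + ZIP
)


def detect_addresses(blocks):
    """Return (block_index, matched_text) for every address-like span.

    `blocks` stands in for the text blocks produced by page segmentation,
    which is assumed to have happened already.
    """
    hits = []
    for index, block in enumerate(blocks):
        for match in ADDRESS.finditer(block):
            hits.append((index, match.group(0)))
    return hits


# Hypothetical text blocks, as if extracted from a segmented webpage.
blocks = [
    "Welcome to our store. Open daily from 9 to 5.",
    "Visit us at 83 Tat Chee Avenue, Kowloon",
    "Mail: 1600 Pennsylvania Avenue, Washington, DC 20500",
]

for index, text in detect_addresses(blocks):
    print(index, text)
```

A single pattern like this covers only one address style; a syntactic approach aiming at almost all address patterns would presumably need many more rules, including patterns for other countries and layouts.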