Effectively Searching Maps in Web Documents

Authors:
Qingzhao Tan;Prasenjit Mitra;C. Lee Giles
Affiliations:
Computer Science and Engineering, The Pennsylvania State University, University Park, USA PA 16802;Computer Science and Engineering, The Pennsylvania State University, University Park, USA PA 16802 and Information Sciences and Technology, The Pennsylvania State University, University Park, USA ...;Computer Science and Engineering, The Pennsylvania State University, University Park, USA PA 16802 and Information Sciences and Technology, The Pennsylvania State University, University Park, USA ...
Venue:
ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval
Year:
2009



Abstract

Maps are an important source of information in archaeology and other sciences. Users search for historical maps to determine the recorded political geography of regions in different eras, to find out exactly where archaeological artifacts were discovered, and so on. Currently, they have to use a generic search engine and add the term "map" to their other keywords. This crude method generates a significant number of false positives that the user must sift through to get the desired results. To reduce this manual effort, we propose an automatic map identification, indexing, and retrieval system that enables users to search and retrieve maps appearing in a large corpus of digital documents using simple keyword queries. We identify features that help distinguish maps from other figures in digital documents and show how a Support-Vector-Machine-based classifier can be used to identify maps. We propose map-level metadata (e.g., captions and references to the maps in the text) and document-level metadata (e.g., title, abstract, citations, and how recent the publication is), and show how both can be automatically extracted and indexed. Our novel ranking algorithm weights different metadata fields differently and also uses the document-level metadata to help rank retrieved maps. Empirical evaluations show which features should be selected and which metadata fields should be weighted more. We also demonstrate improved retrieval results in comparison to adaptations of existing methods for map retrieval. Our map search engine has been deployed in an online map-search system that is part of the Blind-Review digital library system.
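
The idea of weighting metadata fields differently when ranking maps can be illustrated with a minimal sketch. This is not the paper's ranking algorithm: the field names, the weight values in `FIELD_WEIGHTS`, and the simple term-frequency scoring are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from collections import Counter

# Hypothetical field weights: the abstract does not give actual values.
# Map-level fields (caption, reference_text) are assumed to count more
# than document-level fields (title, abstract).
FIELD_WEIGHTS = {
    "caption": 3.0,
    "reference_text": 2.0,
    "title": 1.5,
    "abstract": 1.0,
}


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberately naive choice)."""
    return text.lower().split()


def score_map(query, fields, weights=FIELD_WEIGHTS):
    """Weighted sum of query-term frequencies over the metadata fields."""
    terms = tokenize(query)
    total = 0.0
    for name, text in fields.items():
        counts = Counter(tokenize(text))
        matches = sum(counts[term] for term in terms)
        total += weights.get(name, 0.0) * matches
    return total


def rank_maps(query, maps):
    """Return ids of maps with a non-zero score, best first (ties by id)."""
    scored = [(score_map(query, m["fields"]), m["id"]) for m in maps]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [map_id for score, map_id in ordered if score > 0]


# Toy records, invented for this example only.
maps = [
    {"id": "m1", "fields": {"caption": "Map of Roman settlements",
                            "title": "Pottery analysis"}},
    {"id": "m2", "fields": {"caption": "Site plan",
                            "abstract": "Roman settlements in Gaul"}},
    {"id": "m3", "fields": {"caption": "Histogram of ages"}},
]
ranking = rank_maps("roman settlements", maps)
```

In this toy setup a match in a caption outweighs the same match in a document abstract, so `m1` ranks ahead of `m2`, and `m3` is dropped because it matches no query term. A real system would likely replace raw term counts with a field-aware retrieval function and tune the weights empirically.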