Experiments with geo-filtering predicates for IR

Authors:
Jochen L. Leidner
Affiliations:
Linguit GmbH, Bad Bergzabern, Germany
Venue:
CLEF'05 Proceedings of the 6th international conference on Cross-Language Evaluation Forum: accessing Multilingual Information Repositories
Year:
2005

Abstract

This paper describes a set of experiments for monolingual English retrieval at Geo-CLEF 2005, evaluating a technique for spatial retrieval based on named entity tagging, toponym resolution, and re-ranking by means of geographic filtering. To this end, a series of systematic experiments in the Vector Space paradigm are presented. Plain bag-of-words versus phrasal retrieval and the potential of meronymy query expansion as a recall-enhancing device are investigated, and three alternative geo-spatial filtering techniques based on spatial clipping are compared and evaluated on 25 monolingual English queries. Preliminary results show that always choosing toponym referents based on a simple “maximum population” heuristic to approximate the salience of a referent fails to outperform TF*IDF baselines with the Geo-CLEF 2005 dataset when combined with three geo-filtering predicates. Conservative geo-filtering outperforms more aggressive predicates. The evidence further seems to suggest that query expansion with WordNet meronyms is not effective in combination with the method described. A post-hoc analysis indicates that factors responsible for the low performance include sparseness of available population data, gaps in the gazetteer that associates Minimum Bounding Rectangles with geo-terms in the query, and the composition of the Geo-CLEF 2005 dataset itself.
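
To make the two ideas named in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy gazetteer of candidate referents, resolves each toponym with a "maximum population" rule, and then applies one hypothetical filtering predicate: a document is kept only if at least one of its resolved toponyms falls inside the query's Minimum Bounding Rectangle. The names `Referent`, `MBR`, `resolve_max_population` and `geo_filter` are invented for illustration, the abstract does not specify the paper's three predicates, and the population values are placeholders rather than real figures.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Referent:
    """One candidate location for a toponym (hypothetical gazetteer entry)."""
    name: str
    lat: float
    lon: float
    population: Optional[int]  # None when the gazetteer has no population figure


@dataclass(frozen=True)
class MBR:
    """Minimum Bounding Rectangle in degrees (no antimeridian handling)."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def resolve_max_population(candidates: List[Referent]) -> Optional[Referent]:
    """Pick the candidate with the largest known population, or None if none is known."""
    known = [c for c in candidates if c.population is not None]
    if not known:
        return None
    return max(known, key=lambda c: c.population)


def geo_filter(
    ranked: List[Tuple[str, float]],
    doc_toponyms: Dict[str, List[List[Referent]]],
    query_mbr: MBR,
) -> List[Tuple[str, float]]:
    """Keep ranked (doc_id, score) pairs that have a resolved toponym inside the MBR.

    This predicate is an assumption made for illustration only.
    """
    kept = []
    for doc_id, score in ranked:
        resolved = [resolve_max_population(c) for c in doc_toponyms.get(doc_id, [])]
        if any(r is not None and query_mbr.contains(r.lat, r.lon) for r in resolved):
            kept.append((doc_id, score))
    return kept


# Toy data: coordinates are approximate, populations are placeholders.
cambridge = [
    Referent("Cambridge", 52.20, 0.12, 200),     # candidate in England
    Referent("Cambridge", 42.37, -71.11, 100),   # candidate in Massachusetts
]
unknown_place = [Referent("Hamlet", 55.0, -3.0, None)]  # no population data

ranked = [("d1", 0.9), ("d2", 0.7), ("d3", 0.5)]
doc_toponyms = {"d1": [cambridge], "d2": [unknown_place]}
britain_box = MBR(south=49.0, west=-8.0, north=61.0, east=2.0)

filtered = geo_filter(ranked, doc_toponyms, britain_box)
print(filtered)  # [('d1', 0.9)]
```

In this toy run, `d2` is dropped even though its only toponym lies inside the box, because the toponym cannot be resolved without a population figure. That mirrors, in miniature, the sparseness of population data that the post-hoc analysis names as one factor behind the low performance.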