Cross-lingual geo-parsing for non-structured data

Authors:
Judith Gelernter; Wei Zhang
Affiliations:
Carnegie Mellon University; Carnegie Mellon University
Venue:
Proceedings of the 7th Workshop on Geographic Information Retrieval
Year:
2013

Abstract

A geo-parser automatically identifies location words in a text. We have built a geo-parser specifically to find locations in unstructured Spanish text. Our novel geo-parser architecture combines the results of four parsers: a lexico-semantic Named Location Parser, a rules-based building parser, a rules-based street parser, and a trained Named Entity Parser. Each parser has different strengths: the Named Location Parser is strong in recall, the Named Entity Parser is strong in precision, and the building and street parsers find buildings and streets that the other two are not designed to find. To test the performance of our Spanish geo-parser, we compared its output on Spanish text with the output of our English geo-parser on the same text translated into English. The Spanish geo-parser identified toponyms with an F1 of .796, and the English geo-parser identified toponyms with an F1 of .861 (despite errors introduced by translation from Spanish to English), compared to an F1 of .114 from a commercial off-the-shelf Spanish geo-parser. The results suggest that (1) geo-parsers should be built specifically for unstructured text, as our Spanish and English geo-parsers have been, and (2) location entities in Spanish text that has been machine translated into English remain robust to geo-parsing in English.
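
The idea of combining several parsers and scoring the result with F1 can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the four parsers' results are merged, so the plain set union below is an assumption, and the two toy parsers, the tiny gazetteer, and the sample sentence are hypothetical stand-ins.

```python
import re
from typing import Callable, Iterable, Set

Parser = Callable[[str], Set[str]]


def toy_street_parser(text: str) -> Set[str]:
    """Hypothetical rules-based parser: 'Calle' followed by a capitalized word."""
    return set(re.findall(r"Calle\s+[A-ZÁÉÍÓÚÑ]\w+", text))


def toy_gazetteer_parser(text: str) -> Set[str]:
    """Hypothetical lookup parser backed by a tiny, made-up gazetteer."""
    gazetteer = {"Madrid", "Sevilla"}
    return {name for name in gazetteer if name in text}


def merge_parsers(text: str, parsers: Iterable[Parser]) -> Set[str]:
    """Combine parser outputs by set union (an assumed combination rule)."""
    merged: Set[str] = set()
    for parse in parsers:
        merged |= parse(text)
    return merged


def f1_score(predicted: Set[str], gold: Set[str]) -> float:
    """Harmonic mean of precision and recall over exact toponym matches."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


TEXT = "Incendio en la Calle Mayor, cerca de Madrid"
GOLD = {"Calle Mayor", "Madrid"}

combined = merge_parsers(TEXT, [toy_street_parser, toy_gazetteer_parser])
gazetteer_only = toy_gazetteer_parser(TEXT)

print(f1_score(combined, GOLD))
print(f1_score(gazetteer_only, GOLD))
```

In this toy case the gazetteer parser alone misses the street, so its recall and F1 are lower than those of the combined output. A union generally favors recall; a real system would also need to resolve overlapping or conflicting matches, which this sketch ignores.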