Address geocoding, the process of finding the map location for a structured postal address, is a relatively well-studied problem. In this paper we consider the more general problem of crosslingual location search, where the queries are not limited to postal addresses and the language and script of the search query differ from those in which the underlying data is stored. To the best of our knowledge, our system is the first crosslingual location search system that is able to geocode complex addresses. We use a statistical machine transliteration system to convert location names from the script of the query to that of the stored data. However, we show that it is not sufficient to simply feed the resulting transliterations into a monolingual geocoding system: the ambiguity inherent in the conversion drastically expands the location search space and significantly lowers the quality of results. The strength of our approach lies in its integrated, end-to-end nature. We use abstraction and fuzzy search in the text domain to achieve maximum coverage despite transliteration ambiguities, while applying spatial constraints in the geographic domain to focus only on viable interpretations of the query. Our experiments with structured and unstructured queries in a diverse set of languages and scripts (Arabic, English, Hindi and Japanese), searching for locations in different regions of the world, show full crosslingual location search accuracy at levels comparable to those of commercial monolingual systems. We achieve these levels of performance using techniques that may be applied to crosslingual searches in any language or script, and over arbitrary spatial data.
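The abstract describes two complementary steps. Fuzzy matching in the text domain widens coverage, and spatial constraints in the geographic domain then prune interpretations. A toy Python (3.9+) sketch can illustrate how the two interact. It is not the authors' system. The gazetteer, its rough coordinates, the names `Place`, `fuzzy_lookup`, `contains` and `geocode`, the 0.8 threshold, and the candidate transliterations are all hypothetical. `difflib.SequenceMatcher` only stands in for a real fuzzy matcher. Abstraction and the statistical transliteration model itself are not modeled.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import product

# Bounding box as (min_lon, min_lat, max_lon, max_lat).
BBox = tuple[float, float, float, float]


@dataclass(frozen=True)
class Place:
    """Hypothetical gazetteer entry; streets are stored as degenerate (point) boxes."""
    name: str
    kind: str  # "city" or "street" (hypothetical schema)
    bbox: BBox


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; difflib stands in for a real fuzzy matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_lookup(candidates: list[str], gazetteer: list[Place], kind: str,
                 threshold: float = 0.8) -> dict[Place, float]:
    """Match every transliteration candidate against entries of one kind.

    Keeps the best score per place; the 0.8 threshold is an arbitrary assumption.
    """
    hits: dict[Place, float] = {}
    for cand in candidates:
        for place in gazetteer:
            if place.kind == kind:
                score = similarity(cand, place.name)
                if score >= threshold:
                    hits[place] = max(score, hits.get(place, 0.0))
    return hits


def contains(outer: BBox, inner: BBox) -> bool:
    """True if box `inner` lies entirely inside box `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def geocode(street_cands: list[str], city_cands: list[str],
            gazetteer: list[Place]) -> list[tuple[Place, Place, float]]:
    """Return (street, city, score) interpretations that pass the spatial constraint, best first."""
    streets = fuzzy_lookup(street_cands, gazetteer, "street")
    cities = fuzzy_lookup(city_cands, gazetteer, "city")
    results = []
    for (street, s_score), (city, c_score) in product(streets.items(), cities.items()):
        # Spatial constraint: drop interpretations whose street is not inside the city.
        if contains(city.bbox, street.bbox):
            results.append((street, city, s_score * c_score))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Hypothetical gazetteer with rough, illustrative coordinates; two cities share a name.
GAZETTEER = [
    Place("Alexandria", "city", (29.80, 31.10, 30.10, 31.35)),    # Egypt
    Place("Alexandria", "city", (-77.15, 38.78, -77.03, 38.85)),  # Virginia, USA
    Place("Fouad Street", "street", (29.90, 31.20, 29.90, 31.20)),
    Place("King Street", "street", (-77.05, 38.80, -77.05, 38.80)),
]

if __name__ == "__main__":
    # Hand-written stand-ins for the n-best output of a transliteration model.
    matches = geocode(["Fuad Street", "Fouad Street"],
                      ["Aleksandria", "Alexandria"], GAZETTEER)
    for street, city, score in matches:
        print(street.name, city.name, city.bbox, round(score, 3))
```

Both gazetteer entries named Alexandria match the city candidates equally well, so text similarity alone cannot choose between them. Only the containment check removes the Virginia interpretation, because the matched street does not lie inside its bounding box. This mirrors, in miniature, the abstract's point that text-domain ambiguity is resolved by constraints from the geographic domain.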