Improving the multilingual user experience of Wikipedia using cross-language name search

Authors:
Raghavendra Udupa; Mitesh Khapra
Affiliations:
Microsoft Research India, Bangalore, India; Indian Institute of Technology Bombay, Powai, India
Venue:
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Year:
2010

Abstract

Although Wikipedia has emerged as a powerful collaborative encyclopedia on the Web, it is only partially multilingual: most of its content is in English and a small number of other languages. In real-life scenarios, non-English users in general, and ESL/EFL users in particular, need to search for relevant English Wikipedia articles because no relevant articles are available in their own language. The multilingual experience of such users could be significantly improved if they could express their information need in their native language while searching English Wikipedia. In this paper, we propose a novel cross-language name search algorithm and employ it to search English Wikipedia articles from a diverse set of languages, including Hebrew, Hindi, Russian, Kannada, Bangla, and Tamil. Our empirical study shows that our approach significantly improves the multilingual experience of users.