Hindi CLIR in thirty days

Authors:
Leah S. Larkey;Margaret E. Connell;Nasreen Abduljaleel
Affiliations:
University of Massachusetts, Amherst, MA;University of Massachusetts, Amherst, MA;University of Massachusetts, Amherst, MA
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2003

Citing 12
Cited 8

TREC and TIPSTER experiments with INQUERY

TREC-2 Proceedings of the second conference on Text retrieval conference
The effects of query structure and dictionary setups in dictionary-based cross-language information retrieval

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Resolving ambiguity for cross-language retrieval

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Information retrieval as statistical translation

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating a probabilistic model for cross-lingual information retrieval

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Improving stemming for Arabic information retrieval: light stemming and co-occurrence analysis

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
On arabic search: improving the retrieval effectiveness via a light stemming approach

Proceedings of the eleventh international conference on Information and knowledge management
Revised Papers from the Second Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems

CLEF '01 Revised Papers from the Second Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems
Statistical transliteration for english-arabic cross language information retrieval

CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
A month to topic detection and tracking in Hindi

ACM Transactions on Asian Language Information Processing (TALIP)
Structured queries, language modeling, and relevance modeling in cross-language information retrieval

Information Processing and Management: an International Journal - Special issue: Cross-language information retrieval
Improved statistical alignment models

ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics

A month to topic detection and tracking in Hindi

ACM Transactions on Asian Language Information Processing (TALIP)
WebKhoj: Indian language IR from multiple character encodings

Proceedings of the 15th international conference on World Wide Web
Stemming to improve translation lexicon creation form bitexts

Information Processing and Management: an International Journal
An unsupervised Hindi stemmer with heuristic improvements

Proceedings of the second workshop on Analytics for noisy unstructured text data
Transliteration based search engine for multilingual information access

CLIAWS3 '09 Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies
Sub-Word Indexing and Blind Relevance Feedback for English, Bengali, Hindi, and Marathi IR

ACM Transactions on Asian Language Information Processing (TALIP)
A Fast Corpus-Based Stemmer

ACM Transactions on Asian Language Information Processing (TALIP)
Hindi, telugu, oromo, english CLIR evaluation

CLEF'06 Proceedings of the 7th international conference on Cross-Language Evaluation Forum: evaluation of multilingual and multi-modal information retrieval

Quantified Score

Hi-index	0.00

Visualization

Abstract

As participants in the TIDES Surprise language exercise, researchers at the University of Massachusetts helped collect Hindi--English resources and developed a cross-language information retrieval system. Components included normalization, stop-word removal, transliteration, structured query translation, and language modeling using a probabilistic dictionary derived from a parallel corpus. Existing technology was successfully applied to Hindi. The biggest stumbling blocks were collection of parallel English and Hindi text and dealing with numerous proprietary encodings.