Cross-linguistic information retrieval workshop
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Translingual information retrieval: learning from bilingual corpora
Artificial Intelligence - Special issue: artificial intelligence 40 years later
Using clustering and SuperConcepts within SMART: TREC 6
Information Processing and Management: an International Journal - The sixth text REtrieval conference (TREC-6)
Should we translate the documents or the queries in cross-language information retrieval?
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Cross-language information retrieval: the way ahead
Information Processing and Management: an International Journal - Special issue: Cross-language information retrieval
Technical issues of cross-language information retrieval: a review
Information Processing and Management: an International Journal - Special issue: Cross-language information retrieval
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Cross-language text classification
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
An EM Based Training Algorithm for Cross-Language Text Categorization
WI '05 Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence
A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data
The Journal of Machine Learning Research
Exploiting comparable corpora and bilingual dictionaries for cross-language text categorization
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Advanced learning algorithms for cross-language patent retrieval and classification
Information Processing and Management: an International Journal
Robust classification of rare queries using web knowledge
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
A Hybrid Technique for English-Chinese Cross Language Information Retrieval
ACM Transactions on Asian Language Information Processing (TALIP)
Can chinese web pages be classified with english data source?
Proceedings of the 17th international conference on World Wide Web
Search advertising using web relevance feedback
Proceedings of the 17th ACM conference on Information and knowledge management
Domain adaptation with structural correspondence learning
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Domain adaptation for statistical classifiers
Journal of Artificial Intelligence Research
Cross-lingual sentiment classification via bi-view non-negative matrix tri-factorization
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Ensemble approach for cross language information retrieval
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
Hi-index | 0.01 |
The non-English Web is growing at phenomenal speed, but available language processing tools and resources are predominantly English-based. Taxonomies are a case in point: while there are plenty of commercial and non-commercial taxonomies for the English Web, taxonomies for other languages are either not available or of arguable quality. Given that building comprehensive taxonomies for each language is prohibitively expensive, it is natural to ask whether existing English taxonomies can be leveraged, possibly via machine translation, to enable text processing tasks in other languages. Our experimental results confirm that the answer is affirmative with respect to at least one task. In this study we focus on query classification, which is essential for understanding the user intent both in Web search and in online advertising. We propose a robust method for classifying non-English queries into an English taxonomy, using an existing English text classifier and off-the-shelf machine translation systems. In particular, we show that by considering the Web search results in the query's original language as additional sources of information, we can alleviate the effect of erroneous machine translation. Empirical evaluation on query sets in languages as diverse as Chinese and Russian yields very encouraging results; consequently, we believe that our approach is also applicable to many additional languages.