A late fusion approach to cross-lingual document re-ranking

Authors:
Dong Zhou;Séamus Lawless;Jinming Min;Vincent Wade
Affiliations:
Trinity College Dublin, Dublin, Ireland;Trinity College Dublin, Dublin, Ireland;Dublin City University, Dublin, Ireland;Trinity College Dublin, Dublin, Ireland
Venue:
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Year:
2010

Abstract

The field of information retrieval still strives to develop models that integrate semantic information into the ranking process in order to improve on standard bag-of-words models. Cross-lingual information retrieval is one case where such a model is required, as content or concepts often need to be matched across languages. To address this problem, conceptual models, which normally exploit latent or implicit features of the text, have been adopted to rank an entire corpus. One drawback of these models is that their computational cost is significant and often intractable on modern test collections. Approaches that use concept-based models to re-rank initial retrieval results, in particular the latent concept model, have therefore attracted considerable study. However, fitting such a model to a smaller collection is less meaningful than fitting it to the whole corpus. This paper proposes a late fusion method which incorporates scores generated by using external knowledge to enhance the space produced by the latent concept method. The method is further demonstrated to be suitable for multilingual re-ranking. To illustrate its effectiveness, experiments were conducted over test collections across three languages. The results demonstrate that the method can comfortably achieve improvements in retrieval performance over several re-ranking methods.
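
The abstract does not say how the two sets of scores are combined, so the following is only a minimal sketch of one common late fusion scheme, not the authors' method. It assumes min-max normalization of each score list followed by linear interpolation. The function names, the `weight` parameter, and the example scores are all hypothetical.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score list maps to all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def late_fusion_rerank(
    latent_scores: Dict[str, float],
    external_scores: Dict[str, float],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank documents by a weighted sum of two normalized score lists.

    Assumption: `latent_scores` stands in for scores from a latent concept
    model fitted to the initial result list, and `external_scores` for scores
    derived from external knowledge. `weight` is a hypothetical interpolation
    parameter in [0, 1]; documents missing an external score contribute 0.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    latent = min_max_normalize(latent_scores)
    external = min_max_normalize(external_scores)
    fused = {
        doc: weight * latent[doc] + (1.0 - weight) * external.get(doc, 0.0)
        for doc in latent
    }
    # Sort by descending fused score, breaking ties by document id.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up scores for three documents from an initial retrieval run.
    latent = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
    external = {"d1": 3.0, "d2": 1.0, "d3": 2.0}
    for doc, score in late_fusion_rerank(latent, external, weight=0.7):
        print(doc, round(score, 3))
```

Normalizing before interpolating keeps either score scale from dominating purely because of its range. With `weight=1.0` the ranking reduces to the latent scores alone, and with `weight=0.0` to the external scores alone.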