Multimodal s_{n,k}-grams: a skipping-based similarity model in information retrieval

  • Authors:
  • Pakinee Aimmanee; Thanaruk Theeramunkong

  • Affiliations:
  • Sirindhorn International Institute of Technology, Thammasat University, Patumthani, Thailand (both authors)

  • Venue:
  • ACIIDS'10: Proceedings of the Second International Conference on Intelligent Information and Database Systems, Part I
  • Year:
  • 2010

Abstract

A generalization of n-gram term modeling, the s_{n,k}-gram, has recently been proposed that allows k-term skipping in the n-gram representation. This paper presents a multimodal s_{n,k}-gram similarity, which combines multiple similarity vectors, each obtained by computing the similarity between a query and a document represented with s_{n,k}-grams for a different choice of n and k. Adjusting the weights in this combination yields an approximate matching model that can relate a query to a relevant document even when the document does not contain the exact terms of the query, or vice versa. To evaluate the proposed method, we analyzed two variants of the multimodal s_{n,k}-gram model, equal weighting and performance-based weighting, over all queries on two collections of medical documents that are similar in content but written in different languages. The results show that the multimodal s_{n,k}-gram similarity significantly outperforms conventional unigram and bigram models.
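The abstract describes the model only at a high level. The following is a minimal sketch, not the authors' implementation, written under several labeled assumptions: an s_{n,k}-gram is taken to be an ordered tuple of n tokens that skips at most k tokens in total (so s_{n,0} is the ordinary n-gram); each mode (n, k) compares query and document with cosine similarity over s_{n,k}-gram counts; the multimodal score is a weighted sum over modes; and performance-based weighting is modeled as normalizing some per-mode effectiveness score supplied by the caller. The function names `skip_grams`, `cosine`, `performance_weights`, and `multimodal_similarity` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def skip_grams(tokens, n, k):
    """Count s_{n,k}-grams in a token list.

    Assumed definition: ordered tuples of n tokens whose positions increase
    and skip at most k tokens in total, so s_{n,0} is the ordinary n-gram.
    """
    grams = Counter()
    for start in range(len(tokens)):
        # The remaining n - 1 positions lie within the next n - 1 + k tokens.
        window = range(start + 1, min(start + n + k, len(tokens)))
        for rest in combinations(window, n - 1):
            grams[(tokens[start],) + tuple(tokens[i] for i in rest)] += 1
    return grams


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(v * b[g] for g, v in a.items())  # missing keys count as 0
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def performance_weights(scores):
    """Hypothetical performance-based weighting: normalize per-mode scores
    (e.g. each mode's retrieval effectiveness on training queries) to sum to 1."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [s / total for s in scores]


def multimodal_similarity(query, doc, modes, weights=None):
    """Weighted sum of per-mode cosine similarities; modes is a list of (n, k)."""
    if weights is None:
        weights = [1.0 / len(modes)] * len(modes)  # equal weighting
    if len(weights) != len(modes):
        raise ValueError("need one weight per mode")
    return sum(w * cosine(skip_grams(query, n, k), skip_grams(doc, n, k))
               for w, (n, k) in zip(weights, modes))


if __name__ == "__main__":
    modes = [(1, 0), (2, 0), (2, 1)]
    query = "heart attack".split()
    doc = "heart severe attack".split()
    # The plain bigram mode s_{2,0} finds no match, but the skipping mode s_{2,1} does.
    print(cosine(skip_grams(query, 2, 0), skip_grams(doc, 2, 0)))
    print(cosine(skip_grams(query, 2, 1), skip_grams(doc, 2, 1)))
    print(multimodal_similarity(query, doc, modes))
    print(multimodal_similarity(query, doc, modes, performance_weights([1, 1, 2])))
```

In this toy example the document never contains the exact bigram "heart attack", so the s_{2,0} mode scores zero, while the s_{2,1} mode still links the two texts by skipping "severe". Mixing modes through the weights is what lets the combined score reward such approximate matches. The choice of modes and any effectiveness scores fed to `performance_weights` are illustrative only.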