A generalization of n-gram term modeling, the s_{n,k}-gram, was recently proposed; it allows k terms to be skipped in the n-gram representation. This paper presents a multimodal s_{n,k}-gram similarity that combines multiple similarity vectors, each computed between query-document pairs using s-grams with a particular choice of n and k. Adjusting the weights in the combination yields an approximate matching model that can relate a query to a relevant document even when the document contains none of the query's exact terms, or vice versa. To evaluate the method, we analyzed two variants of the multimodal s_{n,k}-gram model, equal weighting and performance-based weighting, over all queries on two collections of medical documents that are similar in content but written in different languages. The results show that the multimodal s_{n,k}-gram similarity significantly outperforms conventional unigrams and bigrams.
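The idea of combining per-(n, k) similarities can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so everything below is an assumption made for illustration: an s-gram is taken to be n terms with exactly k terms skipped between neighbours, each mode is scored with Jaccard overlap, and the modes are merged by a normalized weighted sum. The names `s_grams`, `similarity_vector`, and `combine` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Set, Tuple

Mode = Tuple[int, int]  # (n, k): n terms per gram, k terms skipped between neighbours


def s_grams(terms: Sequence[str], n: int, k: int) -> Set[Tuple[str, ...]]:
    """Assumed definition: take n terms, skipping exactly k terms between each pair."""
    step = k + 1
    span = (n - 1) * step + 1  # number of positions one gram covers
    return {
        tuple(terms[i + j * step] for j in range(n))
        for i in range(len(terms) - span + 1)
    }


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    """Set overlap; a stand-in for whichever per-mode similarity the paper uses."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_vector(query: Sequence[str], doc: Sequence[str],
                      modes: Sequence[Mode]) -> List[float]:
    """One similarity value per (n, k) mode."""
    return [jaccard(s_grams(query, n, k), s_grams(doc, n, k)) for n, k in modes]


def combine(sims: Sequence[float], weights: Sequence[float]) -> float:
    """Normalized weighted sum of the per-mode similarities."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * s for w, s in zip(weights, sims)) / total


# Example: unigrams, bigrams, and bigrams with one skipped term.
modes = [(1, 0), (2, 0), (2, 1)]
query = "acute heart failure treatment".split()
doc = "treatment of acute chronic heart failure".split()

sims = similarity_vector(query, doc, modes)
equal_score = combine(sims, [1.0, 1.0, 1.0])          # equal weighting
weighted_score = combine(sims, [0.5, 0.3, 0.2])       # hypothetical performance-based weights
```

Under this sketch, equal weighting corresponds to passing identical weights, while performance-based weighting would pass weights derived from how well each mode performed on its own (for example, a per-mode retrieval score). The weight values shown are placeholders, not results from the paper.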