Classified s-grams have been successfully used in cross-language information retrieval (CLIR) as an approximate string matching technique for translating out-of-vocabulary (OOV) words. In particular, s-grams have consistently outperformed other approximate string matching techniques, such as edit distance and n-grams. The Jaccard coefficient has traditionally been used as the s-gram-based string proximity measure, and other proximity measures for s-gram matching have not been tested. The current study evaluated the performance of seven proximity measures for classified s-grams in a CLIR context using eleven language pairs. The binary proximity measures generally performed better than their non-binary counterparts, but the difference depended mainly on the padding used with the s-grams. When no padding was used, the binary and non-binary proximity measures performed nearly equally, though overall performance deteriorated.
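The distinction between binary and non-binary Jaccard proximity, and the role of padding, can be illustrated with a minimal Python sketch. It is not the implementation used in the study. The function names (`s_grams`, `jaccard_binary`, `jaccard_counted`), the single `_` padding character at each end, and the pooling of all skip lengths into one class are assumptions made for illustration only.

```python
from collections import Counter


def s_grams(word, skips=(0,), pad=True):
    """Return character pairs separated by `skip` characters.

    Assumption: all skip lengths in `skips` are pooled into one class,
    and padding adds a single "_" at each end of the word.
    """
    text = f"_{word}_" if pad else word
    grams = []
    for skip in skips:
        step = skip + 1  # skip=0 gives ordinary adjacent digrams
        grams.extend(text[i] + text[i + step] for i in range(len(text) - step))
    return grams


def jaccard_binary(a, b):
    """Jaccard coefficient on sets: repeated s-grams count once."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def jaccard_counted(a, b):
    """Non-binary variant on multisets: repeated s-grams keep their counts."""
    ca, cb = Counter(a), Counter(b)
    union = sum((ca | cb).values())  # per-gram maximum of the counts
    return sum((ca & cb).values()) / union if union else 0.0  # per-gram minimum


if __name__ == "__main__":
    for pad in (True, False):
        x = s_grams("banana", skips=(0, 1), pad=pad)
        y = s_grams("bananas", skips=(0, 1), pad=pad)
        print(pad, jaccard_binary(x, y), jaccard_counted(x, y))
```

The two variants differ only for words that contain repeated s-grams: the binary measure ignores how often a pair occurs, whereas the counted measure penalizes differing frequencies.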