Technical term translations are important for cross-lingual information retrieval. In many languages, new technical terms share a common origin but render the underlying sounds with different spellings; such terms are known as cross-lingual spelling variants (CLSV).

To find the best CLSV in a text database index, we formulate the problem in a probabilistic framework and implement it as an instance of the general edit distance using weighted finite-state transducers. Some training data is required to estimate the costs for the general edit distance. We demonstrate that after some basic training our new multilingual model is robust and requires little or no adaptation to cover additional languages, because the model takes advantage of language-independent transliteration patterns.

We train the model with medical terms in seven languages and test it with terms from varied domains in six languages. Two of the test languages are not in the training data. Against a large text database index, we achieve 64 to 78% precision at the point of 100% recall. This is a relative improvement of 22% over the simple edit distance.
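To make the idea of an edit distance with non-uniform costs concrete, the following is a minimal sketch in Python. It is not the weighted finite-state transducer implementation described above: it is a plain dynamic-programming edit distance over single characters. The cost table `SUB_COSTS`, the constants, and the function names are hypothetical and chosen by hand for illustration; the model described above estimates its costs from training data.

```python
from typing import Dict, List, Tuple

# Hypothetical, hand-picked substitution costs (illustration only).
# In the approach described above, costs are estimated from training data.
SUB_COSTS: Dict[Tuple[str, str], float] = {
    ("c", "k"): 0.2,
    ("k", "c"): 0.2,
    ("y", "i"): 0.3,
    ("i", "y"): 0.3,
}
DEFAULT_SUB = 1.0  # cost of any substitution not listed in SUB_COSTS
INS_COST = 1.0     # cost of inserting one character
DEL_COST = 1.0     # cost of deleting one character


def sub_cost(a: str, b: str) -> float:
    """Cost of replacing character a with character b."""
    if a == b:
        return 0.0
    return SUB_COSTS.get((a, b), DEFAULT_SUB)


def weighted_edit_distance(source: str, target: str) -> float:
    """Cheapest sequence of edits turning source into target."""
    n = len(target)
    # prev[j] holds the cost of turning the processed prefix of source
    # into the first j characters of target.
    prev = [j * INS_COST for j in range(n + 1)]
    for i in range(1, len(source) + 1):
        curr = [i * DEL_COST] + [0.0] * n
        for j in range(1, n + 1):
            curr[j] = min(
                prev[j] + DEL_COST,       # delete source[i - 1]
                curr[j - 1] + INS_COST,   # insert target[j - 1]
                prev[j - 1] + sub_cost(source[i - 1], target[j - 1]),
            )
        prev = curr
    return prev[n]


def best_variants(query: str, index_terms: List[str],
                  top_k: int = 3) -> List[Tuple[float, str]]:
    """Rank index terms by weighted distance to the query, cheapest first."""
    scored = sorted((weighted_edit_distance(query, t), t) for t in index_terms)
    return scored[:top_k]
```

With these assumed costs, `best_variants("kardiologi", ["cardiology", "radiology", "kardiologie"], top_k=1)` ranks `cardiology` first, because the k/c and i/y substitutions are cheap. Under a simple edit distance with unit costs, `kardiologie` would be closer, since it is a single insertion away.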