Arabic morphology using only finite-state operations
Semitic '98 Proceedings of the Workshop on Computational Approaches to Semitic Languages
Modeling morphologically rich languages using split words and unstructured dependencies
ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
OpenFst: a general and efficient weighted finite-state transducer library
CIAA'07 Proceedings of the 12th international conference on Implementation and application of automata
Hi-index | 0.00 |
In this paper, we propose a novel method for automatic segmentation of a Sanskrit string into different words. The input for our segmentizer is a Sanskrit string either encoded as a Unicode string or as a Roman transliterated string and the output is a set of possible splits with weights associated with each of them. We followed two different approaches to segment a Sanskrit text using sandhi rules extracted from a parallel corpus of manually sandhi split text. While the first approach augments the finite state transducer used to analyze Sanskrit morphology and traverse it to segment a word, the second approach generates all possible segmentations and validates each constituent using a morph analyzer.