We present the current state of a computational platform for the analysis of classical Sanskrit. The platform comprises modules for phonology, morphology, segmentation and shallow syntax analysis, organized around a structured lexical database. It relies on the Zen toolkit for finite-state automata and transducers, which provides data structures and algorithms for the modular construction and execution of finite-state machines in a functional framework. Some of the layers operate in bottom-up synthesis mode: for instance, the noun and verb morphology modules generate all inflected forms from the stems and roots listed in the lexicon. Morphemes are assembled through internal sandhi, and the inflected forms are stored with morphological tags in dictionaries usable for lemmatization. These dictionaries are then compiled into transducers that implement the analysis of external sandhi, the euphonic process that merges adjacent words. The result is a tagging segmenter, which analyses a sentence presented as a stream of phonemes and produces a stream of tagged lexical entries, hyperlinked to the lexicon. The next layer is a syntax analyser guided by semantic-net constraints that express dependencies between the word forms. Finite verb forms demand semantic roles according to valency patterns, which depend on the voice of the form (active, passive) and the governance of the root (transitive, etc.). Conversely, noun and adjective forms provide actors which may fill those roles, provided agreement constraints are satisfied. Tool words (function words) are mapped to transducers operating on tagged streams, which allows linguistic phenomena such as coordination to be modeled by abstract interpretation of actor streams. The parser ranks the competing interpretations (assignments of actors to roles) by penalty and returns the minimum-penalty analyses to the user for final disambiguation.
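The tagging segmenter inverts external sandhi: given a merged phoneme stream, it must find lexicon words whose sandhi-joined concatenation yields that stream. The following is a minimal sketch of that search, not the platform's implementation. It assumes a tiny hypothetical tagged lexicon and four genuine Sanskrit vowel-sandhi rules written in Harvard-Kyoto transliteration (`A` = ā, `I` = ī). It uses naive backtracking over a Python dict where the platform compiles its dictionaries into Zen transducers, and it overgenerates by also accepting unmerged vowel hiatus between words.

```python
# Toy external-sandhi rules, Harvard-Kyoto transliteration (A = long a, I = long i):
# (final vowel of a word, initial vowel of the next word) -> merged surface vowel.
# A small real subset of Sanskrit vowel sandhi; a full rule set is far larger.
SANDHI = {
    ("a", "a"): "A",  # na + asti -> nAsti
    ("a", "i"): "e",  # ca + iti -> ceti
    ("a", "u"): "o",
    ("i", "i"): "I",  # asti + iti -> astIti
}

# Hypothetical tagged dictionary of inflected forms (tags simplified for illustration).
LEXICON = {
    "na": "negation",
    "asti": "verb, pres. 3sg",
    "ca": "conjunction",
    "iti": "quotative particle",
}


def segment(text, lexicon=LEXICON, rules=SANDHI):
    """Return every reading of `text` as a list of (word, tag) pairs joined by sandhi."""
    results = []

    def go(pos, pending, acc):
        # `pending` is the initial vowel of the next word, already absorbed into
        # the surface by the previous sandhi and therefore absent from `text`.
        if pos == len(text) and not pending:
            results.append(acc)
            return
        for word, tag in lexicon.items():
            if not word.startswith(pending):
                continue
            rest = word[len(pending):]
            # Case 1: the word's remaining letters appear unchanged in the surface.
            if text.startswith(rest, pos):
                go(pos + len(rest), "", acc + [(word, tag)])
            # Case 2: the word's final vowel fuses with the next word's initial vowel.
            for (final, initial), surface in rules.items():
                if rest.endswith(final):
                    stem = rest[:-len(final)]
                    if text.startswith(stem + surface, pos):
                        go(pos + len(stem) + len(surface), initial, acc + [(word, tag)])

    go(0, "", [])
    return results


if __name__ == "__main__":
    for reading in segment("nAstIti"):
        print(" + ".join(f"{w} [{t}]" for w, t in reading))
```

On `"nAstIti"` (nāstīti) the search recovers the single reading na + asti + iti: the long `A` is split back into the final `a` of *na* and the initial `a` of *asti*, and the long `I` into the final `i` of *asti* and the initial `i` of *iti*. Threading the absorbed vowel through `pending` is what lets one surface segment be shared between two adjacent words.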
The whole platform is organized as a Web service, allowing the piecewise tagging of a Sanskrit text.
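The parsing layer matches the actors supplied by noun forms against the roles demanded by a finite verb and keeps the minimum-penalty interpretations. The sketch below illustrates that idea under heavily simplified assumptions, none of which come from the abstract: a hypothetical two-role valency table keyed by governance and voice, agreement checked on number only (nominative actor against the verb), and a hypothetical penalty of one point per unfilled role plus one per unused noun form. The actual penalty scheme of the platform is not specified here.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Reading:
    word: str    # surface form, Harvard-Kyoto transliteration
    case: str    # "nom", "acc", "ins", ...
    number: str  # "sg", "du", "pl"


# Hypothetical valency table: (governance, voice) -> {role: case realizing it}.
VALENCY = {
    ("transitive", "active"): {"agent": "nom", "patient": "acc"},
    ("transitive", "passive"): {"agent": "ins", "patient": "nom"},
    ("intransitive", "active"): {"agent": "nom"},
}


def analyses(governance, voice, verb_number, forms):
    """forms: one list of candidate Readings per noun form in the sentence.
    Returns (minimum penalty, list of role assignments reaching it)."""
    roles = VALENCY[(governance, voice)]
    options = []
    for role, case in roles.items():
        fillers = [None]  # a role may stay unfilled, at a cost
        for i, readings in enumerate(forms):
            for r in readings:
                # Agreement (simplified): the nominative actor must match the verb's number.
                if r.case == case and (case != "nom" or r.number == verb_number):
                    fillers.append((i, r))
        options.append(fillers)
    scored = []
    for choice in product(*options):
        used = [f[0] for f in choice if f is not None]
        if len(used) != len(set(used)):
            continue  # one word form cannot fill two roles
        # Hypothetical penalty: unfilled roles plus unused noun forms.
        penalty = sum(f is None for f in choice) + len(forms) - len(used)
        assignment = {role: (f[1].word if f else None) for role, f in zip(roles, choice)}
        scored.append((penalty, assignment))
    best = min(p for p, _ in scored)
    return best, [a for p, a in scored if p == best]


if __name__ == "__main__":
    boy = [Reading("bAlaH", "nom", "sg")]
    # Neuter "phalam" is ambiguous between nominative and accusative.
    fruit = [Reading("phalam", "nom", "sg"), Reading("phalam", "acc", "sg")]
    print(analyses("transitive", "active", "sg", [boy, fruit]))  # bAlaH phalam khAdati
    # → (0, [{'agent': 'bAlaH', 'patient': 'phalam'}])
```

For the active sentence *bālaḥ phalam khādati* ("the boy eats a fruit"), the nominative reading of *phalam* could also serve as agent, but that interpretation leaves the patient role and the form *bālaḥ* unused and scores 2. The reading with *phalam* as accusative patient scores 0 and is the only one returned. With the passive table entry, the same enumeration would instead require an instrumental agent such as *bālena*, which shows how voice changes the demanded cases.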