We present the Zen toolkit for morphological and phonological processing of natural languages. The toolkit is presented in literate programming style, in the Pidgin ML subset of the Objective Caml functional programming language. It is based on a systematic representation of finite-state automata and transducers as decorated lexical trees. All operations on the state-space data structures use the zipper technology, and a uniform sharing functor permits systematic maximum sharing as dags. A particular case of lexical maps is especially convenient for building invertible morphological operations, such as dictionaries of inflected forms, using a notion of differential word. As a particular application, we describe a general method for tagging a natural language text given as a phoneme stream by analysing possible euphonic liaisons between words belonging to a lexicon of inflected forms. Following the toolkit methodology, the method constructs a non-deterministic transducer that implements rational rewrite rules by mechanically decorating a trie representation of the lexicon index. The algorithm is linear in the size of the lexicon. A coroutine interpreter is given, and its correctness and completeness are formally proved. An application to the segmentation of Sanskrit by sandhi analysis is demonstrated.
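To make the notions of lexical tree and differential word more concrete, here is a minimal sketch in standard OCaml syntax (not the Pidgin ML presentation used by the toolkit). It is not the toolkit's own code: the representation of letters as characters and the names `explode`, `enter`, `mem`, `take`, `diff` and `patch` are assumptions made for illustration. A differential word is taken here to be a pair `(n, u)` that turns one word into another by dropping its last `n` letters and appending `u`, which corresponds to going up `n` arcs in the trie and then down along `u`.

```ocaml
(* Illustrative sketch only; names and representation are assumptions. *)

(* A word is a list of letters; letters are plain characters here. *)
type word = char list

(* A lexical tree: the boolean says whether the path from the root
   spells a word of the lexicon; arcs are kept sorted by letter. *)
type trie = Trie of bool * (char * trie) list

let empty = Trie (false, [])

(* Convert a string to a word. *)
let explode s = List.init (String.length s) (String.get s)

(* Insert a word, rebuilding only the nodes along its path. *)
let rec enter (Trie (b, arcs)) = function
  | [] -> Trie (true, arcs)
  | c :: rest ->
    let rec update = function
      | [] -> [ (c, enter empty rest) ]
      | (c', t) :: others as l ->
        if c' = c then (c, enter t rest) :: others
        else if c' > c then (c, enter empty rest) :: l
        else (c', t) :: update others
    in
    Trie (b, update arcs)

(* Membership test: follow the arcs letter by letter. *)
let rec mem (Trie (b, arcs)) = function
  | [] -> b
  | c :: rest ->
    (match List.assoc_opt c arcs with
     | Some t -> mem t rest
     | None -> false)

(* A differential word (n, u): drop n final letters, then append u. *)
type delta = int * word

(* diff w w' strips the longest common prefix of w and w', then returns
   the length of what remains of w together with what remains of w'. *)
let rec diff w w' =
  match w, w' with
  | c :: r, c' :: r' when c = c' -> diff r r'
  | _ -> (List.length w, w')

(* First k elements of a list (all of it if it is shorter). *)
let rec take k l =
  match l with
  | x :: r when k > 0 -> x :: take (k - 1) r
  | _ -> []

(* patch d w rebuilds w' from w, where d = diff w w'. *)
let patch (n, u) w = take (List.length w - n) w @ u
```

With these definitions, `patch (diff w w') w` gives back `w'` for any two words, so storing `diff form stem` at the trie node of an inflected form is enough to recover the stem from the form. This is one way to read the invertibility mentioned above; the toolkit's actual lexical maps may differ in detail.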