This thesis is concerned with the use of a logical language for specifying mappings between strings of symbols; specifically, the regular relations, those which can be computed by finite-state transducers. Because of their efficiency and flexibility, regular relations and finite-state transducers are widely used in Natural Language Processing (NLP) for tasks such as grapheme-to-phoneme conversion, morphological analysis and generation, and shallow syntactic parsing. By exploiting logical representations for finite-state transductions, the technique advocated in this thesis combines efficient processing with the advantages of declarative specification, thus taking a step in the direction of providing finite-state NLP with the best of both worlds. Previous work has demonstrated how all sets of strings recognized by finite-state automata can be described in monadic second-order logic. A formula of this logic describing a set can be automatically compiled into the finite-state automaton recognizing that set. Although this compilation is extremely inefficient in the theoretical worst case, recent work has shown that it can function quite efficiently for a range of practical applications. The logical technique for describing sets of strings unfortunately does not carry over to relations on strings without further restrictions, since the class of regular relations lacks certain crucial closure properties. In this thesis we introduce the logical language MSO(SLR), a language for same-length relations, a proper subset of the regular relations which has the necessary closure properties. We discuss how a formula of MSO(SLR) describing a relation can be automatically compiled into the finite-state transducer implementing that relation. Although there are many regular relations which MSO(SLR) cannot describe directly, we show how MSO(SLR) can characterize such relations indirectly by describing aligned representations of them.
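The closure point can be illustrated with a small sketch. Regular relations in general are not closed under intersection, but a same-length relation can be read as an ordinary regular language over an alphabet of symbol pairs, so the usual product construction applies. The code below is only an illustration under that reading; the class `PairAutomaton` and the function `intersect` are hypothetical names chosen for this example and are not taken from the thesis or from any MSO(SLR) implementation.

```python
from itertools import product


class PairAutomaton:
    """Deterministic automaton over pairs (upper_symbol, lower_symbol).

    Hypothetical helper for illustration: it recognizes a same-length
    relation by reading two equally long strings in lockstep.
    """

    def __init__(self, start, finals, delta):
        self.start = start
        self.finals = set(finals)
        self.delta = dict(delta)  # (state, (upper, lower)) -> next state

    def accepts(self, upper, lower):
        # A same-length relation never relates strings of different length.
        if len(upper) != len(lower):
            return False
        state = self.start
        for pair in zip(upper, lower):
            state = self.delta.get((state, pair))
            if state is None:  # no transition for this symbol pair
                return False
        return state in self.finals


def intersect(m1, m2):
    """Product construction: accepts exactly the pairs both machines accept."""
    delta = {}
    for (p, pair1), p_next in m1.delta.items():
        for (q, pair2), q_next in m2.delta.items():
            if pair1 == pair2:
                delta[((p, q), pair1)] = (p_next, q_next)
    finals = set(product(m1.finals, m2.finals))
    return PairAutomaton((m1.start, m2.start), finals, delta)


# Identity relation on strings over {a, b}.
identity = PairAutomaton(0, {0}, {(0, ("a", "a")): 0, (0, ("b", "b")): 0})
# Relation mapping any string over {a, b} to a string of a's of equal length.
to_a = PairAutomaton(0, {0}, {(0, ("a", "a")): 0, (0, ("b", "a")): 0})

both = intersect(identity, to_a)
print(both.accepts("aa", "aa"))
print(both.accepts("ab", "ab"))
```

Here `both` relates a string to itself only when it consists entirely of a's, which is what the intersection of the two example relations should contain.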
To demonstrate the usefulness of MSO(SLR), we use it to define the finite-state conditional replace operator φ → ψ / λ _ ρ in a declarative fashion. Since the formula-to-transducer compilation is implemented for MSO(SLR), our definition provides a compiler for the replace operator. We argue that this approach improves on previous definitions in terms of clarity, maintainability, extensibility, and formal verifiability. We justify these claims by discussing several extensions and variations of the operator and providing rigorous proofs of correctness for our definitions. A further demonstration of MSO(SLR)'s usefulness is given in the form of definitions of the rule formalisms used in two-level morphology. As with the replace operator definition, our declarative definitions give us a compiler automatically and make extensions and formal verification easy.
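For readers unfamiliar with the notation, the conditional replace operator is commonly read as "rewrite φ as ψ when it occurs between a left context λ and a right context ρ". The sketch below approximates that reading for literal strings only, using Python's `re` module. It is not the thesis's definition and does not build a transducer; the function name `conditional_replace` is hypothetical, and the choice to check both contexts against the unmodified input is an assumption made for this illustration.

```python
import re


def conditional_replace(text, phi, psi, lam, rho):
    """Rewrite literal `phi` as `psi` wherever it stands between `lam` and `rho`.

    Assumption for this sketch: all four arguments are plain strings, and
    the contexts are tested on the original input, not on rewritten output.
    """
    # Lookbehind and lookahead test the contexts without consuming them,
    # so neighbouring matches may share context symbols.
    left = "(?<=" + re.escape(lam) + ")" if lam else ""
    right = "(?=" + re.escape(rho) + ")" if rho else ""
    pattern = left + re.escape(phi) + right
    # A function replacement keeps `psi` literal (no backslash processing).
    return re.sub(pattern, lambda _match: psi, text)


print(conditional_replace("cad cab", "a", "b", "c", "d"))
```

A regular-expression substitution fixes one particular matching strategy (leftmost, non-overlapping matches of φ), whereas a declarative definition of the operator states such choices explicitly and lets them be varied and verified.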