Logical specification of finite-state transductions for natural language processing

  • Authors:
  • Chris Brew;Nathan Vaillette

  • Venue:
  • Logical specification of finite-state transductions for natural language processing
  • Year:
  • 2004

Abstract

This thesis is concerned with the use of a logical language for specifying mappings between strings of symbols; specifically, the regular relations, those which can be computed by finite-state transducers. Because of their efficiency and flexibility, regular relations and finite-state transducers are widely used in Natural Language Processing (NLP) for tasks such as grapheme-to-phoneme conversion, morphological analysis and generation, and shallow syntactic parsing. By exploiting logical representations for finite-state transductions, the technique advocated in this thesis combines efficient processing with the advantages of declarative specification, thus taking a step in the direction of providing finite-state NLP with the best of both worlds.
Previous work has demonstrated how all sets of strings recognized by finite-state automata can be described in monadic second-order logic. A formula of this logic describing a set can be automatically compiled into the finite-state automaton recognizing that set. Although this compilation is extremely inefficient in the theoretical worst case, recent work has shown that it can function quite efficiently for a range of practical applications. Unfortunately, the logical technique for describing sets of strings does not carry over to relations on strings without further restrictions, since the class of regular relations lacks certain crucial closure properties.
In this thesis we introduce the logical language MSO(SLR), a language for same-length relations, a proper subset of the regular relations that has the necessary closure properties. We discuss how a formula of MSO(SLR) describing a relation can be automatically compiled into the finite-state transducer implementing that relation. Although there are many regular relations which MSO(SLR) cannot describe directly, we show how MSO(SLR) can characterize such relations indirectly by describing aligned representations of them.
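The idea of an aligned representation can be illustrated with a minimal Python sketch. It is not taken from the thesis: the padding symbol `PAD`, the right-padding convention, and the function names `align` and `unalign` are all assumptions made for illustration. The sketch turns a pair of strings of different lengths into a single sequence of symbol pairs of equal length, which is the kind of object a same-length relation can describe.

```python
PAD = "0"  # hypothetical padding symbol; assumed not to occur in the strings


def align(upper, lower, pad=PAD):
    """Pad the shorter string on the right, then zip into symbol pairs.

    Right-padding is only one possible alignment, chosen here for simplicity.
    """
    n = max(len(upper), len(lower))
    return list(zip(upper.ljust(n, pad), lower.ljust(n, pad)))


def unalign(pairs, pad=PAD):
    """Recover the original string pair by dropping padding symbols."""
    upper = "".join(u for u, _ in pairs if u != pad)
    lower = "".join(l for _, l in pairs if l != pad)
    return upper, lower


pairs = align("ab", "abc")
# pairs == [("a", "a"), ("b", "b"), ("0", "c")]
```

Because every aligned pair has the same length on both sides, a set of such pair sequences behaves like an ordinary language over an alphabet of symbol pairs, which is why closure properties familiar from regular languages become available again.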
To demonstrate the usefulness of MSO(SLR), we use it to define the finite-state conditional replace operator φ → ψ / λ _ ρ in a declarative fashion. Since the formula-to-transducer compilation is implemented for MSO(SLR), our definition provides a compiler for the replace operator. We argue that this approach improves on previous definitions in terms of clarity, maintainability, extensibility, and formal verifiability. We justify these claims by discussing several extensions and variations of the operator and providing rigorous proofs of correctness for our definitions. A further demonstration of MSO(SLR)'s usefulness is given in the form of definitions of the rule formalisms used in two-level morphology. As with the replace operator definition, our declarative definitions give us a compiler automatically and make extensions and formal verification easy.
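For readers unfamiliar with conditional replacement, the following Python sketch shows the intended effect of a rule of the form "rewrite φ as ψ when preceded by λ and followed by ρ". It is a deliberately simplified illustration and not the thesis's definition: it assumes φ, ψ, λ and ρ are literal strings and not regular languages, applies the rule obligatorily from left to right, checks both contexts against the input string, and returns a single output. The function name `replace_in_context` and its parameters are hypothetical.

```python
def replace_in_context(s, phi, psi, left="", right=""):
    """Rewrite each occurrence of phi as psi when it sits between left and right.

    Simplifying assumptions: all arguments are literal strings, matching is
    obligatory and left to right, and contexts are tested on the input string.
    """
    if not phi:
        raise ValueError("phi must be a non-empty string")
    out = []
    i = 0
    while i < len(s):
        end = i + len(phi)
        if (s.startswith(phi, i)
                and s[:i].endswith(left)
                and s.startswith(right, end)):
            out.append(psi)  # context satisfied: emit the replacement
            i = end
        else:
            out.append(s[i])  # otherwise copy the input symbol unchanged
            i += 1
    return "".join(out)


print(replace_in_context("abab", "a", "x", right="b"))           # xbxb
print(replace_in_context("aab", "a", "x", left="a", right="b"))  # axb
```

A procedural definition like this fixes one application strategy in the code itself. The declarative approach described in the abstract instead states the conditions that a valid input/output pair must satisfy and leaves the construction of the transducer to the compiler.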