Parsing idioms in lexicalized TAGs

  • Authors:
  • Anne Abeillé; Yves Schabes

  • Affiliations:
  • University Paris, Paris, France; University of Pennsylvania, Philadelphia, PA

  • Venue:
  • EACL '89 Proceedings of the fourth conference on European chapter of the Association for Computational Linguistics
  • Year:
  • 1989

Abstract

We show how idioms can be parsed in lexicalized TAGs. We rely on extensive studies of frozen phrases pursued at L.A.D.L. that show that idioms are pervasive in natural language and obey, generally speaking, the same morphological and syntactic patterns as 'free' structures. By idiom we mean a structure in which some items are lexically frozen and have a semantics that is not compositional. We thus consider idioms of different syntactic categories (NP, S, adverbials, compound prepositions, and so on) in both English and French.

In lexicalized TAGs, the same grammar is used for idioms as for 'free' sentences. We assign them regular syntactic structures while representing them semantically as one non-compositional entry. Syntactic transformations and insertion of modifiers may thus apply to them as to any 'free' structure. Unlike in previous approaches, their variability becomes the general case and their being totally frozen the exception. Idioms are generally represented by extended elementary trees with 'heads' made out of several items (which need not be contiguous), with one of the items serving as an index. When an idiomatic tree is selected by this index, lexical items are attached to some nodes in the tree. Idiomatic trees are selected by a single head node; however, the head value imposes lexical values on other nodes in the tree. This operation of attaching the head item of an idiom and its lexical parts is called lexical attachment. The resulting tree has the lexical items corresponding to the pieces of the idiom already attached to it.

We generalize the parsing strategy defined for lexicalized TAG to the case of 'heads' made out of several items. We propose to parse idioms in two steps, which are merged into the two-step parsing strategy defined for 'free' sentences. The first step, performed during the lexical pass, selects trees corresponding to the literal and idiomatic interpretations.
However, it is not always the case that the idiomatic trees are selected as possible candidates. We require that all the basic pieces making up the minimal idiomatic expression be present in the input string (possibly with some order constraints). This condition is necessary for the idiomatic reading but, of course, not sufficient. The second step performs the syntactic analysis as in the usual case. During the second step, the idiomatic reading may be rejected. Idioms are thus parsed like any 'free' sentence. Except during the selection process, idioms do not require any special parsing mechanism. We are also able to account for cases of ambiguity between idiomatic and literal interpretations.

Factoring recursion from dependencies in TAGs allows discontinuous constituents to be parsed in an elegant way. We also show how regular 'transformations' are taken into account by the parser.
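
The selection condition used in the first step can be illustrated with a minimal sketch. It is not the authors' implementation: the `IdiomEntry` record, the `select_idiom_trees` function, the sample lexicon, and the assumption that the input has already been reduced to lemmas are all hypothetical and introduced only for illustration. The sketch keeps an idiomatic tree as a candidate only if its index item and all of its other frozen pieces occur in the input, optionally in a fixed order but not necessarily contiguously.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdiomEntry:
    """Hypothetical lexicon entry standing in for one idiomatic elementary tree."""
    name: str
    index: str              # the item that selects the tree
    pieces: tuple           # all frozen items of the idiom, index included
    ordered: bool = False   # if True, pieces must occur in this relative order


def occurs_in_order(pieces, tokens):
    """True if pieces form a (possibly non-contiguous) subsequence of tokens."""
    remaining = iter(tokens)
    # 'piece in remaining' advances the iterator up to and including the match.
    return all(piece in remaining for piece in pieces)


def select_idiom_trees(tokens, lexicon):
    """Lexical pass: keep idiomatic trees whose frozen pieces are all present.

    Passing this filter is only a necessary condition; a later syntactic
    pass (not modelled here) may still reject the idiomatic reading.
    """
    present = set(tokens)
    selected = []
    for entry in lexicon:
        if entry.index not in present:
            continue  # the tree is never selected without its index item
        if entry.ordered:
            ok = occurs_in_order(entry.pieces, tokens)
        else:
            ok = all(piece in present for piece in entry.pieces)
        if ok:
            selected.append(entry.name)
    return selected


# Assumed sample lexicon; tokens are assumed to be lemmas.
LEXICON = [
    IdiomEntry("kick-the-bucket", "kick", ("kick", "the", "bucket"), ordered=True),
    IdiomEntry("take-into-account", "take", ("take", "into", "account")),
]

candidates = select_idiom_trees(
    ["john", "kick", "the", "proverbial", "bucket"], LEXICON
)
```

Because the filter only checks presence and order, an inserted modifier such as "proverbial" does not block selection, which mirrors the claim that the pieces of an idiom need not be contiguous. Literal trees would be selected alongside these candidates, and the choice between readings is left to the syntactic pass.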