Parsing idioms in lexicalized TAGs

Authors:
Anne Abeillé;Yves Schabes
Affiliations:
University Paris, Paris, France;University of Pennsylvania, Philadelphia, PA
Venue:
EACL '89 Proceedings of the fourth conference on European chapter of the Association for Computational Linguistics
Year:
1989

Abstract

We show how idioms can be parsed in lexicalized TAGs. We rely on extensive studies of frozen phrases pursued at L.A.D.L. that show that idioms are pervasive in natural language and obey, generally speaking, the same morphological and syntactic patterns as 'free' structures. By idiom we mean a structure in which some items are lexically frozen and which has a semantics that is not compositional. We thus consider idioms of different syntactic categories (NP, S, adverbials, compound prepositions...) in both English and French.

In lexicalized TAGs, the same grammar is used for idioms as for 'free' sentences. We assign them regular syntactic structures while representing them semantically as one non-compositional entry. Syntactic transformations and insertion of modifiers may thus apply to them as to any 'free' structure. Unlike in previous approaches, their variability becomes the general case and their being totally frozen the exception. Idioms are generally represented by extended elementary trees with 'heads' made out of several items (that need not be contiguous), with one of the items serving as an index. When an idiomatic tree is selected by this index, lexical items are attached to some nodes in the tree. Idiomatic trees are selected by a single head node; however, the head value imposes lexical values on other nodes in the tree. This operation of attaching the head item of an idiom and its lexical parts is called lexical attachment. The resulting tree has the lexical items corresponding to the pieces of the idiom already attached to it.

We generalize the parsing strategy defined for lexicalized TAGs to the case of 'heads' made out of several items. We propose to parse idioms in two steps, which are merged into the two-step parsing strategy defined for 'free' sentences. The first step, performed during the lexical pass, selects trees corresponding to the literal and idiomatic interpretations. However, idiomatic trees are not always selected as possible candidates: we require that all basic pieces building the minimal idiomatic expression be present in the input string (possibly with some order constraints). This condition is necessary for the idiomatic reading but of course not sufficient. The second step performs the syntactic analysis as in the usual case. During the second step, the idiomatic reading may be rejected. Idioms are thus parsed like any 'free' sentence. Except during the selection process, idioms do not require any special parsing mechanism. We are also able to account for cases of ambiguity between idiomatic and literal interpretations.

Factoring recursion from dependencies in TAGs allows discontinuous constituents to be parsed in an elegant way. We also show how regular 'transformations' are taken into account by the parser.
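
The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `IdiomEntry`, `pieces_present`, `select_trees`, the tree labels, and the toy lexicons are all hypothetical, and the input is assumed to be already lemmatized. The sketch only shows the necessary condition stated above: an idiomatic tree becomes a candidate when its index item and all of its frozen pieces occur in the input, possibly non-contiguously and possibly under an order constraint. Literal trees are selected independently, so both readings can reach the second (syntactic) step.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class IdiomEntry:
    """Hypothetical lexicon entry for an idiom with a multi-item 'head'."""
    index: str                 # the single item used to look the idiom up
    pieces: Tuple[str, ...]    # all frozen lexical items of the idiom
    tree: str                  # label of the idiomatic elementary tree (made up)
    ordered: bool = True       # whether the pieces must appear in this order


def pieces_present(pieces: Sequence[str], tokens: Sequence[str], ordered: bool) -> bool:
    """Check that every piece occurs in tokens; pieces need not be contiguous."""
    if ordered:
        # Subsequence test: each membership check consumes the iterator
        # up to and including the matched token.
        remaining = iter(tokens)
        return all(piece in remaining for piece in pieces)
    pool = list(tokens)
    for piece in pieces:
        if piece not in pool:
            return False
        pool.remove(piece)     # each token can satisfy only one piece
    return True


def select_trees(
    tokens: Sequence[str],
    literal_lexicon: Dict[str, List[str]],
    idiom_lexicon: List[IdiomEntry],
) -> List[Tuple[str, Tuple[str, ...]]]:
    """First step: return (tree label, attached lexical items) candidates."""
    selected = []
    # Literal trees are selected by each single lexical item.
    for token in tokens:
        for tree in literal_lexicon.get(token, []):
            selected.append((tree, (token,)))
    # Idiomatic trees are selected by their index, but only if all
    # pieces are present (necessary, not sufficient, for the idiomatic reading).
    for entry in idiom_lexicon:
        if entry.index in tokens and pieces_present(entry.pieces, tokens, entry.ordered):
            # Lexical attachment: the tree comes with its pieces already attached.
            selected.append((entry.tree, entry.pieces))
    return selected


# Toy lexicons, assumed for illustration only.
LITERAL_LEXICON = {"kick": ["transitive_verb_tree"]}
IDIOM_LEXICON = [IdiomEntry("kick", ("kick", "the", "bucket"), "idiom_kick_the_bucket_tree")]

idiomatic_input = ["john", "kick", "the", "proverbial", "bucket"]
literal_input = ["john", "kick", "the", "ball"]
candidates_idiomatic = select_trees(idiomatic_input, LITERAL_LEXICON, IDIOM_LEXICON)
candidates_literal = select_trees(literal_input, LITERAL_LEXICON, IDIOM_LEXICON)
```

In this sketch the inserted modifier "proverbial" does not block selection, since the pieces are matched as a possibly discontinuous subsequence. Rejecting an idiomatic candidate that is syntactically impossible is left to the parser proper, which is not modeled here.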