Tree linearization in English: improving language model based approaches

Authors:
Katja Filippova;Michael Strube
Affiliations:
EML Research gGmbH, Heidelberg, Germany;EML Research gGmbH, Heidelberg, Germany
Venue:
NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Year:
2009

Abstract

We compare two approaches to dependency tree linearization, a task that arises in many NLP applications. The first is the widely used 'overgenerate and rank' approach, which relies exclusively on a trigram language model (LM); the second combines language modeling with a maximum entropy classifier trained on a range of linguistic features. The results provide strong support for the combined method and show that trigram LMs are appropriate for phrase linearization, whereas on the clause level a richer representation is necessary to achieve comparable performance.
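
To make the 'overgenerate and rank' idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a flat bag of words rather than a dependency tree, enumerates every permutation (feasible only for very short inputs), and scores candidates with a toy trigram LM using add-one smoothing. The function names, the smoothing choice, and the three-sentence corpus are all illustrative assumptions; the maximum entropy classifier of the combined method is not reproduced here.

```python
import itertools
import math
from collections import Counter


def train_trigram_counts(sentences):
    """Collect trigram and context (bigram) counts from tokenized sentences."""
    tri, bi, vocab = Counter(), Counter(), set()
    for sent in sentences:
        toks = ["<s>", "<s>"] + sent + ["</s>"]
        vocab.update(toks)
        for i in range(2, len(toks)):
            tri[tuple(toks[i - 2:i + 1])] += 1
            bi[tuple(toks[i - 2:i])] += 1
    return tri, bi, len(vocab)


def log_prob(tokens, tri, bi, vocab_size):
    """Log-probability of a token sequence under an add-one smoothed trigram LM.

    Add-one smoothing is an assumption made for brevity, not the paper's choice.
    """
    toks = ["<s>", "<s>"] + tokens + ["</s>"]
    total = 0.0
    for i in range(2, len(toks)):
        ctx = tuple(toks[i - 2:i])
        total += math.log((tri[ctx + (toks[i],)] + 1) / (bi[ctx] + vocab_size))
    return total


def linearize(words, tri, bi, vocab_size):
    """Overgenerate all orderings of `words`, then return the LM's top-ranked one."""
    candidates = (list(p) for p in itertools.permutations(words))
    return max(candidates, key=lambda c: log_prob(c, tri, bi, vocab_size))


# Hypothetical toy corpus, for illustration only.
corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"], ["a", "dog", "sleeps"]]
tri, bi, vocab_size = train_trigram_counts(corpus)
print(linearize(["barks", "the", "dog"], tri, bi, vocab_size))
```

In a tree-based setting, the candidate set would typically be restricted to orderings of a head and its dependents rather than all permutations of the sentence, which keeps the search tractable.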