Induction of Word and Phrase Alignments for Automatic Document Summarization

Authors:
Hal Daumé, III;Daniel Marcu
Affiliations:
-;-
Venue:
Computational Linguistics
Year:
2005

Citing 23
Cited 3

A trainable document summarizer

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Assessing agreement on classification tasks: the kappa statistic

Computational Linguistics
Probabilistic independence networks for hidden Markov probability models

Neural Computation
The automatic construction of large-scale corpora for summarization research

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
New Methods in Automatic Extracting

Journal of the ACM (JACM)
The Theory and Practice of Discourse Parsing and Summarization

The Theory and Practice of Discourse Parsing and Summarization
Advances in Automatic Text Summarization

Advances in Automatic Text Summarization
Summarization beyond sentence extraction: a probabilistic approach to sentence compression

Artificial Intelligence
Using hidden Markov modeling to decompose human-written summaries

Computational Linguistics - Summarization
A systematic comparison of various statistical alignment models

Computational Linguistics
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
Sentence reduction for automatic text summarization

ANLC '00 Proceedings of the sixth conference on Applied natural language processing
HMM-based word alignment in statistical translation

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Motivations and methods for text simplification

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
A syntax-based statistical translation model

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
A noisy-channel model for document compression

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Loosely tree-based alignment for machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Learning non-isomorphic tree mappings for machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 2
Query-relevant summarization using FAQs

ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Headline generation based on statistical translation

ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Improved statistical alignment models

ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Sentence alignment for monolingual comparable corpora

EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Statistical parsing with a context-free grammar and word statistics

AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence

Seed and Grow: augmenting statistically generated summary sentences using schematic word patterns

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Complementing RRL for dialogue summarisation

IBERAMIA'10 Proceedings of the 12th Ibero-American conference on Advances in artificial intelligence
Data-driven response generation in social media

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Current research in automatic single-document summarization is dominated by two effective, yet naïve approaches: summarization by sentence extraction and headline generation via bag-of-words models. While successful in some tasks, neither of these models is able to adequately capture the large set of linguistic devices utilized by humans when they produce summaries. One possible explanation for the widespread use of these models is that good techniques have been developed to extract appropriate training data for them from existing document/abstract and document/headline corpora. We believe that future progress in automatic summarization will be driven both by the development of more sophisticated, linguistically informed models, as well as a more effective leveraging of document/abstract corpora. In order to open the doors to simultaneously achieving both of these goals, we have developed techniques for automatically producing word-to-word and phrase-to-phrase alignments between documents and their human-written abstracts. These alignments make explicit the correspondences that exist in such document/abstract pairs and create a potentially rich data source from which complex summarization algorithms may learn. This paper describes experiments we have carried out to analyze the ability of humans to perform such alignments, and based on these analyses, we describe experiments for creating them automatically. Our model for the alignment task is based on an extension of the standard hidden Markov model and learns to create alignments in a completely unsupervised fashion. We describe our model in detail and present experimental results that show that our model is able to learn to reliably identify word- and phrase-level alignments in a corpus of document, abstract pairs.