A trainable document summarizer
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Assessing agreement on classification tasks: the kappa statistic
Computational Linguistics
Probabilistic independence networks for hidden Markov probability models
Neural Computation
The automatic construction of large-scale corpora for summarization research
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
New Methods in Automatic Extracting
Journal of the ACM (JACM)
The Theory and Practice of Discourse Parsing and Summarization
The Theory and Practice of Discourse Parsing and Summarization
Advances in Automatic Text Summarization
Advances in Automatic Text Summarization
Summarization beyond sentence extraction: a probabilistic approach to sentence compression
Artificial Intelligence
Using hidden Markov modeling to decompose human-written summaries
Computational Linguistics - Summarization
A systematic comparison of various statistical alignment models
Computational Linguistics
The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
Sentence reduction for automatic text summarization
ANLC '00 Proceedings of the sixth conference on Applied natural language processing
HMM-based word alignment in statistical translation
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Motivations and methods for text simplification
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
A syntax-based statistical translation model
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
A noisy-channel model for document compression
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Loosely tree-based alignment for machine translation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Learning non-isomorphic tree mappings for machine translation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 2
Query-relevant summarization using FAQs
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Headline generation based on statistical translation
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Improved statistical alignment models
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Sentence alignment for monolingual comparable corpora
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Statistical parsing with a context-free grammar and word statistics
AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence
Seed and Grow: augmenting statistically generated summary sentences using schematic word patterns
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Complementing RRL for dialogue summarisation
IBERAMIA'10 Proceedings of the 12th Ibero-American conference on Advances in artificial intelligence
Data-driven response generation in social media
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Hi-index | 0.00 |
Current research in automatic single-document summarization is dominated by two effective, yet naïve approaches: summarization by sentence extraction and headline generation via bag-of-words models. While successful in some tasks, neither of these models is able to adequately capture the large set of linguistic devices utilized by humans when they produce summaries. One possible explanation for the widespread use of these models is that good techniques have been developed to extract appropriate training data for them from existing document/abstract and document/headline corpora. We believe that future progress in automatic summarization will be driven both by the development of more sophisticated, linguistically informed models, as well as a more effective leveraging of document/abstract corpora. In order to open the doors to simultaneously achieving both of these goals, we have developed techniques for automatically producing word-to-word and phrase-to-phrase alignments between documents and their human-written abstracts. These alignments make explicit the correspondences that exist in such document/abstract pairs and create a potentially rich data source from which complex summarization algorithms may learn. This paper describes experiments we have carried out to analyze the ability of humans to perform such alignments, and based on these analyses, we describe experiments for creating them automatically. Our model for the alignment task is based on an extension of the standard hidden Markov model and learns to create alignments in a completely unsupervised fashion. We describe our model in detail and present experimental results that show that our model is able to learn to reliably identify word- and phrase-level alignments in a corpus of document, abstract pairs.