We propose hidden Markov models (HMMs) with unsupervised training for extractive summarization. Extractive summarization selects salient sentences from documents for inclusion in a summary. Unsupervised clustering combined with heuristics is a popular approach because it requires no annotated data. However, conventional clustering methods such as K-means do not take text cohesion into account. Probabilistic methods are more rigorous and robust, but they usually require supervised training on annotated data. Our method incorporates unsupervised training with clustering into a probabilistic framework. Clustering is performed by modified K-means (MKM), a method that yields clusters closer to the optimum than conventional K-means does. Text cohesion is modeled by the transition probabilities of an HMM, and term distribution is modeled by its emission probabilities. A final decoding step tags each sentence in a text with a theme class label. Parameters are trained with the segmental K-means (SKM) algorithm. The output of our system can be used to extract salient sentences for summaries or for topic detection. Content-based evaluation shows that our method outperforms an existing extractive summarizer by 22.8% in terms of relative similarity, and outperforms a baseline summarizer that selects the top N sentences as salient sentences by 46.3%.
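The training loop outlined above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each sentence is a bag of words (a `Counter` of terms), emissions are add-one-smoothed multinomials over terms, a single document serves as the training sequence, and initial theme labels come from plain K-means as a stand-in, because the abstract does not specify how modified K-means (MKM) differs. Segmental K-means then alternates between re-estimating HMM parameters from the current labeling and relabeling sentences with Viterbi decoding until the labels stop changing. All function names (`kmeans_labels`, `estimate`, `viterbi`, `segmental_kmeans`) are hypothetical.

```python
import math
from collections import Counter


def sq_dist(v, c):
    """Squared Euclidean distance between two sparse term-count vectors."""
    return sum((v.get(t, 0) - c.get(t, 0)) ** 2 for t in set(v) | set(c))


def kmeans_labels(vectors, k, iters=20):
    """Plain K-means (a stand-in for the paper's MKM) giving initial theme labels."""
    step = max(1, len(vectors) // k)
    # Deterministic init: evenly spaced sentences serve as initial centroids.
    centroids = [dict(vectors[min(i * step, len(vectors) - 1)]) for i in range(k)]
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new == labels:
            break
        labels = new
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[j] = {t: n / len(members) for t, n in total.items()}
    return labels


def estimate(sentences, labels, k, vocab, smooth=1.0):
    """Log initial, transition and emission probabilities from a labeling (add-one smoothed)."""
    pi = [smooth] * k
    pi[labels[0]] += 1
    trans = [[smooth] * k for _ in range(k)]
    for a, b in zip(labels, labels[1:]):
        trans[a][b] += 1  # adjacent-sentence theme transitions model cohesion
    emit = [Counter() for _ in range(k)]
    for s, lab in zip(sentences, labels):
        emit[lab].update(s)  # term counts per theme model term distribution
    log_pi = [math.log(p / sum(pi)) for p in pi]
    log_a = [[math.log(x / sum(row)) for x in row] for row in trans]
    log_b = []
    for c in emit:
        denom = sum(c.values()) + smooth * len(vocab)
        log_b.append({t: math.log((c[t] + smooth) / denom) for t in vocab})
    return log_pi, log_a, log_b


def viterbi(sentences, log_pi, log_a, log_b):
    """Most likely theme label per sentence; terms outside the vocabulary are ignored."""
    k = len(log_pi)

    def emit(j, s):
        return sum(n * log_b[j].get(t, 0.0) for t, n in s.items())

    score = [log_pi[j] + emit(j, sentences[0]) for j in range(k)]
    back = []
    for s in sentences[1:]:
        ptr, new = [], []
        for j in range(k):
            best = max(range(k), key=lambda i: score[i] + log_a[i][j])
            ptr.append(best)
            new.append(score[best] + log_a[best][j] + emit(j, s))
        back.append(ptr)
        score = new
    path = [max(range(k), key=lambda j: score[j])]
    for ptr in reversed(back):  # follow back-pointers from the last sentence
        path.append(ptr[path[-1]])
    return path[::-1]


def segmental_kmeans(sentences, k, max_iter=10):
    """Alternate parameter re-estimation and Viterbi relabeling until labels converge."""
    vocab = set().union(*sentences)
    labels = kmeans_labels(sentences, k)
    for _ in range(max_iter):
        new = viterbi(sentences, *estimate(sentences, labels, k, vocab))
        if new == labels:
            break
        labels = new
    return labels


if __name__ == "__main__":
    text = [
        "rain and wind hit the coast",
        "heavy rain expected with strong wind",
        "wind and rain continue tonight",
        "stocks fell as markets opened",
        "markets rallied and stocks rose",
        "investors sold stocks in weak markets",
    ]
    sentences = [Counter(s.split()) for s in text]
    print(segmental_kmeans(sentences, k=2))  # [0, 0, 0, 1, 1, 1]
```

Add-one smoothing keeps a term that never occurs with a theme from driving the log-likelihood to negative infinity. The loop stops at a fixed point of SKM, where Viterbi decoding reproduces the labeling the parameters were estimated from.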