This paper investigates a new approach to Single Document Summarization (SDS) based on a machine learning ranking algorithm. Using machine learning techniques for this task allows summaries to be adapted to user needs and to corpus characteristics. These desirable properties have motivated an increasing amount of work in this field over the last few years. Most approaches generate summaries by extracting text spans (sentences in our case) and adopt the classification framework, which consists of training a classifier to discriminate between relevant and irrelevant spans of a document. A set of features is first used to produce a vector of scores for each sentence in a given document, and a classifier is trained to make a global combination of these scores. We believe that the classification criterion is not well suited to SDS and propose an original framework based on ranking for this task. A ranking algorithm also combines the scores of different features, but its criterion tends to reduce the relative misordering of sentences within a document. The features we use are either based on the state of the art or built upon word clusters. These clusters are groups of words that often co-occur with each other, and they can serve to expand a query or to enrich the representation of the sentences of the documents. We analyze the performance of our ranking algorithm on two data sets: the Computation and Language (cmp_lg) collection of TIPSTER SUMMAC and the WIPO collection. We compare against several non-learning baseline systems and a reference trainable summarizer based on the classification framework. The experiments show that the learning algorithms perform better than the non-learning systems, and that the ranking algorithm outperforms the classifier. The difference in performance between the two learning algorithms depends on the nature of the data sets. We explain this fact by the different separability hypotheses that the two learning algorithms make about the data.
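
To make the ranking criterion more concrete, the following is a minimal sketch of a linear scorer trained to reduce the misordering of sentences within one document. It is an illustration under stated assumptions, not the algorithm of the paper: the abstract does not specify the loss, so an exponential pairwise loss minimized by plain gradient descent is assumed here, and the two features, the toy document, and all function names are hypothetical.

```python
import math

# Hypothetical toy document: each sentence is (feature_vector, label).
# The features stand in for per-sentence scores (e.g. a position score and
# a query-similarity score); label 1 marks a sentence that belongs in the summary.
toy_document = [
    ([0.9, 0.8], 1),
    ([0.2, 0.7], 0),
    ([0.8, 0.1], 0),
    ([0.7, 0.9], 1),
]

def score(weights, features):
    """Linear combination of the feature scores of one sentence."""
    return sum(w * x for w, x in zip(weights, features))

def split_by_label(sentences):
    """Separate the feature vectors of relevant and irrelevant sentences."""
    relevant = [f for f, label in sentences if label == 1]
    irrelevant = [f for f, label in sentences if label == 0]
    return relevant, irrelevant

def pairwise_ranking_loss(weights, sentences):
    """Mean exponential loss over (relevant, irrelevant) pairs of one document.

    This loss is an assumption of the sketch; it upper-bounds the fraction
    of misordered pairs.
    """
    relevant, irrelevant = split_by_label(sentences)
    if not relevant or not irrelevant:
        return 0.0
    total = sum(
        math.exp(score(weights, i) - score(weights, r))
        for r in relevant
        for i in irrelevant
    )
    return total / (len(relevant) * len(irrelevant))

def misordered_pairs(weights, sentences):
    """Count pairs where an irrelevant sentence scores at least as high as a relevant one."""
    relevant, irrelevant = split_by_label(sentences)
    return sum(
        1
        for r in relevant
        for i in irrelevant
        if score(weights, i) >= score(weights, r)
    )

def train_ranker(sentences, steps=500, learning_rate=0.5):
    """Gradient descent on the pairwise loss, starting from zero weights."""
    relevant, irrelevant = split_by_label(sentences)
    dim = len(sentences[0][0])
    weights = [0.0] * dim
    n_pairs = len(relevant) * len(irrelevant)
    for _ in range(steps):
        grad = [0.0] * dim
        for r in relevant:
            for i in irrelevant:
                # d/dw exp(w.(i - r)) = exp(w.(i - r)) * (i - r)
                pair_loss = math.exp(score(weights, i) - score(weights, r))
                for k in range(dim):
                    grad[k] += pair_loss * (i[k] - r[k]) / n_pairs
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights

if __name__ == "__main__":
    learned = train_ranker(toy_document)
    print("misordered pairs before:", misordered_pairs([0.0, 0.0], toy_document))
    print("misordered pairs after: ", misordered_pairs(learned, toy_document))
```

The loss is summed only over pairs of sentences from the same document, so the scorer is pushed to order sentences correctly within a document and not to separate all relevant from all irrelevant sentences with a single global threshold, which is what a classification criterion would require.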