A trainable document summarizer
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Natural-language retrieval of images based on descriptive captions
ACM Transactions on Information Systems (TOIS)
Advantages of query biased summaries in information retrieval
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
The use of MMR, diversity-based reranking for reordering documents and producing summaries
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Summarizing text documents: sentence selection and evaluation metrics
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Advances in Automatic Text Summarization
Advances in Automatic Text Summarization
Training Support Vector Machines: an Application to Face Detection
CVPR '97 Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (CVPR '97)
Learning with progressive transductive support vector machine
Pattern Recognition Letters
A task-oriented study on the influencing effects of query-biased summarisation in web searching
Information Processing and Management: an International Journal
Probability Estimates for Multi-class Classification by Pairwise Coupling
The Journal of Machine Learning Research
Complex spatio-temporal pattern queries
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Associating Text and Graphics for Scientific Chart Understanding
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Meta-data indexing for XPath location steps
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Pattern Recognition and Machine Learning (Information Science and Statistics)
Pattern Recognition and Machine Learning (Information Science and Statistics)
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
TableSeer: automatic table metadata extraction and searching in digital libraries
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Bioinformatics
Journal of the American Society for Information Science and Technology
Introduction to Information Retrieval
Introduction to Information Retrieval
Multi-document summarization by sentence extraction
NAACL-ANLP-AutoSum '00 Proceedings of the 2000 NAACL-ANLP Workshop on Automatic Summarization
Automatic extraction of data points and text blocks from 2-dimensional plots in digital documents
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
Generating synopses for document-element search
Proceedings of the 18th ACM conference on Information and knowledge management
The automatic creation of literature abstracts
IBM Journal of Research and Development
Finding algorithms in scientific articles
Proceedings of the 19th international conference on World wide web
LIBSVM: A library for support vector machines
ACM Transactions on Intelligent Systems and Technology (TIST)
Improving algorithm search using the algorithm co-citation network
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
Increasingly, special-purpose search engines are being built to retrieve document-elements such as tables, figures, and algorithms [Bhatia et al. 2010; Liu et al. 2007; Hearst et al. 2007]. These search engines present a thumbnail view of each document-element, some document metadata such as the title and authors of the paper, and the caption of the document-element. While authors in some disciplines write carefully tailored captions, the author of a document generally assumes that the caption will be read in the context of the surrounding text. When the caption is presented out of context, as in a document-element search result, it may not contain enough information for the end-user to understand the content of the document-element. Consequently, end-users examining document-element search results would want a short “synopsis” of this information presented along with the document-element. A synopsis lets the end-user quickly understand the content of the document-element, because reading it takes less time than downloading, opening, and reading the entire document. It may also allow the end-user to examine more results than they otherwise would. In this paper, we present the first set of methods to automatically extract this useful information (the synopsis) related to document-elements. We use Naïve Bayes and support vector machine classifiers to identify relevant sentences in the document text, based on their similarity and proximity to the caption and to the sentences in the document text that refer to the document-element. We compare the two classification methods and study the effects of the different features used.
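As a rough illustration of the two feature families named above (similarity and proximity), the following sketch computes per-sentence features with the Python standard library only. It is not the paper's feature set: the tokenizer, the bag-of-words cosine measure, the `1 / (1 + distance)` proximity score, the function names, and the `element_label` lookup are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens; a deliberately crude tokenizer (assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sentence_features(sentences, caption, element_label):
    """Return one (caption_sim, reference_sim, proximity) tuple per sentence.

    element_label is a hypothetical string such as "Figure 2", used here to
    find the sentences that refer to the document-element by substring match.
    """
    bags = [Counter(tokens(s)) for s in sentences]
    caption_bag = Counter(tokens(caption))
    ref_idx = [i for i, s in enumerate(sentences)
               if element_label.lower() in s.lower()]

    features = []
    for i, bag in enumerate(bags):
        caption_sim = cosine(bag, caption_bag)
        # Highest similarity to any sentence that refers to the element.
        reference_sim = max((cosine(bag, bags[j]) for j in ref_idx), default=0.0)
        # Proximity is 1.0 for a reference sentence and decays with distance.
        dist = min((abs(i - j) for j in ref_idx), default=None)
        proximity = 0.0 if dist is None else 1.0 / (1.0 + dist)
        features.append((caption_sim, reference_sim, proximity))
    return features
```

Feature tuples of this kind, paired with relevance labels, are the sort of input a Naïve Bayes or support vector machine classifier could be trained on; the choice of classifier library is left open here.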
We also investigate the problem of choosing the optimum synopsis size, one that balances the information content of the generated synopses against their length. Finally, we perform a user study to measure how the synopses generated by our proposed method compare with those produced by other state-of-the-art approaches.
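The abstract does not say how the synopsis size is chosen. Purely to make the trade-off concrete, the sketch below picks the number of sentences `k` that maximizes the sum of the `k` highest relevance scores minus a per-sentence length penalty. The objective, the function name `choose_synopsis_size`, and the `length_penalty` parameter are hypothetical and are not taken from the paper.

```python
def choose_synopsis_size(scores, length_penalty=0.5):
    """Return k maximizing (sum of the k highest scores) - length_penalty * k.

    scores: per-sentence relevance scores, e.g. classifier outputs (assumption).
    length_penalty: hypothetical cost charged for each sentence included.
    """
    ranked = sorted(scores, reverse=True)
    best_k, best_value, running = 0, 0.0, 0.0
    for k, score in enumerate(ranked, start=1):
        running += score
        value = running - length_penalty * k
        if value > best_value:
            best_k, best_value = k, value
    return best_k
```

Under this toy objective a larger `length_penalty` yields shorter synopses, and a sentence is worth including only while its score exceeds the penalty.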