A semantic graph-based approach to biomedical summarisation

Authors:
Laura Plaza;Alberto Díaz;Pablo Gervás
Affiliations:
Departamento de Ingeniería del Software e Inteligencia Artificial, Universidad Complutense de Madrid, C/Profesor José García Santesmases, s/n, 28040 Madrid, Spain (all three authors)
Venue:
Artificial Intelligence in Medicine
Year:
2011

Abstract

Objective: Automatic summarisation may improve access to the vast body of research literature available in biomedicine and related fields. This paper presents a method for summarising biomedical scientific literature that takes into account the characteristics of the domain and the type of document.

Methods: To identify salient sentences in biomedical texts, concepts and relations derived from the Unified Medical Language System (UMLS) are arranged into a semantic graph that represents the document. A degree-based clustering algorithm is then used to identify the different themes or topics within the text. Several heuristics for sentence selection, each intended to generate a different type of summary, are tested. A worked example on a real document illustrates how the method works.

Results: A large-scale evaluation is performed using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. The results are compared with those achieved by three well-known summarisers (two research prototypes and a commercial application) and two baselines. Our method significantly outperforms all summarisers and baselines. The best of our heuristics achieves a ROUGE-1 score of 0.7862 versus 0.7302 for the LexRank summariser, an absolute gain of 0.056 (5.6 percentage points) and a relative improvement of almost 7.7%. A qualitative analysis of the summaries also shows that our method succeeds in identifying sentences that cover the main topic of the document, while also considering secondary or "satellite" information that might be relevant to the user.

Conclusion: The proposed method is shown to be an effective approach to biomedical literature summarisation, which supports the view that using concepts rather than terms can be very useful in automatic summarisation, especially in highly specialised domains.
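
The pipeline described under Methods (concept graph, degree-based clustering, sentence selection) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The concept labels are invented rather than taken from UMLS, edges are assumed to come from simple sentence co-occurrence instead of UMLS relations, and the function names `build_graph`, `hub_concepts`, `assign_clusters` and `select_sentences` are hypothetical. The selection rule shown (overlap with the largest cluster) is only one assumed heuristic.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: sentence index -> concepts found in that sentence.
# Real concepts would come from UMLS; these labels are made up.
SENTENCES = {
    0: {"diabetes", "insulin", "glucose"},
    1: {"insulin", "glucose", "pancreas"},
    2: {"diabetes", "neuropathy"},
    3: {"glucose", "diet"},
}


def build_graph(sentences):
    """Link concepts that co-occur in a sentence (assumed stand-in for UMLS relations)."""
    graph = defaultdict(set)
    for concepts in sentences.values():
        for a, b in combinations(sorted(concepts), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def hub_concepts(graph, top_n):
    """Return the top_n concepts by degree, breaking ties alphabetically."""
    ranked = sorted(graph, key=lambda c: (-len(graph[c]), c))
    return ranked[:top_n]


def assign_clusters(graph, hubs):
    """Attach each non-hub concept to the first hub it is adjacent to."""
    clusters = {hub: {hub} for hub in hubs}
    for concept in graph:
        if concept in clusters:
            continue
        for hub in hubs:
            if hub in graph[concept]:
                clusters[hub].add(concept)
                break
    return clusters


def select_sentences(sentences, clusters, k):
    """Keep the k sentences sharing most concepts with the largest cluster."""
    main_theme = max(clusters.values(), key=len)
    ranked = sorted(sentences, key=lambda i: (-len(sentences[i] & main_theme), i))
    return sorted(ranked[:k])  # restore document order


graph = build_graph(SENTENCES)
hubs = hub_concepts(graph, top_n=2)
clusters = assign_clusters(graph, hubs)
summary = select_sentences(SENTENCES, clusters, k=2)
print(hubs, summary)  # ['glucose', 'diabetes'] [0, 1]
```

On this toy input, "glucose" has the highest degree and anchors the largest cluster, so sentences 0 and 1 are selected. Scoring against other clusters instead would surface secondary information such as sentence 2, which is the kind of trade-off that different selection heuristics explore.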