A Sentence Classification System for Multi Biomedical Literature Summarization

  • Authors:
  • Yasunori Yamamoto; Toshihisa Takagi

  • Affiliations:
  • University of Tokyo, Tokyo, Japan; Computational Biology, University of Tokyo, Tokyo, Japan

  • Venue:
  • ICDEW '05 Proceedings of the 21st International Conference on Data Engineering Workshops
  • Year:
  • 2005

Abstract

A PubMed search often returns a long list of query-related papers, more than a researcher can work through in a short time. As a first step toward addressing this issue by summarizing retrieved papers, we developed a system that classifies sentences of abstracts obtained from the MEDLINE database into five rhetorical statuses: background, purpose, method, result, and conclusion. We used Support Vector Machine (SVM) classifiers, training one classifier per rhetorical status on structured abstracts. A structured abstract is one whose sentences carry labels indicating their rhetorical statuses; an unstructured abstract has no such labels. The classifiers were tested on both structured and unstructured abstracts. The former were randomly obtained from the MEDLINE database, and the latter were labeled manually. We compared our method with a previously reported one and also evaluated a combination of the two. Our method outperformed the previously reported one, and the combined method performed better still. Classified abstracts can be used for multi-document summarization, which gives researchers an efficient and effective way to learn a research topic.
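The one-classifier-per-status design described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the abstract does not state the features, the kernel, or how the five classifiers' outputs are combined. The sketch uses bag-of-words features, a linear SVM trained by stochastic subgradient descent on the regularized hinge loss, and picks the status whose classifier gives the highest score. The training sentences are invented toy examples, not MEDLINE data, and all function names are hypothetical.

```python
import re

LABELS = ["background", "purpose", "method", "result", "conclusion"]

# Invented toy sentences standing in for labeled sentences taken from
# structured abstracts (NOT real MEDLINE data).
TRAIN = [
    ("background", "Little is known about the role of this gene in cancer."),
    ("background", "Previous studies have reported conflicting findings."),
    ("purpose", "The aim of this study was to evaluate a new assay."),
    ("purpose", "Our objective was to determine whether treatment improves survival."),
    ("method", "Samples were collected from 40 patients and analyzed by PCR."),
    ("method", "We performed a randomized controlled trial."),
    ("result", "Expression increased significantly in the treated group."),
    ("result", "Mortality was lower among treated patients (p < 0.05)."),
    ("conclusion", "These data suggest that the assay is reliable."),
    ("conclusion", "In conclusion, treatment may benefit selected patients."),
]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens (assumed feature set)."""
    return re.findall(r"[a-z]+", sentence.lower())


def train_binary_svm(samples, epochs=200, lr=0.1, lam=0.01):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss.

    samples: list of (tokens, y) pairs with y in {+1, -1}.
    Returns (weights, bias); weights maps token -> float.
    """
    w, b = {}, 0.0
    for _ in range(epochs):
        for tokens, y in samples:
            margin = y * (sum(w.get(t, 0.0) for t in tokens) + b)
            # Regularization step: shrink all weights slightly.
            for t in w:
                w[t] *= 1.0 - lr * lam
            # Hinge-loss step: update only when the margin is violated.
            if margin < 1.0:
                for t in tokens:
                    w[t] = w.get(t, 0.0) + lr * y
                b += lr * y
    return w, b


def train_one_vs_rest(train):
    """Train one binary classifier per rhetorical status."""
    tokenized = [(label, tokenize(s)) for label, s in train]
    return {
        target: train_binary_svm(
            [(toks, 1 if label == target else -1) for label, toks in tokenized]
        )
        for target in LABELS
    }


def score_sentence(models, sentence):
    """Return each status classifier's decision value for the sentence."""
    tokens = tokenize(sentence)
    return {
        label: sum(w.get(t, 0.0) for t in tokens) + b
        for label, (w, b) in models.items()
    }


def classify(models, sentence):
    """Assign the status with the highest decision value (assumed rule)."""
    scores = score_sentence(models, sentence)
    return max(scores, key=scores.get)


models = train_one_vs_rest(TRAIN)
```

Calling `classify(models, sentence)` on a sentence from an unstructured abstract returns one of the five status names. A realistic system would need far more training data and richer features than this toy setup.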