Design and development of a concept-based multi-document summarization system for research abstracts

  • Authors:
  • Shiyan Ou; Christopher Soo-Guan Khoo; Dion H. Goh

  • Affiliations:
  • Division of Information Studies, School of Communication and Information, Nanyang Technological University, Singapore; Division of Information Studies, School of Communication and Information, Nanyang Technological University, Singapore; Division of Information Studies, School of Communication and Information, Nanyang Technological University, Singapore

  • Venue:
  • Journal of Information Science
  • Year:
  • 2008

Abstract

This paper describes a new concept-based multi-document summarization system that employs discourse parsing, information extraction and information integration. Dissertation abstracts in the field of sociology were selected as sample documents for this study. The summarization process includes four major steps: (1) parsing dissertation abstracts into five standard sections; (2) extracting research concepts (often operationalized as research variables) and their relationships, the research methods used and the contextual relations from specific sections of the text; (3) integrating similar concepts and relationships across different abstracts; and (4) combining and organizing the different kinds of information using a variable-based framework, and presenting them in an interactive web-based interface. The accuracy of each summarization step was evaluated by comparing the system-generated output against human coding. The user evaluation carried out in the study indicated that the majority of subjects (70%) preferred the concept-based summaries generated using the system to the sentence-based summaries generated using traditional sentence extraction techniques.
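
To make step (3) more concrete, the following is a minimal Python sketch of one way similar concepts from different abstracts might be grouped. It is not the system described in the paper: the function name `integrate_concepts`, the sample phrases, and the heuristic of matching on the last word of each phrase are all assumptions introduced here for illustration only.

```python
from collections import defaultdict


def integrate_concepts(concepts_by_abstract):
    """Group concept phrases from several abstracts by a shared head word.

    Assumption (illustrative only): the head word is the last token of the
    phrase, lowercased. The abstract does not say how similarity is computed.
    """
    groups = defaultdict(set)
    for abstract_id, phrases in concepts_by_abstract.items():
        for phrase in phrases:
            tokens = phrase.lower().split()
            if not tokens:
                continue  # skip empty phrases
            groups[tokens[-1]].add((abstract_id, phrase))

    # Keep only head words that appear in more than one abstract,
    # since integration is about concepts shared across documents.
    return {
        head: sorted(members)
        for head, members in groups.items()
        if len({abstract_id for abstract_id, _ in members}) > 1
    }


# Hypothetical concept phrases, standing in for the output of step (2).
sample = {
    "A1": ["academic achievement", "parental involvement"],
    "A2": ["student achievement", "peer pressure"],
    "A3": ["Parental Involvement"],
}
merged = integrate_concepts(sample)
```

Here `merged` groups the phrases under the head words "achievement" and "involvement", while "peer pressure" is left out because it occurs in only one abstract. A real system would need a more robust similarity measure than a shared final word.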