Design and development of a concept-based multi-document summarization system for research abstracts

Authors:
Shiyan Ou;Christopher Soo-Guan Khoo;Dion H. Goh
Affiliations:
Division of Information Studies, School of Communication and Information, Nanyang Technological University, Singapore;Division of Information Studies, School of Communication and Information, Nanyang Technological University, Singapore;Division of Information Studies, School of Communication and Information, Nanyang Technological University, Singapore
Venue:
Journal of Information Science
Year:
2008

Abstract

This paper describes a new concept-based multi-document summarization system that employs discourse parsing, information extraction and information integration. Dissertation abstracts in the field of sociology were selected as sample documents for this study. The summarization process includes four major steps: (1) parsing dissertation abstracts into five standard sections; (2) extracting research concepts (often operationalized as research variables) and their relationships, the research methods used and the contextual relations from specific sections of the text; (3) integrating similar concepts and relationships across different abstracts; and (4) combining and organizing the different kinds of information using a variable-based framework, and presenting them in an interactive web-based interface. The accuracy of each summarization step was evaluated by comparing the system-generated output against human coding. The user evaluation carried out in the study indicated that the majority of subjects (70%) preferred the concept-based summaries generated using the system to the sentence-based summaries generated using traditional sentence extraction techniques.
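
The four steps in the abstract can be illustrated with a toy pipeline. The sketch below is not the authors' implementation: this record does not give the section names, cue phrases, extraction patterns or similarity measure, so the `SECTION_CUES` table, the `RELATION_PATTERN` regular expression, the head-word grouping rule and all function names are hypothetical stand-ins chosen only to make the data flow concrete.

```python
import re
from collections import defaultdict

# ASSUMPTION: the five section names and their cue phrases are invented for
# illustration; the abstract only says there are "five standard sections".
SECTION_CUES = {
    "background": ("previous research", "little is known"),
    "objectives": ("this study", "the purpose"),
    "methods": ("survey", "interview", "sample"),
    "results": ("results", "findings", "found that"),
    "conclusions": ("suggest", "implication"),
}

# ASSUMPTION: one hand-written pattern standing in for real information extraction.
RELATION_PATTERN = re.compile(
    r"(?:effect|influence|impact) of (?P<x>[\w\s-]+?) on (?P<y>[\w\s-]+?)(?:[.,;]|$)",
    re.IGNORECASE,
)


def parse_sections(abstract):
    """Step 1: put each sentence in the first section whose cue phrase it contains."""
    sections = defaultdict(list)
    for sentence in re.split(r"(?<=[.!?])\s+", abstract.strip()):
        lowered = sentence.lower()
        label = next(
            (name for name, cues in SECTION_CUES.items()
             if any(cue in lowered for cue in cues)),
            "background",  # fallback when no cue matches
        )
        sections[label].append(sentence)
    return dict(sections)


def extract_relations(sentences):
    """Step 2: pull (independent concept, dependent concept) pairs from sentences."""
    relations = []
    for sentence in sentences:
        for match in RELATION_PATTERN.finditer(sentence):
            relations.append(
                (match.group("x").strip().lower(), match.group("y").strip().lower())
            )
    return relations


def integrate(relations_by_doc):
    """Step 3: group relations across documents by the head word of the
    dependent concept (a crude proxy for concept similarity)."""
    groups = defaultdict(lambda: defaultdict(set))
    for doc_id, relations in relations_by_doc.items():
        for x, y in relations:
            head = y.split()[-1]
            groups[head][(x, y)].add(doc_id)
    return groups


def summarize(docs):
    """Step 4: organize the integrated relations under each shared variable."""
    relations_by_doc = {
        doc_id: extract_relations(parse_sections(text).get("objectives", []))
        for doc_id, text in docs.items()
    }
    return {
        head: sorted((x, y, sorted(ids)) for (x, y), ids in rels.items())
        for head, rels in integrate(relations_by_doc).items()
    }


# Made-up sample abstracts, not taken from the study's data.
SAMPLE_DOCS = {
    "d1": "This study examines the effect of parental income on academic "
          "achievement. A survey of 300 students was conducted.",
    "d2": "The purpose was to assess the impact of peer support on student "
          "achievement. Findings are discussed.",
}
```

Calling `summarize(SAMPLE_DOCS)` groups both extracted relationships under the shared variable "achievement", which is the kind of variable-centred view the abstract contrasts with sentence extraction. A real system would need far richer section classification and concept matching than these keyword and head-word heuristics.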