A language independent approach to multilingual text summarization

Authors:
Alkesh Patel;Tanveer Siddiqui;U. S. Tiwary
Affiliations:
Indian Institute of Information Technology, Allahabad;Indian Institute of Information Technology, Allahabad;Indian Institute of Information Technology, Allahabad
Venue:
Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
Year:
2007

Abstract

This paper describes an efficient algorithm for language-independent, generic, extractive summarization of single documents. The algorithm is based on structural and statistical (rather than semantic) factors. Through evaluations on single-document summarization of English, Hindi, Gujarati and Urdu documents, we show that the method performs equally well regardless of the language. The algorithm has been applied to DUC data for English documents and to various newspaper articles for the other languages, with a corresponding stop-word list and a modified stemmer for each language. The results of summarization have been compared with DUC 2002 data using the degree of representativeness. For the other languages, the degree of representativeness obtained is highly encouraging.
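
The abstract does not list the specific structural and statistical factors used, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes two stand-in features: average term frequency of a sentence (statistical) and a bonus for earlier sentence position (structural). The function name `summarize` and the parameters `stop_words`, `stem`, `n_sentences` and `position_weight` are hypothetical. The language-specific parts (stop-word list and stemmer) are passed in, which is what would let the same scoring code run on English, Hindi, Gujarati or Urdu text.

```python
import re
from collections import Counter

# Punctuation stripped from tokens; includes the Devanagari danda and Urdu full stop.
_PUNCT = ".,;:!?\"'()[]।۔"


def summarize(text, stop_words=frozenset(), stem=lambda w: w,
              n_sentences=2, position_weight=0.5):
    """Return the n_sentences highest-scoring sentences in document order.

    Assumed features (not taken from the paper): average term frequency
    and a position bonus that favours earlier sentences.
    """
    # Split after sentence-final punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?।۔])\s+", text) if s.strip()]

    # Whitespace tokenization keeps words with combining marks intact.
    tokenized = []
    for sentence in sentences:
        words = (w.strip(_PUNCT) for w in sentence.lower().split())
        tokenized.append([stem(w) for w in words if w and w not in stop_words])

    # Document-wide frequency of each (stemmed) content word.
    freq = Counter(w for words in tokenized for w in words)

    scores = []
    for i, words in enumerate(tokenized):
        statistical = sum(freq[w] for w in words) / len(words) if words else 0.0
        structural = 1.0 + position_weight / (i + 1)
        scores.append(statistical * structural)

    # Pick the best-scoring sentences, then restore document order.
    top = sorted(range(len(sentences)), key=scores.__getitem__, reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]
```

For another language, a caller would supply that language's stop-word set and a stemming function in place of the defaults; nothing else in the sketch depends on the language.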