Text Summarization by Sentence Extraction Using Unsupervised Learning

Authors:
René Arnulfo García-Hernández; Romyna Montiel; Yulia Ledeneva; Eréndira Rendón; Alexander Gelbukh; Rafael Cruz
Affiliations:
Pattern Recognition Laboratory, Toluca Institute of Technology, Mexico, Autonomous University of the State of Mexico, Mexico, Center for Computing Research, National Polytechnic Institute, Mexico, ...
Venue:
MICAI '08 Proceedings of the 7th Mexican International Conference on Artificial Intelligence: Advances in Artificial Intelligence
Year:
2008

Abstract

The main problem in generating an extractive automatic text summary is detecting the most relevant information in the source document. Although some approaches claim to be domain- and language-independent, they rely on highly dependent knowledge, such as key-phrases or gold-standard samples for machine-learning approaches. In this work, we propose a language- and domain-independent automatic text summarization approach by sentence extraction using an unsupervised learning algorithm. Our hypothesis is that an unsupervised algorithm can help cluster similar ideas (sentences). Then, to compose the summary, the most representative sentence is selected from each cluster. Several experiments on the standard DUC-2002 collection show that the proposed method obtains more favorable results than other approaches.
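
The pipeline described in the abstract (cluster the sentences, then take the most representative sentence of each cluster) can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the sentence representation, or the similarity measure, so everything concrete below is an assumption made for illustration: a k-means-style loop, bag-of-words term-frequency vectors, cosine similarity, a naive punctuation-based sentence splitter, and seeding with the first k sentences. The function names (`split_sentences`, `vectorize`, `summarize`) are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on ., ! or ? followed by whitespace (an assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def vectorize(sentence):
    """Bag-of-words term-frequency vector (assumed representation)."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of term vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def summarize(text, k=2, iterations=10):
    """Cluster sentences, then return one representative per cluster."""
    sentences = split_sentences(text)
    vectors = [vectorize(s) for s in sentences]
    k = min(k, len(sentences))
    if k == 0:
        return []

    # Assumption: seed the centroids with the first k sentences so the
    # sketch is deterministic; other seeding strategies are possible.
    centroids = vectors[:k]
    assignment = []
    for _ in range(iterations):
        # Assign each sentence to its most similar centroid.
        new_assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Recompute each centroid from its current members.
        for c in range(k):
            members = [vectors[i] for i, a in enumerate(assignment) if a == c]
            if members:
                centroids[c] = centroid(members)

    # "Most representative" is taken here to mean closest to the centroid.
    chosen = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            chosen.append(
                max(members, key=lambda i: cosine(vectors[i], centroids[c]))
            )
    # Keep the original document order in the summary.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(document_text, k=3)` would return up to three sentences, one per cluster, in their original order. Because the sketch uses no stop-word list, stemmer, or labeled training data, it stays within the language- and domain-independent setting the abstract describes, although the tokenizer and sentence splitter are simplifications.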