Effect of Preprocessing on Extractive Summarization with Maximal Frequent Sequences

Authors:
Yulia Ledeneva
Affiliations:
Center for Computing Research, National Polytechnic Institute, D.F., Mexico 07738
Venue:
MICAI '08 Proceedings of the 7th Mexican International Conference on Artificial Intelligence: Advances in Artificial Intelligence
Year:
2008

Abstract

The task of extractive summarization consists of selecting a subset of text segments, such as sentences, and concatenating them to form a summary of the original text. The selection of sentences is based on the terms they contain, which can be single words or multiword expressions. In previous work, we suggested so-called Maximal Frequent Sequences as such terms. In this paper, we investigate the effect of preprocessing on the process of selecting such sequences. Our results suggest that the accuracy of the method is, contrary to expectations, not seriously affected by preprocessing, which is both bad and good news, as we show.
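
As a rough illustration of the idea in the abstract, the sketch below treats a maximal frequent sequence as a contiguous word sequence that occurs at least `min_count` times and is not contained in any longer frequent sequence, then ranks sentences by the sequences they contain. This is not the paper's implementation. The function names, the tiny `STOP_WORDS` set, the occurrence-based frequency threshold, and the count-based scoring are all assumptions made for this example, and stop-word removal stands in for preprocessing in general.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "of", "on", "is", "in"}


def tokenize(sentence, remove_stop_words=False):
    """Lowercase a sentence and split it into word tokens."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return words


def frequent_ngrams(token_lists, min_count=2):
    """Count every contiguous word sequence; keep those seen >= min_count times."""
    counts = Counter()
    for tokens in token_lists:
        for i in range(len(tokens)):
            for j in range(i + 1, len(tokens) + 1):
                counts[tuple(tokens[i:j])] += 1
    return {g: c for g, c in counts.items() if c >= min_count}


def is_contiguous_sub(short, long):
    """True if the tuple `short` appears as a contiguous run inside the tuple `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def maximal_frequent_sequences(token_lists, min_count=2):
    """Keep only frequent sequences not contained in a longer frequent sequence."""
    freq = frequent_ngrams(token_lists, min_count)
    return {
        g: c
        for g, c in freq.items()
        if not any(len(h) > len(g) and is_contiguous_sub(g, h) for h in freq)
    }


def summarize(sentences, k=1, min_count=2, remove_stop_words=False):
    """Return the k highest-scoring sentences, in their original order."""
    token_lists = [tokenize(s, remove_stop_words) for s in sentences]
    mfs = maximal_frequent_sequences(token_lists, min_count)

    def score(tokens):
        # Assumed scoring: sum the counts of the sequences a sentence contains.
        as_tuple = tuple(tokens)
        return sum(c for g, c in mfs.items() if is_contiguous_sub(g, as_tuple))

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(token_lists[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


sentences = [
    "Maximal frequent sequences are useful terms.",
    "The weather was nice.",
    "We extract maximal frequent sequences from text.",
]
summary = summarize(sentences, k=2)
```

Calling `summarize` with `remove_stop_words=True` and comparing the selected sentences is one simple way to probe how much a preprocessing step changes the outcome on a given text.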