Text sparsification via local maxima

Authors:
Pilu Crescenzi;Alberto Del Lungo;Roberto Grossi;Elena Lodi;Linda Pagli;Gianluca Rossi
Affiliations:
Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze, Via C. Lombroso 6/17, 50134 Firenze, Italy;Dipartimento di Matematica, Università degli Studi di Siena, Via del Capitano 15, 53100 Siena, Italy;Dipartimento di Informatica, Università degli Studi di Pisa, Corso Italia 40, 56125 Pisa, Italy;Dipartimento di Matematica, Università degli Studi di Siena, Via del Capitano 15, 53100 Siena, Italy;Dipartimento di Informatica, Università degli Studi di Pisa, Corso Italia 40, 56125 Pisa, Italy;Dipartimento di Matematica, Università degli Studi di Roma "Tor Vergata", Via della Ricerca Scientifica 1, 00133 Roma, Italy
Venue:
Theoretical Computer Science
Year:
2003

Abstract

In this paper we investigate some properties and algorithms related to a text sparsification technique based on the identification of local maxima in the given string. Since the number of local maxima depends on the order assigned to the alphabet symbols, we first consider the case in which the order can be chosen arbitrarily. We show that finding an order that minimizes the number of local maxima in a given text string is NP-hard. We then consider the case in which the order is fixed a priori. Even though such an order is not necessarily optimal, we can exploit the property that the average number of local maxima it induces in an arbitrary text is approximately one third of the text length. In particular, we describe how to apply the selection of local maxima iteratively, one or more times, so as to obtain a sparsified text. We show how to use this technique to filter access to unstructured texts, which appear to have no natural division into words. Finally, we show experimentally that our approach can be used to build a space-efficient index for searching sufficiently long patterns in a DNA sequence as quickly as with a full index.
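The selection step can be illustrated with a short Python sketch. It assumes that a local maximum is an interior position whose symbol ranks strictly higher than both of its neighbors under the chosen order; the abstract does not specify how boundaries or equal neighbors are handled, so those choices are assumptions. The names `local_maxima`, `sparsify` and `best_order` are hypothetical, and the iteration shown (keep only the symbols at the maxima, then repeat on that shorter sequence) is a simplified variant, not necessarily the authors' construction. The brute-force search over all orders takes time exponential in the alphabet size; given the NP-hardness result, no efficient exact method is expected in general, so it is only practical for tiny alphabets such as DNA's four symbols.

```python
import itertools
from typing import Dict, List, Sequence, Tuple


def local_maxima(text: Sequence, rank: Dict) -> List[int]:
    """Positions i with rank[text[i-1]] < rank[text[i]] > rank[text[i+1]].

    Assumption: comparisons are strict and the first and last positions are
    never maxima; the paper's treatment of these cases may differ.
    """
    return [
        i
        for i in range(1, len(text) - 1)
        if rank[text[i - 1]] < rank[text[i]] > rank[text[i + 1]]
    ]


def sparsify(text: str, rank: Dict[str, int], rounds: int) -> List[int]:
    """Hypothetical simplified iteration: each round keeps only the symbols
    at local maxima of the current sequence. Returns the surviving positions
    expressed as offsets into the ORIGINAL text."""
    positions = list(range(len(text)))
    for _ in range(rounds):
        current = [text[p] for p in positions]
        positions = [positions[i] for i in local_maxima(current, rank)]
    return positions


def best_order(text: str) -> Tuple[int, str]:
    """Brute force over every order of the alphabet (exponential; tiny
    alphabets only). Returns (minimum number of maxima, order from lowest
    to highest symbol)."""
    alphabet = sorted(set(text))
    best = None
    for perm in itertools.permutations(alphabet):
        rank = {c: r for r, c in enumerate(perm)}
        count = len(local_maxima(text, rank))
        if best is None or count < best[0]:
            best = (count, "".join(perm))
    return best


if __name__ == "__main__":
    dna_rank = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed fixed order A < C < G < T
    sample = "ACGTACGATC"
    print(local_maxima(sample, dna_rank))  # maxima under the fixed order
    print(sparsify(sample, dna_rank, 1))   # one round of sparsification
    print(best_order(sample))              # optimal order for this string
```

On the sample string `ACGTACGATC` with the fixed order A < C < G < T, one round keeps positions 3, 6 and 8, whereas the best order found by brute force induces only 2 local maxima. Returning offsets into the original text, rather than the shortened string, is what a sparse index would need in order to record where the sampled positions start.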