Algorithms on strings, trees, and sequences: computer science and computational biology
Compror: on-line lossless data compression with a factor oracle
Information Processing Letters
Reducing space for index implementation
Theoretical Computer Science
Factor Oracle: A New Structure for Pattern Matching
SOFSEM '99 Proceedings of the 26th Conference on Current Trends in Theory and Practice of Informatics on Theory and Practice of Informatics
Efficient Experimental String Matching by Weak Factor Recognition
CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Using Factor Oracles for Machine Improvisation
Soft Computing - A Fusion of Foundations, Methodologies and Applications
Substring search and repeat search using factor oracles
Information Processing Letters
Statistical Properties of Factor Oracles
CPM '09 Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching
Statistical properties of factor oracles
Journal of Discrete Algorithms
Several methods for compressing suffix trees have been defined, most of them with the aim of obtaining compact (that is, space-economical) index structures. Besides this practical aspect, a compression method can reveal structural properties of the resulting data structure, allowing a better understanding of it and a better estimate of its performance. In this paper, we propose a simple method to compress suffix trees by merging pairs of nodes. This idea has already been used in the literature, in a context different from ours. The originality of our approach is that the nodes we merge are chosen neither with respect to their subtrees (which is difficult to test algorithmically) nor with respect to the words spelled along branches (which usually requires testing several branches before finding the right one), but with respect to their position in the tree (which is easy to compute). Another particularity of our method is that it needs to read no edge label: it is based exclusively on the topology of the suffix tree. The compact structure resulting from the compression is the factor/suffix oracle introduced by Allauzen, Crochemore and Raffinot, whose accepted language includes the accepted language of the corresponding suffix tree. The interest of our paper is therefore threefold:
1. A topology-based compression method is defined for (compact) suffix trees.
2. A new property of the factor/suffix oracle is established: like a DAG, it results from the corresponding suffix tree after a linear number of appropriate node mergings; unlike a DAG, the merged nodes do not necessarily have isomorphic subtrees.
3. A new algorithm to transform a suffix tree into a factor/suffix oracle is given, which runs in linear time and thus improves on the quadratic complexity previously known for the same task.
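To make the target structure concrete, here is a minimal Python sketch of a factor oracle. It uses the commonly described on-line construction with supply links, not the suffix-tree node-merging method proposed in this paper, whose details the abstract does not give. The names `build_factor_oracle`, `accepts`, `trans` and `supply` are illustrative assumptions, as is the sample word.

```python
def build_factor_oracle(word):
    """Build a factor oracle for `word` with the on-line supply-link method.

    Returns a list `trans` where trans[s] maps a character to the state
    reached from state s. States are numbered 0..len(word).
    """
    m = len(word)
    trans = [dict() for _ in range(m + 1)]
    supply = [-1] * (m + 1)  # supply[0] = -1 marks the initial state

    for i, c in enumerate(word):
        new = i + 1
        trans[i][c] = new  # "spine" transition spelling the word itself
        k = supply[i]
        # Walk the supply links, adding shortcut transitions to the new
        # state until a state that already has a c-transition is found.
        while k > -1 and c not in trans[k]:
            trans[k][c] = new
            k = supply[k]
        supply[new] = 0 if k == -1 else trans[k][c]
    return trans


def accepts(trans, s):
    """Return True if `s` can be read from the initial state (all states accept)."""
    state = 0
    for c in s:
        if c not in trans[state]:
            return False
        state = trans[state][c]
    return True


word = "abbbaab"  # sample input chosen for illustration
oracle = build_factor_oracle(word)
factors = {word[i:j] for i in range(len(word)) for j in range(i, len(word) + 1)}
print(all(accepts(oracle, f) for f in factors))  # every factor is accepted
print(accepts(oracle, "aba"), "aba" in word)     # accepted although not a factor
```

For this word the oracle reads every factor but also reads `aba`, which does not occur in `abbbaab`. This illustrates the sense in which the oracle's accepted language includes, and can strictly exceed, that of the corresponding suffix tree.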