Analysis of the Size of Antidictionary in DCA
CPM '08 Proceedings of the 19th annual symposium on Combinatorial Pattern Matching
Pattern Matching in DCA Coded Text
CIAA '08 Proceedings of the 13th international conference on Implementation and Applications of Automata
On the Construction of an Antidictionary with Linear Complexity Using the Suffix Tree
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Hi-index | 0.00 |
The compression of binary texts using antidictionaries is a novel technique based on the fact that some substrings (called "antifactors") never appear in the text. Let sb be an antifactor, where b is its last bit. Every time s appears in the text we know that the next bit is b and hence omit its representation. Since building the set of all antifactors is space consuming at compression time, it is customary to limit the maximum length of antifactors considered up to a constant k. Larger k yields better compression of the text but requires more space at compression time.In this paper we introduce the notion of almost antifactors, which are strings that rarely appear in the text. More formally, almost antifactors are strings that, if we consider them as antifactors and separately code their occurrences as exceptions, the compression ratio improves. We show that almost antifactors permit improving compression with a limited amount of main memory to compress. Our experiments show that they obtain the same compression ofthe classical algorithm using only 30% - 55% of its memory space.