Data compression procedures utilizing the similarity of data

Authors:
Yahiko Kambayashi;Narao Nakatsu;Shuzo Yajima
Affiliations:
Kyoto University, Kyoto, Japan;Aichi University of Education, Aichi, Japan;Kyoto University, Kyoto, Japan
Venue:
AFIPS '81 Proceedings of the May 4-7, 1981, national computer conference
Year:
1981

Abstract

Large database systems often have to store sets of similar data. This paper discusses efficient data compression procedures that exploit this similarity. The procedures are suitable for compressing successive versions of programs, series of data produced in an office, and similar collections.

The procedure for compressing a single string by exploiting regularity in the data is as follows:
1. Calculate all maximum repeated substrings in the given string.
2. Since each repeated substring needs to be stored only once, replace the second and each later occurrence of a substring with a code giving the position of its first occurrence.

The procedure for compressing two strings w1 and w2 by exploiting their similarity is as follows:
1. Calculate all maximum common substrings of w1 and w2.
2. Find a minimum cover for w2 using the maximum common substrings contained in w1.
3. Encode w2 as a sequence of codes, each of which identifies a substring of w1.

Both procedures are shown to require time proportional only to the total length of the data, and are therefore efficient. Combinations and variations of the two procedures are also discussed in the paper.
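
The two procedures can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it uses a naive quadratic search rather than a linear-time method, and it covers the target greedily with the longest match at each position. The function names, the token format (a literal character or a `(start, length)` pair), the `min_len` threshold, and the use of literals for unmatched characters are all assumptions made for illustration.

```python
from typing import List, Tuple, Union

# A token is a literal character or a (start, length) reference (assumed format).
Token = Union[str, Tuple[int, int]]


def longest_match(source: str, target: str, i: int) -> Tuple[int, int]:
    """Return (start, length) of the longest prefix of target[i:] found in source.

    Naive search for illustration only; ties go to the leftmost occurrence.
    """
    best_start, best_len = 0, 0
    for s in range(len(source)):
        k = 0
        while (s + k < len(source) and i + k < len(target)
               and source[s + k] == target[i + k]):
            k += 1
        if k > best_len:
            best_start, best_len = s, k
    return best_start, best_len


def encode_against(w1: str, w2: str, min_len: int = 2) -> List[Token]:
    """Two-string case: encode w2 as references to substrings of w1."""
    tokens: List[Token] = []
    i = 0
    while i < len(w2):
        start, length = longest_match(w1, w2, i)
        if length >= min_len:
            tokens.append((start, length))  # copy w1[start:start+length]
            i += length
        else:
            tokens.append(w2[i])            # no useful match: keep a literal
            i += 1
    return tokens


def decode_against(w1: str, tokens: List[Token]) -> str:
    """Rebuild w2 from w1 and the token list."""
    return "".join(w1[t[0]:t[0] + t[1]] if isinstance(t, tuple) else t
                   for t in tokens)


def encode_self(w: str, min_len: int = 2) -> List[Token]:
    """Single-string case: later occurrences point back into the text before them."""
    tokens: List[Token] = []
    i = 0
    while i < len(w):
        start, length = longest_match(w[:i], w, i)  # search only the prefix
        if length >= min_len:
            tokens.append((start, length))
            i += length
        else:
            tokens.append(w[i])
            i += 1
    return tokens


def decode_self(tokens: List[Token]) -> str:
    """Rebuild the string; references index into the output produced so far."""
    out = ""
    for t in tokens:
        out += out[t[0]:t[0] + t[1]] if isinstance(t, tuple) else t
    return out


w1 = "the quick brown fox"
w2 = "the quick red fox"
delta = encode_against(w1, w2)
print(delta)                      # [(0, 10), 'r', 'e', 'd', (15, 4)]
print(encode_self("abcabcabc"))   # ['a', 'b', 'c', (0, 3), (0, 3)]
```

In the two-string example, only three literal characters and two references are needed to describe `w2` given `w1`, which is the kind of saving the procedures aim for when versions of a text differ only slightly.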