Data compression procedures utilizing the similarity of data

  • Authors:
  • Yahiko Kambayashi; Narao Nakatsu; Shuzo Yajima

  • Affiliations:
  • Kyoto University, Kyoto, Japan; Aichi University of Education, Aichi, Japan; Kyoto University, Kyoto, Japan

  • Venue:
  • AFIPS '81 Proceedings of the May 4-7, 1981, national computer conference
  • Year:
  • 1981

Abstract

In large database systems, we often encounter situations in which a set of similar data must be stored. This paper discusses efficient data compression procedures that exploit the similarity of data. These procedures are suitable for compressing versions of programs, series of data produced in an office, and similar material.

The procedure for compressing a single string by exploiting its regularity is as follows:
1. Calculate all maximum repeated substrings of the given string.
2. Since each repeated substring needs to be stored only once, replace the second and later occurrences of a substring with a code that indicates the position of its first occurrence.

The procedure for compressing two strings w1 and w2 by exploiting their similarity is as follows:
1. Calculate all maximum common substrings of w1 and w2.
2. Find a minimum cover of w2 using the maximum common substrings contained in w1.
3. Encode w2 as a sequence of codes, each of which refers to a substring of w1.

These procedures are shown to require time only proportional to the total length of the data, and are therefore efficient. Combinations and variations of the two procedures are also discussed in the paper.
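Both procedures can be illustrated with a minimal Python sketch. It is not the authors' method: instead of computing maximum repeated or common substrings with a linear-time structure, it uses a naive greedy longest-match parse (in the style of LZ77), so its running time is far worse than linear. The names `longest_match`, `encode_self`, `encode_against`, `decode` and the `("copy", start, length)` / `("lit", ch)` code format are hypothetical. For the two-string case, assuming each literal also counts as one code, taking the longest match at every step yields the fewest codes, because every substring of a substring of w1 is itself a substring of w1; this is assumed here to play the role of the paper's minimum cover.

```python
from typing import Callable, List, Optional, Tuple, Union

# Hypothetical code format (not from the paper):
#   ("copy", start, length) -> a substring of a reference string
#   ("lit", ch)             -> a single character with no available match
Code = Union[Tuple[str, int, int], Tuple[str, str]]


def longest_match(source: str, target: str, i: int) -> Tuple[int, int]:
    """Return (start, length) of the longest prefix of target[i:] occurring in source.

    Naive str.find search, much slower than linear time.
    Returns (-1, 0) when even the single character target[i] is absent.
    """
    best_start, best_len = -1, 0
    length = 1
    while i + length <= len(target):
        pos = source.find(target[i:i + length])  # lowest index, i.e. first occurrence
        if pos < 0:
            # If this prefix is absent, every longer prefix is absent too.
            break
        best_start, best_len = pos, length
        length += 1
    return best_start, best_len


def _greedy_parse(target: str, source_for: Callable[[int], str]) -> List[Code]:
    """Cut target into the longest available copies, falling back to literals."""
    codes: List[Code] = []
    i = 0
    while i < len(target):
        start, length = longest_match(source_for(i), target, i)
        if length == 0:
            codes.append(("lit", target[i]))
            i += 1
        else:
            codes.append(("copy", start, length))
            i += length
    return codes


def encode_self(w: str) -> List[Code]:
    """Single-string sketch: a copy points at the first occurrence of the match within w[:i]."""
    return _greedy_parse(w, lambda i: w[:i])


def encode_against(w1: str, w2: str) -> List[Code]:
    """Two-string sketch: cover w2 with substrings of w1, using literals where needed."""
    return _greedy_parse(w2, lambda i: w1)


def decode(codes: List[Code], w1: Optional[str] = None) -> str:
    """Invert either encoder; with w1=None, copies read from the text decoded so far."""
    out = ""
    for code in codes:
        if code[0] == "lit":
            out += code[1]
        else:
            _, start, length = code
            source = out if w1 is None else w1
            out += source[start:start + length]
    return out


if __name__ == "__main__":
    print(encode_against("abcdef", "cdefabq"))
    # [('copy', 2, 4), ('copy', 0, 2), ('lit', 'q')]
    print(encode_self("abcabcabd"))
    # [('lit', 'a'), ('lit', 'b'), ('lit', 'c'), ('copy', 0, 3), ('copy', 0, 2), ('lit', 'd')]
```

Decoding with `decode(encode_against(w1, w2), w1)` reproduces w2, and `decode(encode_self(w))` reproduces w. In `encode_self`, a copy only refers to text lying entirely before the current position, so the decoder can always resolve it from what it has already rebuilt.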