New methods for compression of MP double array by compact management of suffixes

  • Authors:
  • Tshering C. Dorji;El-sayed Atlam;Susumu Yata;Mahmoud Rokaya;Masao Fuketa;Kazuhiro Morita;Jun-ichi Aoe

  • Affiliations:
  • Department of Information Science and Intelligent Systems, Faculty of Engineering, University of Tokushima, Minamijosanjima 2-1, Tokushima, 770-8506, Japan (all authors)

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2010

Abstract

The Minimal Prefix (MP) double array is an efficient data structure for a trie. However, its space efficiency is degraded by the non-compact management of suffixes. This paper presents three methods for compressing the MP double array. The first two methods store short suffixes inside the leaf nodes and prune the leaf nodes that correspond to the end marker symbol. These methods reduce the size by up to 20% and make insertion and deletion faster, while maintaining the O(1) retrieval time. The third method eliminates empty spaces in the array that holds the suffixes, improving the maximum size reduction by about a further 5% at the cost of increased insertion time. Compared with a Ternary Search Tree, key retrieval in the compressed MP double array is 50% faster and the structure is 3-5 times smaller.
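The idea of a minimal-prefix trie with separately stored suffixes can be illustrated with a small sketch. The code below is not the authors' implementation: it uses Python dictionaries in place of the double array, and the names `MPTrie`, `INLINE_MAX`, and the `#` end marker are assumptions made for illustration. It branches only until a key becomes unique, keeps short suffixes inside the leaf, sends longer ones to a separate suffix list, and shows how splitting a leaf leaves an empty slot in that list, which is the kind of waste the third method targets. Pruning of end-marker leaves is not modelled.

```python
END = "#"        # assumed end marker; keys must not contain it
INLINE_MAX = 3   # hypothetical threshold: suffixes this short stay in the leaf


class MPTrie:
    """Toy minimal-prefix trie: dict nodes stand in for the double array."""

    def __init__(self):
        self.root = {}   # internal node: char -> child (dict or leaf tuple)
        self.tail = []   # separate suffix store; None marks an empty slot

    def _make_leaf(self, suffix):
        # Short suffixes are held in the leaf itself; long ones go to the tail.
        if len(suffix) <= INLINE_MAX:
            return ("inline", suffix)
        self.tail.append(suffix)
        return ("tail", len(self.tail) - 1)

    def _leaf_suffix(self, leaf):
        kind, value = leaf
        return value if kind == "inline" else self.tail[value]

    def insert(self, key):
        key += END
        node = self.root
        for i, ch in enumerate(key):
            child = node.get(ch)
            if child is None:
                node[ch] = self._make_leaf(key[i + 1:])
                return
            if isinstance(child, dict):
                node = child
                continue
            # Reached a leaf: compare its stored suffix with the new one.
            old, new = self._leaf_suffix(child), key[i + 1:]
            if old == new:
                return  # key already present
            if child[0] == "tail":
                self.tail[child[1]] = None  # the old slot becomes a hole
            # Extend the trie along the shared part, then branch.
            branch = {}
            node[ch] = branch
            j = 0
            while old[j] == new[j]:
                nxt = {}
                branch[old[j]] = nxt
                branch = nxt
                j += 1
            branch[old[j]] = self._make_leaf(old[j + 1:])
            branch[new[j]] = self._make_leaf(new[j + 1:])
            return

    def contains(self, key):
        key += END
        node = self.root
        for i, ch in enumerate(key):
            child = node.get(ch)
            if child is None:
                return False
            if isinstance(child, dict):
                node = child
            else:
                return self._leaf_suffix(child) == key[i + 1:]
        return False


trie = MPTrie()
for word in ["bachelor", "badge", "baby", "jar"]:
    trie.insert(word)
```

In this sketch, inserting "badge" after "bachelor" splits the original leaf, so the long suffix first written to `trie.tail` is abandoned and a shorter one is appended. A compaction pass that reuses or removes such `None` slots would save space but adds work at insertion time, which is consistent with the trade-off the abstract describes for the third method.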