A compression algorithm using integrated record information for translation dictionaries

  • Authors:
  • Y. Kadoya; M. Fuketa; El-Sayed Atlam; K. Morita; T. Sumitomo; J. Aoe

  • Affiliations:
  • Department of Information Science and Intelligent Systems, University of Tokushima, 2-1 Minami Josanjima Cho, Tokushima-Shi 770-8506, Japan (all authors)

  • Venue:
  • Information Sciences—Informatics and Computer Science: An International Journal - Special issue: Informatics and computer science intelligent systems applications
  • Year:
  • 2004

Abstract

The trie is a well-known structure for retrieving natural language (NL) dictionaries used in morphological analysis, machine translation, and similar tasks. As a variety of NL processing systems has been developed, several kinds of dictionaries stored on the same hard disk have come to share a great deal of common information. This paper presents a method for merging individual dictionaries into a single generalized dictionary. The method reduces the total dictionary size and makes each individual dictionary usable by other applications. The keys of the merged dictionary include many long strings, such as compound words and idioms, which take up considerable space when a huge key set is stored in a trie. The paper therefore introduces a fast trie representation called the double-array structure and proposes a compression scheme for it that replaces long strings with the corresponding leaf node numbers of the trie. Although each record in the presented method becomes larger, the total number of records decreases drastically because common information is merged. Experimental results for nine dictionaries show that the new method is more efficient than previous ones.
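The abstract names two mechanisms: a double-array trie for key retrieval, and a merged record table in which long strings are replaced by leaf node numbers of the trie. The following is a minimal Python sketch of both under stated assumptions. The terminal symbol `#`, the character-code mapping, the first-fit breadth-first placement, the one-record-per-key layout, and the `min_len` threshold are illustrative choices, and the names `DoubleArrayTrie`, `leaf_number`, and `merge_dictionaries` are hypothetical. This is not the authors' published scheme.

```python
from collections import deque

END = "#"  # hypothetical terminal symbol appended to every key (keys must not contain it)


def char_code(c: str) -> int:
    """Map a character to a positive transition code; END gets code 1."""
    return 1 if c == END else ord(c) + 2


class DoubleArrayTrie:
    """Static double-array trie: BASE[s] + code(c) = t and CHECK[t] = s."""

    def __init__(self, keys):
        self.base = [0, 0]    # index 0 is unused; index 1 is the root
        self.check = [0, -1]  # -1 marks the root slot as occupied
        root = {}             # build a plain nested-dict trie first
        for key in keys:
            node = root
            for c in key + END:
                node = node.setdefault(c, {})
        queue = deque([(1, root)])  # then place nodes breadth-first
        while queue:
            s, children = queue.popleft()
            if not children:        # leaf node: BASE stays 0
                continue
            codes = [char_code(c) for c in children]
            b = 1
            while any(self._used(b + code) for code in codes):
                b += 1              # first base at which every child slot is free
            self.base[s] = b
            for c, child in children.items():
                t = b + char_code(c)
                self._grow(t)
                self.check[t] = s   # record the parent so lookups can verify it
                queue.append((t, child))

    def _used(self, t: int) -> bool:
        return t < len(self.check) and self.check[t] != 0

    def _grow(self, t: int) -> None:
        extra = t + 1 - len(self.check)
        if extra > 0:
            self.base.extend([0] * extra)
            self.check.extend([0] * extra)

    def leaf_number(self, key: str):
        """Return the index of the key's END leaf, or None if the key is absent."""
        s = 1
        for c in key + END:
            b = self.base[s]
            t = b + char_code(c)
            if b == 0 or t >= len(self.check) or self.check[t] != s:
                return None
            s = t
        return s


def merge_dictionaries(dictionaries, min_len=4):
    """Merge {name: {key: value}} into one trie plus one record per key.

    Hypothetical compression rule: a field value that is itself a key of at
    least min_len characters is stored as that key's leaf number, not a string.
    """
    keys = sorted({k for entries in dictionaries.values() for k in entries})
    trie = DoubleArrayTrie(keys)
    records = {}
    for name, entries in dictionaries.items():
        for key, value in entries.items():
            ref = trie.leaf_number(value) if len(value) >= min_len else None
            field = ref if ref is not None else value
            records.setdefault(trie.leaf_number(key), {})[name] = field
    return trie, records


dicts = {  # three toy "dictionaries" sharing the same headwords
    "pos": {"look": "verb", "look up": "verb phrase", "up": "adverb"},
    "ja": {"look": "見る", "look up": "調べる", "up": "上へ"},
    "related": {"look": "look up"},
}
trie, records = merge_dictionaries(dicts)
print(len(records), "records for", sum(map(len, dicts.values())), "entries")
print(records[trie.leaf_number("look")])  # 'related' holds an int leaf number
```

In this toy run, seven entries spread over three source dictionaries collapse into three records, each carrying more fields than any single source entry. This mirrors the trade-off the abstract describes: larger records, but far fewer of them. The first-fit base search keeps the sketch short, but it scans slot by slot and would be slow for large key sets; practical double-array builders use faster free-slot searches.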