Coding schemes variation and its impact on string hashing

  • Authors:
  • Suleiman H. Mustafa

  • Affiliations:
  • Department of Computer Science, Yarmouk University, Irbid, Jordan

  • Venue:
  • Computer Standards & Interfaces
  • Year:
  • 2002

Abstract

This paper presents the results of an investigation into how variations in character coding schemes affect the performance of string hashing. The investigation involved three types of Arabic strings (single words, personal names, and document titles) and four different Arabic coding schemes. The results were examined in three respects: collision rates, arithmetic code redundancy, and the contribution of arithmetic redundancy to the collision rate. Two items are considered arithmetically redundant if they have the same numerical coding value. Although the mathematical properties of the coding schemes showed some impact on the hashing results, coding scheme variation was reflected mainly in the results of hashing single dictionary words. Where the rates of arithmetic redundancy differed, the collision rates also grew according to different patterns. The results suggest that the arithmetic properties of a coding scheme's collating sequence are likely to have some impact on the performance of string hashing.
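The three measures named in the abstract can be made concrete with a small sketch. It assumes, purely for illustration, that a string's numerical coding value is the sum of its byte codes under a single-byte coding scheme, and that strings are hashed as `value mod table_size`; the paper's actual hash functions and definitions may differ. The function names `coding_value`, `redundancy_groups` and `hash_stats` are hypothetical. Python's standard `cp1256` and `iso8859_6` codecs stand in for two Arabic coding schemes.

```python
from collections import defaultdict

def coding_value(word: str, encoding: str) -> int:
    # ASSUMPTION: the numerical coding value is the sum of the word's byte codes
    # under a single-byte coding scheme; this is not taken from the paper.
    return sum(word.encode(encoding))

def redundancy_groups(words, encoding):
    """Group distinct words by coding value; keep values shared by 2+ words.

    Words in the same group are arithmetically redundant.
    """
    groups = defaultdict(list)
    for w in dict.fromkeys(words):  # distinct words, input order preserved
        groups[coding_value(w, encoding)].append(w)
    return {v: ws for v, ws in groups.items() if len(ws) > 1}

def hash_stats(words, encoding, table_size):
    """Hash distinct words with h(w) = coding_value(w) mod table_size.

    Returns (collision_rate, redundant_share): the fraction of words that land
    in an already-occupied slot, and the fraction of those collisions whose
    coding value had already been seen (collisions caused by redundancy).
    """
    distinct = list(dict.fromkeys(words))
    occupied, seen_values = set(), set()
    collisions = redundant_collisions = 0
    for w in distinct:
        value = coding_value(w, encoding)
        slot = value % table_size
        if slot in occupied:
            collisions += 1
            if value in seen_values:
                redundant_collisions += 1
        occupied.add(slot)
        seen_values.add(value)
    rate = collisions / len(distinct) if distinct else 0.0
    share = redundant_collisions / collisions if collisions else 0.0
    return rate, share

if __name__ == "__main__":
    words = ["علم", "عمل", "كتب"]  # the first two are anagrams
    for enc in ("cp1256", "iso8859_6"):
        print(enc, redundancy_groups(words, enc), hash_stats(words, enc, 101))
```

Under this sum-based assumption, anagrams such as "علم" and "عمل" are always arithmetically redundant, whatever the coding scheme, and therefore always collide. A different coding scheme changes the individual values and so can change which *non*-anagram strings collide after the modulo step, which is one way scheme variation could show up in collision rates.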