An Adaptive Algorithm for Splitting Large Sets of Strings and Its Application to Efficient External Sorting

  • Authors:
  • Tatsuya Asai; Seishi Okamoto; Hiroki Arimura

  • Affiliations:
  • Fujitsu Laboratories Ltd., Kawasaki, Japan 211-8588; Fujitsu Laboratories Ltd., Kawasaki, Japan 211-8588; Hokkaido University, Sapporo, Japan 060-0814

  • Venue:
  • New Frontiers in Applied Data Mining
  • Year:
  • 2009

Abstract

In this paper, we study the problem of sorting a large collection of strings in external memory. Based on the adaptive construction of a summary data structure, called the adaptive synopsis trie, we present a practical string sorting algorithm, DistStrSort, which is suitable for sorting large string collections in external memory, as well as for more complex string processing problems in text and semi-structured databases, such as counting, aggregation, and statistics. Case analyses of the algorithm and experiments on real datasets show the efficiency of our algorithm in realistic settings.
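The abstract does not describe how the adaptive synopsis trie is built. To illustrate the general idea of splitting a string set into order-preserving buckets that each fit in memory, here is a minimal Python sketch. It uses plain recursive prefix partitioning (MSD-radix style) as a stand-in for the authors' synopsis trie, so it is not DistStrSort itself. The names `partition`, `dist_sort`, and `capacity` are hypothetical, and in-memory lists stand in for the external-memory bucket files a real implementation would write.

```python
from collections import defaultdict


def partition(strings, capacity, depth=0):
    """Split strings (all sharing a common prefix of length `depth`) into
    buckets ordered so every string in an earlier bucket sorts before every
    string in a later one. Hypothetical stand-in for a synopsis trie: a
    bucket exceeds `capacity` only if all of its strings are identical."""
    if len(strings) <= capacity:
        return [strings]
    # Strings that end exactly at this depth all equal the shared prefix,
    # so they sort before any longer string that extends it.
    ended = [s for s in strings if len(s) == depth]
    # Group the remaining strings by their next character (one trie level).
    groups = defaultdict(list)
    for s in strings:
        if len(s) > depth:
            groups[s[depth]].append(s)
    buckets = [ended] if ended else []
    for ch in sorted(groups):  # children in code-point order, like str comparison
        buckets.extend(partition(groups[ch], capacity, depth + 1))
    return buckets


def dist_sort(strings, capacity):
    """Sort each bucket "in memory" and concatenate the results in bucket order."""
    result = []
    for bucket in partition(strings, capacity):
        result.extend(sorted(bucket))
    return result


if __name__ == "__main__":
    data = ["banana", "apple", "band", "ban", "apricot", "b", "cherry"]
    print(partition(data, 2))
    print(dist_sort(data, 2))
```

Because the buckets come out in key order, each one can be sorted independently and the results simply concatenated, with no final merge phase. A practical external sorter would instead choose the split points from a compact summary built in one pass, which is the role the abstract assigns to the adaptive synopsis trie.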