SOHAC: efficient storage of tick data that supports search and analysis

  • Authors:
  • Gabor I. Nagy; Krisztian Buza

  • Affiliations:
  • Budapest University of Technology and Economics, Budapest, Hungary; Budapest University of Technology and Economics, Budapest, Hungary

  • Venue:
  • ICDM'12 Proceedings of the 12th Industrial conference on Advances in Data Mining: applications and theoretical aspects
  • Year:
  • 2012

Abstract

Storage of tick data is a challenging problem because two criteria have to be fulfilled simultaneously: the storage structure should allow fast execution of queries, and the data should not occupy too much space on the hard disk or in main memory. In this paper, we present a clustering-based solution and introduce a new clustering algorithm designed to support the storage of tick data. We evaluate our algorithm both on publicly available real-world datasets and on real-world tick data from the financial domain provided by one of the world's most renowned investment banks. In our experiments, we compare our approach, SOHAC, against a large collection of conventional hierarchical clustering algorithms from the literature. The experiments show that our algorithm substantially outperforms the examined clustering algorithms on the tick data storage problem, both in terms of statistical significance and practical relevance.