ROCK: a robust clustering algorithm for categorical attributes
Information Systems
An Efficient k-Means Clustering Algorithm: Analysis and Implementation
IEEE Transactions on Pattern Analysis and Machine Intelligence
Data Compression: The Complete Reference
Data Compression: The Complete Reference
Spectral Clustering in Social-Tagging Systems
WISE '09 Proceedings of the 10th International Conference on Web Information Systems Engineering
Data Mining: Practical Machine Learning Tools and Techniques
Data Mining: Practical Machine Learning Tools and Techniques
Survey of clustering algorithms
IEEE Transactions on Neural Networks
SOHAC: efficient storage of tick data that supports search and analysis
ICDM'12 Proceedings of the 12th Industrial conference on Advances in Data Mining: applications and theoretical aspects
Hi-index | 12.05 |
Tick data are used in several applications that need to keep track of values changing over time, like prices on the stock market or meteorological measurements. Due to the possibly very frequent changes, the size of tick data tends to increase rapidly. Therefore, it becomes of paramount importance to reduce the storage space of tick data while, at the same time, allowing queries to be executed efficiently. In this paper, we propose an approach to decompose the original tick data matrix by clustering their attributes using a new clustering algorithm called Storage-Optimizing Hierarchical Agglomerative Clustering (SOHAC). We additionally propose a method for speeding up SOHAC based on a new lower bounding technique that allows SOHAC to be applied to high-dimensional tick data. Our experimental evaluation shows that the proposed approach compares favorably to several baselines in terms of compression. Additionally, it can lead to significant speedup in terms of running time.