Storage-optimizing clustering algorithms for high-dimensional tick data
Expert Systems with Applications: An International Journal
Storage of tick data is a challenging problem because two criteria have to be fulfilled simultaneously: the storage structure should allow fast execution of queries, and the data should not occupy too much space on the hard disk or in main memory. In this paper, we present a clustering-based solution and introduce a new clustering algorithm designed to support the storage of tick data. We evaluate our algorithm both on publicly available real-world datasets and on real-world tick data from the financial domain, provided by one of the world's most renowned investment banks. In our experiments, we compare our approach, SOHAC, against a large collection of conventional hierarchical clustering algorithms from the literature. The experiments show that, for the tick data storage problem, our algorithm substantially outperforms the examined clustering algorithms, in terms of both statistical significance and practical relevance.