Taming massive distributed datasets: data sampling using bitmap indices
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
SDQuery DSI: integrating data management support with a wide area data transfer protocol
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
With the increasing emphasis on analysis of large-scale scientific data, and with growing dataset sizes, a number of new challenges are arising. In particular, novel data management solutions are needed that can work together with existing tools. This paper examines indexing support for high-level queries (primarily subsetting queries) on array-based scientific datasets. This work is motivated by the limitations that arise when visualizing climate datasets (stored in NetCDF) using tools like ParaView. We have developed a new indexing strategy that supports a variety of subsetting queries over these datasets, including those that subset over dimensions/coordinates and those involving variable values. Our approach is based on bitmaps, but uses two-level indices and careful partitioning based on query profiles. We also show how our indexing support can be used for subsetting operations executed in parallel. We compare our approach against a number of other solutions and demonstrate that our method is more effective.
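To make the idea of a two-level bitmap index concrete, the following is a minimal sketch in Python and not the implementation described in the paper. It assumes a flattened one-dimensional array, equal-width value bins, and a high level formed by OR-ing a fixed number of adjacent low-level bitmaps. All function names, the bin edges, and the sample temperatures are hypothetical, and the partitioning based on query profiles is not modeled.

```python
from bisect import bisect_right


def build_bitmaps(values, edges):
    """One bitmap (a Python int) per bin; bit i is set when values[i] is in that bin.

    Bin b covers the half-open interval [edges[b], edges[b + 1]).
    Values outside [edges[0], edges[-1]) are not indexed.
    """
    bitmaps = [0] * (len(edges) - 1)
    for i, v in enumerate(values):
        b = bisect_right(edges, v) - 1
        if 0 <= b < len(bitmaps):
            bitmaps[b] |= 1 << i
    return bitmaps


def build_two_level(values, fine_edges, group):
    """Low level: fine bins. High level: OR of each run of `group` fine bitmaps."""
    low = build_bitmaps(values, fine_edges)
    high = []
    for start in range(0, len(low), group):
        merged = 0
        for bm in low[start:start + group]:
            merged |= bm
        high.append(merged)
    return low, high


def query_bins(low, high, group, first_bin, last_bin):
    """Bitmap of elements whose fine bin index lies in [first_bin, last_bin].

    A coarse bitmap is used whenever a whole group of fine bins is covered;
    fine bitmaps are used only at the edges of the requested range.
    """
    result = 0
    b = first_bin
    while b <= last_bin:
        if b % group == 0 and b + group - 1 <= last_bin:
            result |= high[b // group]
            b += group
        else:
            result |= low[b]
            b += 1
    return result


def range_mask(start, stop):
    """Bitmap selecting positions start..stop-1 (a dimension subset)."""
    return ((1 << (stop - start)) - 1) << start


def positions(bitmap):
    """Indices of the set bits, in increasing order."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical data: eight temperature samples, four fine bins, two coarse bins.
temps = [12.0, 25.5, 31.0, 8.5, 19.0, 27.5, 33.0, 15.5]
fine_edges = [0, 10, 20, 30, 40]
low, high = build_two_level(temps, fine_edges, group=2)

# Value subset: temperatures in [10, 40), i.e. fine bins 1 through 3.
value_hits = query_bins(low, high, 2, 1, 3)

# Combined with a dimension subset over positions 2..5 via a bitwise AND.
combined = value_hits & range_mask(2, 6)
print(positions(combined))
```

In this sketch a value-based subset and a dimension-based subset both reduce to bitmaps, so combining them is a single bitwise AND. Because each chunk of the array could carry its own bitmaps, the same query could in principle be evaluated on chunks independently, which is one plausible way to run subsetting in parallel.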