Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for interactive exploration of large datasets, but designing a system that can process general scientific datasets remains difficult. Such a system must run on distributed multi-core platforms, use the underlying I/O infrastructure efficiently, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery builds on a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to a massive 50 TB dataset generated by a large-scale accelerator modeling code and demonstrate that the tool scales to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.