E-SCIENCE '12 Proceedings of the 2012 IEEE 8th International Conference on E-Science (e-Science)
In many scientific fields where datasets must be transferred or shared, rapid growth in dataset size, coupled with much slower growth in wide-area data transfer bandwidth, is making it extremely hard for scientists to analyze their data. This paper addresses this limitation by developing SDQuery DSI, a GridFTP plug-in that supports flexible server-side data subsetting. An existing GridFTP server can load the plug-in dynamically to gain this functionality. The tool supports several query types: queries over dimensions, coordinates, and values. A number of optimizations, such as parallel indexing, a performance model for data subsetting, and parallel streaming, are also applied. We compare SDQuery DSI with the default GridFTP File DSI in different network environments and show that our method achieves better efficiency in almost all cases.
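As an illustration only, the three query types named above (over dimensions, coordinates, and values) can be sketched on a toy in-memory grid. This is not the SDQuery DSI implementation or its interface: the function names, the nested-list layout, and the sample latitude/longitude data are all hypothetical and chosen for readability. The sketch shows what each query type selects, and why subsetting on the server side reduces the amount of data that has to cross the network.

```python
from typing import List, Tuple

Grid = List[List[float]]  # hypothetical 2-D variable, rows x columns


def subset_by_dimension(grid: Grid, rows: Tuple[int, int],
                        cols: Tuple[int, int]) -> Grid:
    """Query over dimensions: select half-open index ranges [start, stop)."""
    r0, r1 = rows
    c0, c1 = cols
    return [row[c0:c1] for row in grid[r0:r1]]


def subset_by_coordinate(grid: Grid, lats: List[float], lons: List[float],
                         lat_range: Tuple[float, float],
                         lon_range: Tuple[float, float]) -> Grid:
    """Query over coordinates: keep cells whose coordinates fall in closed ranges."""
    row_idx = [i for i, lat in enumerate(lats)
               if lat_range[0] <= lat <= lat_range[1]]
    col_idx = [j for j, lon in enumerate(lons)
               if lon_range[0] <= lon <= lon_range[1]]
    return [[grid[i][j] for j in col_idx] for i in row_idx]


def subset_by_value(grid: Grid, lo: float,
                    hi: float) -> List[Tuple[int, int, float]]:
    """Query over values: return (row, col, value) for cells with lo <= value <= hi."""
    return [(i, j, v)
            for i, row in enumerate(grid)
            for j, v in enumerate(row)
            if lo <= v <= hi]


# Toy data (assumed for illustration): 3 latitudes x 4 longitudes.
lats = [10.0, 20.0, 30.0]
lons = [100.0, 110.0, 120.0, 130.0]
temps = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]

by_dim = subset_by_dimension(temps, rows=(0, 2), cols=(1, 3))
by_coord = subset_by_coordinate(temps, lats, lons, (15.0, 30.0), (100.0, 115.0))
by_value = subset_by_value(temps, 6.5, 9.0)
```

In each case only the selected cells would need to be sent to the client, rather than the full 12-cell grid. A real server would operate on files in a scientific data format and would use indexes instead of scanning every cell.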