We present a novel external sorting algorithm that uses graphics processors (GPUs) to sort large databases composed of billions of records and wide keys. Our algorithm combines the data parallelism within a GPU with task parallelism by scheduling some of the memory-intensive and compute-intensive threads on the GPU. Our new sorting architecture provides multiple memory interfaces on the same PC: a fast, dedicated memory interface on the GPU along with the main memory interface for CPU computations. As a result, we achieve higher memory bandwidth than CPU-based algorithms running on commodity PCs. Our approach takes into account the limited communication bandwidth between the CPU and the GPU, and reduces the data communication between the two processors. Our algorithm also improves the performance of disk transfers and achieves close to peak I/O performance. We have tested the performance of our algorithm on the SortBenchmark and applied it to large databases composed of a few hundred gigabytes of data. Our results on a 3 GHz Pentium IV PC with a $300 NVIDIA 7800 GT GPU indicate a significant performance improvement over optimized CPU-based algorithms on high-end PCs with 3.6 GHz Dual Xeon processors. Our implementation outperforms the current high-end PennySort benchmark and achieves a higher performance-to-price ratio. Overall, our results indicate that using a GPU as a co-processor can significantly improve the performance of sorting algorithms on large databases.
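To make the term "external sorting" concrete, the following is a minimal sketch of a generic two-phase external sort: sorted runs are generated in bounded memory, written to disk, and then merged. It is not the algorithm from the abstract. The names `external_sort`, `sort_run_in_memory`, and `run_size` are hypothetical, the in-memory step uses Python's built-in sort as a stand-in for a co-processor sort, and records are assumed to be text keys that contain no newline characters.

```python
import heapq
import os
import tempfile


def sort_run_in_memory(records):
    # Hypothetical stand-in for the co-processor sort step.
    # Here it is simply Python's built-in sort on the CPU.
    return sorted(records)


def external_sort(records, run_size):
    """Sort text keys using sorted runs of at most run_size records on disk."""
    run_paths = []

    def flush(run):
        # Sort one run in memory and write it to its own temporary file.
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            for rec in sort_run_in_memory(run):
                f.write(rec + "\n")
        run_paths.append(path)

    # Phase 1: run generation. Only run_size records are buffered at a time.
    run = []
    for rec in records:
        run.append(rec)
        if len(run) == run_size:
            flush(run)
            run = []
    if run:
        flush(run)

    # Phase 2: k-way merge of the sorted runs, reading each file lazily.
    files = [open(p) for p in run_paths]
    try:
        streams = [(line.rstrip("\n") for line in f) for f in files]
        # The result is materialized as a list only to keep the demo short;
        # a real external sort would stream the merge to an output file.
        merged = list(heapq.merge(*streams))
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
    return merged
```

For example, `external_sort(["pear", "apple", "fig", "kiwi"], run_size=2)` writes two sorted runs and merges them. In this sketch `run_size` plays the role of the available fast memory, so a larger value means fewer runs and a narrower merge.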