Sorting database tables before compressing them improves the compression rate. Can we do better than the lexicographical order? For minimizing the number of runs in a run-length encoding compression scheme, the best approaches to row ordering are derived from traveling salesman heuristics, although there is a significant trade-off between running time and compression. A new heuristic, Multiple Lists, is a variant of Nearest Neighbor that trades some compression for a major running-time speedup; it is a good option for very large tables. However, for some compression schemes, it is more important to generate long runs than few runs. For this case, another novel heuristic, Vortex, is promising. We find that we can improve run-length encoding by up to a factor of 3 and prefix coding by up to 80%; these gains come on top of those obtained by lexicographically sorting the table. In a few cases, we prove that the new row reordering is within 10% of optimal at minimizing the runs of identical values within columns.
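To make the run-count criterion concrete, here is a minimal sketch, assuming a table is a list of equal-length tuples and that the cost of run-length encoding is approximated by the total number of runs summed over all columns. It compares the lexicographic order with a plain greedy Nearest Neighbor ordering under Hamming distance (the number of columns in which two rows differ). The function names `count_runs`, `hamming` and `nearest_neighbor_order` are hypothetical. This is not the Multiple Lists or Vortex heuristic: plain Nearest Neighbor as written here needs a quadratic number of distance computations, which is the cost Multiple Lists is meant to reduce.

```python
from itertools import product
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]


def count_runs(rows: Sequence[Row]) -> int:
    """Total number of runs of identical values, summed over all columns."""
    if not rows:
        return 0
    runs = len(rows[0])  # every column starts with one run
    for prev, cur in zip(rows, rows[1:]):
        # a new run starts in each column whose value changes
        runs += sum(1 for a, b in zip(prev, cur) if a != b)
    return runs


def hamming(a: Row, b: Row) -> int:
    """Number of columns in which two rows differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def nearest_neighbor_order(rows: Sequence[Row]) -> List[Row]:
    """Greedy Nearest Neighbor tour (assumed variant): start at the
    lexicographically smallest row, then repeatedly append the closest
    unvisited row, breaking distance ties lexicographically.
    Uses O(n^2) distance computations."""
    remaining = sorted(rows)
    order = [remaining.pop(0)]
    while remaining:
        last = order[-1]
        best = min(
            range(len(remaining)),
            key=lambda i: (hamming(last, remaining[i]), remaining[i]),
        )
        order.append(remaining.pop(best))
    return order


# Toy table: all eight rows of a 3-column binary table.
rows = list(product((0, 1), repeat=3))
lex = sorted(rows)
nn = nearest_neighbor_order(rows)
print(count_runs(lex), count_runs(nn))  # 14 10
```

On this toy table, lexicographic order yields 14 runs, while the greedy ordering yields 10 because every pair of consecutive rows it picks differs in exactly one column.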