Large boolean matrices are a basic representational unit in a variety of applications; notable examples include interactive visualization systems, mining large graph structures, and association rule mining. Designing space- and time-efficient, scalable storage and query mechanisms for such large matrices is a challenging problem. We present a lossless compression strategy to store and access such large matrices efficiently on disk. Our approach views the columns of the matrix as points in a very high-dimensional Hamming space and then formulates an appropriate optimization problem that reduces to solving an instance of the Traveling Salesman Problem (TSP) on this space. Finding good solutions to large TSPs in high-dimensional Hamming spaces is itself a challenging and little-explored problem: we cannot readily exploit geometry to avoid examining all N^2 inter-city distances, and instances can be too large for standard TSP codes to run in main memory. Our multi-faceted approach adapts classical TSP heuristics by means of instance partitioning and sampling, and may be of independent interest. For instances derived from interactive visualization and telephone call data, we obtain significant improvement in access time over standard techniques, and for the visualization application we also make significant improvements in compression.
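To make the column-reordering idea concrete, the following is a minimal Python sketch, not the method of the paper. It assumes each column is stored as an integer bitmask over the rows, that the quantity to reduce is the total Hamming distance between adjacent columns, and that a plain nearest-neighbor heuristic stands in for the partitioned and sampled TSP heuristics mentioned above. The function names (`hamming`, `path_cost`, `nearest_neighbor_order`) and the toy data are hypothetical.

```python
from typing import List


def hamming(a: int, b: int) -> int:
    """Hamming distance between two columns stored as integer bitmasks."""
    return bin(a ^ b).count("1")


def path_cost(cols: List[int], order: List[int]) -> int:
    """Sum of Hamming distances between consecutive columns in `order`."""
    return sum(
        hamming(cols[order[i]], cols[order[i + 1]])
        for i in range(len(order) - 1)
    )


def nearest_neighbor_order(cols: List[int], start: int = 0) -> List[int]:
    """Greedy nearest-neighbor path: repeatedly append the closest unvisited
    column. Ties are broken by the smaller column index. This examines
    O(N^2) distances, so it is only practical for small instances."""
    unvisited = set(range(len(cols)))
    unvisited.remove(start)
    order = [start]
    while unvisited:
        last = cols[order[-1]]
        nxt = min(unvisited, key=lambda j: (hamming(last, cols[j]), j))
        unvisited.remove(nxt)
        order.append(nxt)
    return order


# Toy instance (assumed data): five columns over four rows.
cols = [0b1100, 0b0011, 0b1101, 0b0111, 0b1110]
original = list(range(len(cols)))
reordered = nearest_neighbor_order(cols)

print(path_cost(cols, original))   # cost of the given column order
print(reordered)                   # permutation found by the heuristic
print(path_cost(cols, reordered))  # cost after reordering
```

On this toy instance the reordering places similar columns next to each other, so each row tends to form longer runs of equal bits, which run-length-style encoders can exploit. Because the heuristic is quadratic in the number of columns, a large instance would need the kind of partitioning or sampling the abstract alludes to.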