A number of authors have used the Calgary corpus of texts to provide empirical results for lossless compression algorithms. This corpus was collected in 1987, although it was not published until 1990. Advances in compression algorithms have been achieving relatively small improvements in compression, as measured on the Calgary corpus. There is a concern that algorithms are being fine-tuned to this corpus, and that small improvements measured in this way may not apply to other files. Furthermore, the corpus is almost ten years old, and over this period the kinds of files that are compressed have changed, particularly with the development of the Internet and the rapid growth of high-capacity secondary storage for personal computers. We explore these issues and develop a principled technique for collecting a corpus of test data for compression methods. A corpus, called the Canterbury corpus, is developed using this technique, and we report the performance of a collection of compression methods on the new corpus.