Linear-Time, Incremental Hierarchy Inference for Compression

  • Authors:
  • Craig G. Nevill-Manning; Ian H. Witten

  • Venue:
  • DCC '97: Proceedings of the Conference on Data Compression
  • Year:
  • 1997

Abstract

Data compression and learning are, in some sense, two sides of the same coin. If we paraphrase Occam's razor by saying that a small theory is better than a larger theory with the same explanatory power, we can characterize data compression as a preoccupation with small, and learning as a preoccupation with better. Nevill-Manning et al. (see Proc. Data Compression Conference, Los Alamitos, CA, pp. 244-253, 1994) presented an algorithm, since dubbed SEQUITUR, that exhibits both faces of the compression/learning coin. Its performance as a data compression scheme outstrips other dictionary schemes, and the structures that it learns from sequences as diverse as DNA and music are intuitively compelling. We present three new results that characterize SEQUITUR's computational and compression performance. First, we prove that SEQUITUR operates in time linear in n, the length of the input sequence, despite its ability to build a hierarchy as deep as log n. Second, we show that a sequence can be compressed incrementally, improving on the non-incremental algorithm described by Nevill-Manning et al. and making on-line compression feasible. Third, we present an intriguing result that emerged during benchmarking: whereas PPMC outperforms SEQUITUR on most files in the Calgary corpus, SEQUITUR regains the lead when tested on multimegabyte sequences. We draw some tentative conclusions about the underlying reasons for this phenomenon, and about the nature of current compression benchmarking.
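
To make the abstract's description concrete, the sketch below is a minimal Python illustration of the two grammar invariants SEQUITUR maintains while reading its input one symbol at a time: digram uniqueness (no pair of adjacent symbols occurs twice in the grammar) and rule utility (every rule is referenced more than once). The function names (`sequitur`, `find_repeated_digram`) and data layout are illustrative assumptions, not the authors' implementation, and this version re-scans the whole grammar after each symbol, so it does not achieve the paper's linear-time bound; that result depends on constant-time digram bookkeeping (a digram index over linked symbol lists) omitted here.

```python
def find_repeated_digram(rules):
    """Return two non-overlapping occurrences of a repeated digram as
    ((rule_id, index), (rule_id, index)), or None if the digram
    uniqueness invariant already holds."""
    seen = {}
    for rid, body in rules.items():
        for i in range(len(body) - 1):
            digram = (body[i], body[i + 1])
            if digram in seen:
                prev = seen[digram]
                # Ignore overlapping occurrences, e.g. 'aa' twice in 'aaa'.
                if prev[0] != rid or i > prev[1] + 1:
                    return prev, (rid, i)
            else:
                seen[digram] = (rid, i)
    return None


def sequitur(sequence):
    """Build a SEQUITUR-style grammar incrementally; rule 0 is the start
    rule S. Terminals are input symbols (assumed here to be characters),
    nonterminals are integer rule ids."""
    rules = {0: []}
    next_id = 1
    for symbol in sequence:
        rules[0].append(symbol)
        while True:
            hit = find_repeated_digram(rules)
            if hit is None:
                break
            (r1, i1), (r2, i2) = hit
            digram = list(rules[r1][i1:i1 + 2])
            # Reuse an existing rule whose entire body is this digram;
            # otherwise create a new rule for it.
            target = next((rid for rid, body in rules.items()
                           if rid != 0 and body == digram), None)
            if target is None:
                target = next_id
                next_id += 1
                rules[target] = digram[:]
            # Replace the later occurrence first so the earlier index
            # stays valid; never rewrite the target rule's own body.
            for rid, i in sorted([(r1, i1), (r2, i2)], reverse=True):
                if rid == target and rules[rid] == digram:
                    continue
                rules[rid][i:i + 2] = [target]
            # Rule utility: inline any rule now referenced only once.
            for rid in [r for r in rules if r != 0]:
                uses = [(r, i) for r, body in rules.items()
                        for i, s in enumerate(body) if s == rid]
                if len(uses) == 1:
                    r, i = uses[0]
                    rules[r][i:i + 1] = rules[rid]
                    del rules[rid]
    return rules


# A repeated toy input factors into a two-level hierarchy:
# {0: [2, 2, 2], 2: ['a', 'b', 'c']}, i.e. S -> AAA, A -> abc.
print(sequitur("abcabcabc"))
```

Note how each appended symbol can trigger a cascade of digram substitutions and rule inlinings; bounding the amortized cost of those cascades is where the paper's linear-time analysis does its work, and processing one symbol at a time is what makes the incremental, on-line compression described above possible.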