The design and analysis of efficient lossless data compression systems
The design and analysis of efficient lossless data compression systems
Arithmetic coding for data compression
Communications of the ACM
DCC '02 Proceedings of the Data Compression Conference
Security log management
Hi-index | 0.00 |
Maillogs contain important information about mail which has been sent or received. This information can be used for statistical purposes, to help prevent viruses or to help prevent SPAM. In order to satisfy regulations and follow good security practices, maillogs need to be monitored and archived. Since there is a large quantity of data, some form of data reduction is necessary. Data compression programs such as gzip and bzip2 are commonly used to reduce the quantity of data. Text preprocessing can be used to aid the compression of English text files. This paper evaluates whether text preprocessing, particularly word replacement, can be used to improve the compression of maillogs. It presents an algorithm for constructing a dictionary for word replacement and provides the results of experiments conducted using the ppmd, gzip, bzip2 and 7zip programs. These tests show that text preprocessing improves data compression on maillogs. Improvements of up to 56 percent in compression time and up to 32 percent in compression ratio are achieved. It also shows that a dictionary may be generated and used on other maillogs to yield reductions within half a percent of the results achieved for the maillog used to generate the dictionary.