We describe a compression technique for semistructured documents, called SCMPPM, which combines Prediction by Partial Matching (PPM) with the Structural Contexts Model (SCM). SCMPPM takes advantage of the context information that is usually implicit in the structure of the text. The idea is to use a separate PPM model to compress the text that lies inside each different structure type (e.g., each different XML tag). The intuition is that texts belonging to a given structure type should follow similar distributions, which differ from those of other structure types; this should allow PPM to make better predictions. We compare our method against plain PPM modelling and against other structure-aware techniques. The results show that the new method achieves significant improvements in compression ratio.
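As a rough illustration of the idea (not the authors' implementation), the Python sketch below routes the text of each XML element to a PPM model selected by its tag and compares the estimated code length with that of a single shared model. It rests on several assumptions: a simplified order-2 PPM over bytes with escape method C and no symbol exclusion, ideal code lengths (-log2 p) instead of a real arithmetic coder, text attributed only to its innermost element (tail text is ignored), and no cost charged for the markup itself. The names `PPMModel`, `segments_from_xml`, `scm_ppm_cost` and `plain_ppm_cost` are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import defaultdict


class PPMModel:
    """Simplified adaptive PPM over bytes: escape method C, no exclusions (sketch)."""

    def __init__(self, order=2):
        self.order = order
        # counts[context][byte] -> frequency, for every context length 0..order
        self.counts = defaultdict(lambda: defaultdict(int))
        self.history = b""  # last `order` bytes seen by this model

    def prob(self, sym):
        """P(sym | history), escaping from the longest context down to order -1."""
        p_escape = 1.0
        for k in range(len(self.history), -1, -1):
            table = self.counts.get(self.history[len(self.history) - k:])
            if not table:
                continue  # unseen context: nothing to predict, no escape cost
            total, distinct = sum(table.values()), len(table)
            if sym in table:
                return p_escape * table[sym] / (total + distinct)
            p_escape *= distinct / (total + distinct)  # method C escape
        return p_escape / 256  # order -1: uniform over all byte values

    def update(self, sym):
        """Count sym in every context suffix, then slide the history window."""
        for k in range(len(self.history) + 1):
            self.counts[self.history[len(self.history) - k:]][sym] += 1
        if self.order:
            self.history = (self.history + bytes([sym]))[-self.order:]

    def encode(self, text):
        """Adaptively code text; return its ideal code length in bits."""
        bits = 0.0
        for sym in text.encode("utf-8"):
            bits -= math.log2(self.prob(sym))
            self.update(sym)
        return bits


def segments_from_xml(xml_string):
    """(tag, text) pairs in document order; tail text is ignored (assumption)."""
    root = ET.fromstring(xml_string)
    return [(el.tag, el.text) for el in root.iter() if el.text and el.text.strip()]


def scm_ppm_cost(segments, order=2):
    """SCMPPM-style routing: one PPM model per structure type (tag)."""
    models = {}
    bits = 0.0
    for tag, text in segments:
        if tag not in models:
            models[tag] = PPMModel(order)
        bits += models[tag].encode(text)
    return bits, models


def plain_ppm_cost(segments, order=2):
    """Baseline: a single PPM model shared by all text."""
    model = PPMModel(order)
    return sum(model.encode(text) for _, text in segments)


if __name__ == "__main__":
    doc = ("<library>"
           "<book><title>Data Compression</title><year>1990</year></book>"
           "<book><title>Text Compression</title><year>1990</year></book>"
           "<book><title>Managing Gigabytes</title><year>1999</year></book>"
           "</library>")
    segs = segments_from_xml(doc)
    separate, models = scm_ppm_cost(segs)
    print("models:", sorted(models))
    print("per-tag PPM bits: %.1f, single PPM bits: %.1f"
          % (separate, plain_ppm_cost(segs)))
```

On a toy document this small the comparison between the two totals is not meaningful; the sketch only shows where the per-structure routing happens. Each tag's model sees the concatenation of all texts under that tag, so it can learn that tag's distribution without interference from the others.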