Natural Language Compression on Edge-Guided text preprocessing
Information Sciences: an International Journal
We describe a novel compression technique for natural language text collections that exploits the information carried by edges when the text is modeled as a graph. We call this technique edge-guided compression. We propose an algorithm that transforms the text according to the edge-guided technique, combined with the spaceless words transformation. The result of these transformations is a PPM-friendly byte stream, which is then encoded with a compressor from the PPM family. A comparison with state-of-the-art compressors shows that our proposal is a competitive choice for medium and large natural language text collections.
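As an illustration of the spaceless words transformation mentioned above, the following is a minimal sketch of the general idea only, not the authors' implementation. The function names `spaceless_encode` and `spaceless_decode` and the ASCII-alphanumeric definition of a word are assumptions made for this example. A single space between two words is left implicit, and every other separator is kept as an explicit token. The edge-guided step itself is not shown.

```python
import re

# Assumption for this sketch: a "word" is a maximal run of ASCII letters/digits;
# everything else is a "separator". Real systems may tokenize differently.
_TOKEN = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9]+")
_WORD = re.compile(r"[A-Za-z0-9]+")


def _is_word(token):
    return _WORD.fullmatch(token) is not None


def spaceless_encode(text):
    """Split text into tokens, dropping each single space that sits between two words."""
    tokens = _TOKEN.findall(text)
    encoded = []
    for i, tok in enumerate(tokens):
        between_words = (
            0 < i < len(tokens) - 1
            and _is_word(tokens[i - 1])
            and _is_word(tokens[i + 1])
        )
        if tok == " " and between_words:
            continue  # implicit separator: the decoder restores it
        encoded.append(tok)
    return encoded


def spaceless_decode(tokens):
    """Rebuild the text, inserting a single space between two adjacent words."""
    parts = []
    previous_was_word = False
    for tok in tokens:
        current_is_word = _is_word(tok)
        if previous_was_word and current_is_word:
            parts.append(" ")
        parts.append(tok)
        previous_was_word = current_is_word
    return "".join(parts)


sample = "to be,  or not to be"
symbols = spaceless_encode(sample)
restored = spaceless_decode(symbols)
```

Because words and separators strictly alternate in the tokenized text, two adjacent word tokens in the encoded sequence can only come from a dropped single space, so decoding is unambiguous. In this sketch the symbol sequence is a Python list; a real pipeline would map the symbols to a byte stream before passing it to a PPM encoder.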