We describe a compression model for semi-structured documents, called the tri-structural contexts model (TSCM). The intention is that separating start tags, attribute name/value pairs, and textual words may reduce the entropy. Attributes are combined with their values and stored in a separate container. We focus mainly on semi-static models and test the idea using a word-based tagged code, which allows random access to, and partial decompression of, the compressed collection. Compression is faster than with scmhuff, and decompression is much faster than with both scmhuff and xmlppm. The short partial-decompression time supports using the TSC model to keep semi-structured documents compressed at all times. The proposed model and algorithm are useful in information retrieval systems.
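The separation idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `split_streams` and `zero_order_entropy` are hypothetical, the regular-expression parsing only handles simple well-formed fragments, and zero-order entropy over tokens is used here merely as a rough stand-in for the cost of a semi-static word-based model. The sketch splits a fragment into three containers (start tags, combined attribute name/value pairs, and text words) so that the entropy of each container can be inspected separately.

```python
import math
import re
from collections import Counter


def split_streams(xml_text):
    """Split a simple XML fragment into three token containers.

    Returns (start_tags, attributes, words). Assumption of this sketch:
    end tags, comments and processing instructions are simply skipped.
    """
    tags, attrs, words = [], [], []
    # Each match is either markup (<...>) or the text between markup.
    for markup, text in re.findall(r"<([^<>]+)>|([^<>]+)", xml_text):
        if markup:
            if markup[0] in "/?!":
                continue  # end tag, processing instruction, or comment
            parts = markup.rstrip("/").split(None, 1)
            if not parts:
                continue
            tags.append(parts[0])
            if len(parts) > 1:
                # Keep each attribute name together with its value.
                for name, value in re.findall(
                    r'([\w:.-]+)\s*=\s*"([^"]*)"', parts[1]
                ):
                    attrs.append(f'{name}="{value}"')
        else:
            words.extend(text.split())
    return tags, attrs, words


def zero_order_entropy(tokens):
    """Zero-order entropy, in bits per token, of a token sequence."""
    total = len(tokens)
    if total == 0:
        return 0.0
    counts = Counter(tokens)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    sample = (
        '<book id="1"><title>data compression</title>'
        "<title>data models</title></book>"
    )
    for name, stream in zip(("tags", "attributes", "words"),
                            split_streams(sample)):
        print(name, stream, round(zero_order_entropy(stream), 3))
```

On a real collection, one would compare the summed cost of the three containers against the cost of a single mixed token stream; whether separation helps depends on the data.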