Abstract: Extensible Markup Language (XML) is a standardized language that "describes a class of data objects called XML documents and partially describes the behavior of computer programs which process them." According to Bosak and Bray, XML is the "next big thing" after HTML. It is gaining momentum in many areas of the computer industry; for example, Microsoft has announced plans to base future software systems on XML. XML is not a specific markup language like HTML, but instead a meta-language for describing markup languages together with a strong standard for creating and parsing documents.

Whatever XML's advantages, it has one glaring disadvantage: document size. XML's parent standard, Standard Generalized Markup Language (SGML), made many provisions for minimizing document size. These options made SGML complex and difficult to implement, and they were omitted from XML. Indeed, the XML standard explicitly states that markup terseness was not a design goal. Consequently, XML is easy to process but verbose. XML documents can be many times larger than equivalent non-standardized text or binary formats, even if compressed. There is growing concern in the XML community that inefficiency arising from document size will hinder adoption and use of XML.

From an information-theoretic point of view, this situation is vexing because two sources carrying the same messages should have the same entropy and so compress to about the same size using any universal compressor. The problem is that XML documents may have nonlocal redundancy arising from XML's tree structure, which is difficult for text compressors to discover. Conversely, XML-conscious compression techniques might be able to compress XML documents as well as or better than other representations. Fortuitously, XML's simple design makes testing this hypothesis easier than in previous structured-data compression approaches such as syntax-based compression of program files or machine code compression.

Liefke and Suciu describe XMILL, an XML compressor that transforms documents to expose redundancy, then applies standard text compressors. XMILL combined with gzip compresses XML data about 10% better than gzip on equivalent non-XML forms; further improvement (up to 50%) is possible with user assistance in the form of complex command-line parameters. This work shows that XML-conscious compression can do better than text compression alone. However, XMILL's base transformation has several drawbacks: it precludes incremental (online) processing of compressed documents, it actually hinders compressors other than gzip, and it requires user assistance to achieve the best compression.

Figure 1: XML example

In this paper, we describe alternative approaches to XML compression that illustrate other tradeoffs between speed and effectiveness. We describe experiments using several text compressors and XMILL to compress a variety of XML documents. Using these as a benchmark, we describe our two main results: an online binary encoding for XML called Encoded SAX (ESAX) that compresses better and faster than existing methods; and an online, adaptive, XML-conscious encoding based on Prediction by Partial Match (PPM) called Multiplexed Hierarchical Modeling (MHM) that compresses up to 35% better than any existing method but is fairly slow. First, of course, we need to describe XML in more detail.
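The transform-then-compress idea attributed to XMILL above can be illustrated with a minimal sketch. It is not XMILL's actual algorithm or file format: the class `ContainerSplitter`, the function `split_and_compress`, the token scheme, and the `SAMPLE` document are all hypothetical, and attributes are ignored. The sketch only shows the general shape of the technique: separate the element structure from the text content, group text by enclosing element, and hand each stream to a standard compressor (here `zlib`, which uses the same DEFLATE method as gzip).

```python
import xml.sax
import zlib
from collections import defaultdict

# Hypothetical sample document, used only for illustration.
SAMPLE = (b"<catalog>"
          b"<item><name>apple</name><price>1.20</price></item>"
          b"<item><name>pear</name><price>0.95</price></item>"
          b"</catalog>")


class ContainerSplitter(xml.sax.ContentHandler):
    """Route element structure and text content into separate streams."""

    def __init__(self):
        super().__init__()
        self.structure = []                   # open/close/text-placeholder tokens
        self.containers = defaultdict(list)   # text grouped by enclosing tag name
        self.stack = []                       # currently open elements

    def startElement(self, name, attrs):
        # Attributes are ignored in this sketch (an assumption, not a design).
        self.stack.append(name)
        self.structure.append("<" + name)

    def endElement(self, name):
        self.stack.pop()
        self.structure.append(">")

    def characters(self, content):
        # Skip whitespace-only text; the parser may deliver text in chunks.
        if self.stack and content.strip():
            self.containers[self.stack[-1]].append(content)
            self.structure.append("#")        # marks where text occurred


def split_and_compress(xml_bytes):
    """Return (handler, total compressed size of all separated streams)."""
    handler = ContainerSplitter()
    xml.sax.parseString(xml_bytes, handler)
    streams = [" ".join(handler.structure)]
    streams += ["\n".join(texts) for _, texts in sorted(handler.containers.items())]
    total = sum(len(zlib.compress(s.encode("utf-8"), 9)) for s in streams)
    return handler, total


if __name__ == "__main__":
    handler, split_size = split_and_compress(SAMPLE)
    raw_size = len(zlib.compress(SAMPLE, 9))
    print("plain zlib:", raw_size, "bytes; separated streams:", split_size, "bytes")
```

On a document this small, the fixed per-stream overhead of the compressor is likely to dominate, so the comparison only becomes meaningful on larger inputs. Because the whole document is parsed before any stream is compressed, the sketch also shares the drawback noted above: it does not support incremental (online) processing.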