Compressing and searching XML data via two zips

Authors:
P. Ferragina;F. Luccio;G. Manzini;S. Muthukrishnan
Affiliations:
Univ. Pisa;Univ. Pisa;Univ. Piemonte Orientale;Rutgers Univ.
Venue:
Proceedings of the 15th international conference on World Wide Web
Year:
2006

References: 23
Cited by: 20

References:
XMill: an efficient compressor for XML data. SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data.
Fast and flexible word searching on compressed text. ACM Transactions on Information Systems (TOIS).
An experimental study of a compressed index. Information Sciences: an International Journal - Dictionary based compression.
DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases.
ViST: a dynamic index method for querying XML data by tree structures. Proceedings of the 2003 ACM SIGMOD international conference on Management of data.
XPRESS: a queriable compression for XML data. Proceedings of the 2003 ACM SIGMOD international conference on Management of data.
PPM: One Step to Practicality. DCC '02 Proceedings of the Data Compression Conference.
Compressing XML with Multiplexed Hierarchical PPM Models. DCC '01 Proceedings of the Data Compression Conference.
XGRIND: A Query-Friendly XML Compressor. ICDE '02 Proceedings of the 18th International Conference on Data Engineering.
Merging Prediction by Partial Matching with Structural Contexts Model. DCC '04 Proceedings of the Conference on Data Compression.
Lempel-Ziv Compression of Structured Text. DCC '04 Proceedings of the Conference on Data Compression.
PRIX: Indexing And Querying XML Using Prüfer Sequences. ICDE '04 Proceedings of the 20th International Conference on Data Engineering.
Succinct ordinal trees with level-ancestor queries. SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms.
On the integration of structure indexes and inverted lists. SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data.
Will Binary XML Speed Network Traffic? Computer.
On boosting holism in XML twig pattern matching using structural indexing techniques. Proceedings of the 2005 ACM SIGMOD international conference on Management of data.
Indexing compressed text. Journal of the ACM (JACM).
Efficient processing of XML path queries using the disk-based F&B Index. VLDB '05 Proceedings of the 31st international conference on Very large data bases.
XML Document Indexes: A Classification. IEEE Internet Computing.
Structuring labeled trees for optimal succinctness, and beyond. FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science.
Representing Trees of Higher Degree. Algorithmica.
Rank/select operations on large alphabets: a tool for text indexing. SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms.
XQueC: pushing queries to compressed XML data. VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29.

Cited by:
Querying and maintaining a compact XML storage. Proceedings of the 16th international conference on World Wide Web.
Engineering succinct DOM. EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology.
On searching compressed string collections cache-obliviously. Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
Effective asymmetric XML compression. Software: Practice & Experience.
XML Storage and Processing on Mobile Devices. WISE '08 Proceedings of the 9th international conference on Web Information Systems Engineering.
Compressed text indexes: From theory to practice. Journal of Experimental Algorithmics (JEA).
2LP: A double-lazy XML parser. Information Systems.
XML compression techniques: A survey and comparison. Journal of Computer and System Sciences.
Compressing and indexing labeled trees, with applications. Journal of the ACM (JACM).
Efficient indexing of versioned document sequences. ECIR'07 Proceedings of the 29th European conference on IR research.
Combining efficient XML compression with query processing. ADBIS'07 Proceedings of the 11th East European conference on Advances in databases and information systems.
A highly efficient XML compression scheme for the web. SOFSEM'08 Proceedings of the 34th conference on Current trends in theory and practice of computer science.
Data structures: time, I/Os, entropy, joules! ESA'10 Proceedings of the 18th annual European conference on Algorithms: Part II.
Spatio-temporal range searching over compressed kinetic sensor data. ESA'10 Proceedings of the 18th annual European conference on Algorithms: Part I.
Statistical encoding of succinct data structures. CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching.
Searching web data: An entity retrieval and high-performance indexing model. Web Semantics: Science, Services and Agents on the World Wide Web.
A resource efficient hybrid data structure for twig queries. XSym'06 Proceedings of the 4th international conference on Database and XML Technologies.
A compact XML storage scheme supporting efficient path querying. APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications.
Full-text search on multi-byte encoded documents. Proceedings of the 2012 ACM symposium on Document engineering.
Schema Independent XML Compressor. International Journal of Information Retrieval Research.

Abstract

XML is fast becoming the standard format for storing, exchanging and publishing data over the web, and is increasingly embedded in applications. Two challenges in handling XML are its size (the XML representation of a document is significantly larger than its native state) and the complexity of its search (XML search involves path and content searches on labeled tree structures). We address the basic problems of compression, navigation and searching of XML documents. In particular, we adopt recently proposed theoretical algorithms [11] for succinct tree representations to design and implement a compressed index for XML, called XBzipIndex, in which the XML document is maintained in a highly compressed format, and both navigation and searching can be done by uncompressing only a tiny fraction of the data. This solution relies on compressing and indexing two arrays derived from the XML data. With detailed experiments on standard XML data sources, we compare it with other compressed XML indexing and searching engines and show that XBzipIndex achieves a compression ratio up to 35% better than those tools, and that its time performance on some path and content search operations is orders of magnitude faster: a few milliseconds over hundreds of MBs of XML files, versus tens of seconds.
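
The abstract does not spell out how the two arrays are built. As a rough illustration only, the sketch below assumes a path-sorting construction in the spirit of the succinct labeled-tree literature: each node contributes its label and a "last child" flag, and nodes are stably sorted by the sequence of labels on the path from their parent up to the root. The function name `path_sorted_arrays` and the `(label, children)` tuple encoding of the tree are hypothetical, and text content and the actual compression step are left out.

```python
def path_sorted_arrays(root):
    """Return (last_flags, labels) for a labeled tree.

    Assumption: a node is a (label, children) tuple, children in document
    order. This is an illustrative sketch, not the paper's implementation.
    """
    rows = []  # one (upward_path, label, is_last_child) row per node, in pre-order

    def visit(node, upward_path, is_last):
        label, children = node
        rows.append((upward_path, label, is_last))
        # A child's upward path starts with its parent's label.
        child_path = (label,) + upward_path
        for i, child in enumerate(children):
            visit(child, child_path, i == len(children) - 1)

    visit(root, (), True)

    # Python's sort is stable, so nodes with equal upward paths keep
    # their pre-order (document) order.
    rows.sort(key=lambda row: row[0])

    last_flags = [int(row[2]) for row in rows]
    labels = [row[1] for row in rows]
    return last_flags, labels


# Hypothetical example tree: <A><B><D/></B><C/></A>
tree = ("A", [("B", [("D", [])]), ("C", [])])
last_flags, labels = path_sorted_arrays(tree)
```

Under this ordering, all nodes that share the same upward path end up in one contiguous run of the two arrays, which is the property that would let a path query be answered as a range lookup rather than a scan of the whole document.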