Index compression vs. retrieval time of inverted files for XML documents

Authors:
Norbert Fuhr; Norbert Gövert
Affiliations:
University of Dortmund, Germany; University of Dortmund, Germany
Venue:
Proceedings of the eleventh international conference on Information and knowledge management
Year:
2002

Abstract

Query languages for retrieving XML documents allow conditions that refer to both the content and the structure of documents. In this paper, we investigate two approaches for reducing the space required by inverted files for XML documents. First, we consider methods for compressing index entries. Second, we develop a new data structure, the XS tree, which represents the structural description of a document compactly enough that these descriptions can be kept in main memory. Experimental results on two large XML document collections show that very high compression rates can be achieved for the indexes, but that any compression increases retrieval time. Nevertheless, highly compressed indexes may be feasible for applications where storage is limited, such as PDAs or e-book devices.
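The abstract does not say which codes the authors used to compress index entries. As an illustration only, here is a minimal sketch of one common technique for the document-number part of an inverted-file entry: store the gaps between sorted document ids and encode each gap with a variable-byte code. The function names are hypothetical, and real XML index entries would also carry structural information (for example element paths or positions), which this sketch omits.

```python
from typing import List


def vbyte_encode(n: int, out: bytearray) -> None:
    # Emit 7 bits per byte, least significant group first;
    # the high bit (0x80) marks the final byte of each number.
    while n >= 128:
        out.append(n & 0x7F)
        n >>= 7
    out.append(n | 0x80)


def encode_postings(doc_ids: List[int]) -> bytes:
    """Gap-encode a sorted posting list, then variable-byte encode each gap."""
    out = bytearray()
    prev = 0
    for i, doc_id in enumerate(doc_ids):
        gap = doc_id - prev  # gaps are small for frequent terms, so need few bytes
        if gap < 0 or (i > 0 and gap == 0):
            raise ValueError("document ids must be non-negative and strictly increasing")
        vbyte_encode(gap, out)
        prev = doc_id
    return bytes(out)


def decode_postings(data: bytes) -> List[int]:
    """Invert encode_postings: decode each gap and re-accumulate document ids."""
    doc_ids: List[int] = []
    n = 0
    shift = 0
    prev = 0
    for b in data:
        if b & 0x80:
            # Final byte of this gap: finish the number and emit the document id.
            n |= (b & 0x7F) << shift
            prev += n
            doc_ids.append(prev)
            n = 0
            shift = 0
        else:
            n |= b << shift
            shift += 7
    return doc_ids
```

For `[3, 7, 11, 300, 1000]` the gaps are 3, 4, 4, 289 and 700, so the encoding takes 7 bytes instead of 20 for fixed 4-byte integers. The price is the byte-by-byte decoding loop that must run before any entry can be used, which illustrates why compression can add to retrieval time.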