Compressed data structures with relevance

Authors:
Jeffrey Scott Vitter
Affiliations:
The University of Kansas, Lawrence, KS, USA
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Citing 16
Cited 0

Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract)

STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
Efficient algorithms for document retrieval problems

SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
High-order entropy-compressed text indexes

SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Compressed Text Databases with Efficient Query Algorithms Based on the Compressed Suffix Array

ISAAC '00 Proceedings of the 11th International Conference on Algorithms and Computation
Opportunistic data structures with applications

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
New text indexing functionalities of the compressed suffix arrays

Journal of Algorithms
Indexing compressed text

Journal of the ACM (JACM)
Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching

SIAM Journal on Computing
When indexing equals compression: Experiments with compressing suffix arrays and applications

ACM Transactions on Algorithms (TALG)
Compressed full-text indexes

ACM Computing Surveys (CSUR)
Compressed Suffix Trees with Full Functionality

Theory of Computing Systems
Algorithms and data structures for external memory

Foundations and Trends® in Theoretical Computer Science
Space-Efficient Framework for Top-k String Retrieval Problems

FOCS '09 Proceedings of the 2009 50th Annual IEEE Symposium on Foundations of Computer Science
Implicit compression boosting with applications to self-indexing

SPIRE'07 Proceedings of the 14th international conference on String processing and information retrieval
Compression, indexing, and retrieval for massive string data

CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Top-k document retrieval in optimal time and linear space

Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms

Quantified Score

Hi-index	0.00

Visualization

Abstract

We describe recent breakthroughs in the field of compressed data structures, in which the data structure is stored in a compressed representation that still allows fast answers to queries. We focus in particular on compressed data structures to support the important application of pattern matching on massive document collections. Given an arbitrary query pattern in textual form, the job of the data structure is to report all the locations where the pattern appears. Another variant is to report all the documents that contain at least one instance of the pattern. We are particularly interested in reporting only the most relevant documents, using a variety of notions of relevance. We discuss recently developed techniques that support fast search in these contexts as well as under additional positional and temporal constraints.