Document listing for queries with excluded pattern

Authors:
Wing-Kai Hon;Rahul Shah;Sharma V. Thankachan;Jeffrey Scott Vitter
Affiliations:
National Tsing Hua University, Taiwan;Louisiana State University;Louisiana State University;The University of Kansas
Venue:
CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
Year:
2012

Citing 28
Cited 1

Suffix arrays: a new method for on-line string searches

SIAM Journal on Computing
Efficient algorithms for document retrieval problems

SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
High-order entropy-compressed text indexes

SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
The LCA Problem Revisited

LATIN '00 Proceedings of the 4th Latin American Symposium on Theoretical Informatics
Augmenting Suffix Trees, with Applications

ESA '98 Proceedings of the 6th Annual European Symposium on Algorithms
Two-dimensional substring indexing

Journal of Computer and System Sciences - Special issue on PODS 2001
Rank/select operations on large alphabets: a tool for text indexing

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Succinct data structures for flexible text retrieval systems

Journal of Discrete Algorithms
Ultra-succinct representation of ordered trees

SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets

ACM Transactions on Algorithms (TALG)
Geometric Burrows-Wheeler Transform: Linking Range Searching and Text Indexing

DCC '08 Proceedings of the Data Compression Conference
Space-Efficient Algorithms for Document Retrieval

CPM '07 Proceedings of the 18th annual symposium on Combinatorial Pattern Matching
Linear pattern matching algorithms

SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
The myriad virtues of Wavelet Trees

Information and Computation
Space-Efficient Framework for Top-k String Retrieval Problems

FOCS '09 Proceedings of the 2009 50th Annual IEEE Symposium on Foundations of Computer Science
Note: Fast set intersection and two-patterns matching

Theoretical Computer Science
Compression, indexing, and retrieval for massive string data

CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Efficient index for retrieving top-k most frequent documents

Journal of Discrete Algorithms
Top-k ranked document search in general text databases

ESA'10 Proceedings of the 18th annual European conference on Algorithms: Part II
String retrieval for multi-pattern queries

SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Colored range queries and document retrieval

SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Dual-sorted inverted lists

SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Inverted indexes for phrases and strings

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Improved compressed indexes for full-text document retrieval

SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
Top-k document retrieval in optimal time and linear space

Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms
Top-K color queries for document retrieval

Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms
Forbidden patterns

LATIN'12 Proceedings of the 10th Latin American international conference on Theoretical Informatics
Towards an optimal space-and-query-time index for top-k document retrieval

CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching

Spaces, Trees, and Colors: The algorithmic landscape of document retrieval on sequences

ACM Computing Surveys (CSUR)

Abstract

Let $\mathcal{D} = \{d_1, d_2, \ldots, d_D\}$ be a given collection of $D$ string documents of total length $n$. We consider the problem of indexing $\mathcal{D}$ such that, whenever two patterns $P^+$ and $P^-$ come as an online query, we can list all those documents containing $P^+$ but not $P^-$. Let $t$ represent the number of such documents. An index proposed by Fischer et al. (LATIN, 2012) can answer this query in $O(|P^+|+|P^-|+t+\sqrt{n})$ time. However, its space requirement is $O(n^{3/2})$ bits. We propose the first linear-space index for this problem with a worst-case query time of $O(|P^+|+|P^-|+\sqrt{n}\log \log n+\sqrt{nt}\log^{2.5} n)$.
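
To make the query semantics concrete, the following is a minimal brute-force sketch of what the index must report. It is not the index proposed in the paper: it scans every document on each query, so its running time grows with $n$ rather than with the pattern lengths and $t$. The function name `list_documents` and the sample collection are hypothetical, chosen only for illustration, and both patterns are assumed to be non-empty.

```python
from typing import List


def list_documents(docs: List[str], p_plus: str, p_minus: str) -> List[int]:
    """Return the indices of documents that contain p_plus but not p_minus.

    Naive baseline (assumption: non-empty patterns): every document is
    scanned in full, so no preprocessing or index is involved.
    """
    return [
        i
        for i, doc in enumerate(docs)
        if p_plus in doc and p_minus not in doc
    ]


# Hypothetical collection of D = 4 documents.
docs = ["banana", "bandana", "cabana", "anagram"]

# Documents containing "ana" but not "nan"; here t = 3.
result = list_documents(docs, "ana", "nan")
print(result)
```

In this example only "banana" contains the excluded pattern "nan", so the other three documents are reported. An index for this problem has to return the same set without scanning the whole collection.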