In static full-text retrieval systems that accommodate metrical as well as Boolean operators, the traditional approach to query processing uses a “concordance”, from which large sets of coordinates are retrieved and then merged and/or collated. Alternatively, in a system with l documents, the concordance can be replaced by a set of bit-maps of fixed length l, one constructed for every distinct word of the database, which serve as occurrence maps. We propose to combine the concordance and bit-map approaches, and show how this can speed up the processing of queries: fast ANDing and ORing of the maps in a preprocessing stage lead to large I/O savings when collating the coordinates of the keywords needed to satisfy the metrical and Boolean constraints. Moreover, the bit-maps give partial information on the distribution of the coordinates of the keywords, which can be used when queries must be processed in stages because of their complexity and the sizes of the sets of coordinates involved. The new techniques are partially implemented at the Responsa Retrieval Project.
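The combined approach can be illustrated with a minimal sketch. It is not the implementation used at the Responsa Retrieval Project; it assumes an in-memory index, Python integers as the bit-maps of length l, and word positions within a document as the coordinates. The names `build_index`, `candidate_docs` and `within` are hypothetical and chosen only for this illustration. The bit-maps are ANDed first, and coordinates are then collated only for the documents that survive.

```python
from collections import defaultdict


def build_index(docs):
    """Build a toy concordance and per-word bit-maps from tokenized documents."""
    # concordance: word -> document id -> sorted list of word positions
    concordance = defaultdict(lambda: defaultdict(list))
    # bitmaps: word -> integer whose bit d is set if the word occurs in document d
    bitmaps = defaultdict(int)
    for d, tokens in enumerate(docs):
        for pos, word in enumerate(tokens):
            concordance[word][d].append(pos)
            bitmaps[word] |= 1 << d
    return concordance, bitmaps


def candidate_docs(bitmaps, words):
    """AND the occurrence maps and return ids of documents containing every word."""
    if not words:
        return []
    acc = bitmaps.get(words[0], 0)
    for w in words[1:]:
        acc &= bitmaps.get(w, 0)
    ids, d = [], 0
    while acc:
        if acc & 1:
            ids.append(d)
        acc >>= 1
        d += 1
    return ids


def within(concordance, bitmaps, a, b, k):
    """Documents in which words a and b occur at most k positions apart."""
    hits = []
    # Preprocessing stage: only documents passing the bit-map AND are collated.
    for d in candidate_docs(bitmaps, [a, b]):
        pos_a, pos_b = concordance[a][d], concordance[b][d]
        # Brute-force comparison for clarity; a real system would merge sorted lists.
        if any(abs(x - y) <= k for x in pos_a for y in pos_b):
            hits.append(d)
    return hits
```

In this sketch the saving comes from `candidate_docs`: coordinate lists are fetched only for documents whose bit is set in the ANDed map, which stands in for the I/O that a disk-based concordance would otherwise spend on documents that cannot satisfy the query.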