Inverted index compression, block addressing, and sequential search on compressed text are three techniques that have been developed separately for efficient, low-overhead text retrieval. Modern text compression techniques can reduce the text to less than 30% of its original size and allow it to be searched directly, faster than the uncompressed text. Inverted index compression significantly reduces the size of the index at the same processing speed. Block addressing makes the inverted lists point to text blocks instead of exact positions, and pays for the reduction in space with some sequential text scanning.

In this work we combine the three ideas in a single scheme. We present a compressed inverted file that indexes compressed text and uses block addressing. We consider different techniques to compress the index and study their performance with respect to the block size. We compare the index against the three separate techniques for varying block sizes, showing that our index is superior to each isolated approach. For instance, with just 4% of extra space overhead, the index has to scan less than 12% of the text for exact searches and about 20% when allowing one error in the matches.
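The block addressing idea described above can be illustrated with a small sketch. It is not the scheme presented in this work: the text and the index are both kept uncompressed, blocks are fixed-size runs of words, and all names (`split_blocks`, `build_block_index`, `search`, `block_size`) are hypothetical choices made for illustration. It only shows how inverted lists that point to blocks trade index space for a sequential scan of the candidate blocks.

```python
from collections import defaultdict


def split_blocks(words, block_size):
    """Cut the word sequence into consecutive blocks of block_size words."""
    return [words[i:i + block_size] for i in range(0, len(words), block_size)]


def build_block_index(blocks):
    """Map each word to the increasing list of block numbers containing it.

    Only block numbers are stored, not exact positions, so each list has
    at most one entry per block (assumption: plain, uncompressed lists).
    """
    index = defaultdict(list)
    for block_id, block in enumerate(blocks):
        for word in set(block):
            index[word].append(block_id)
    return dict(index)


def search(word, index, blocks, block_size):
    """Return (exact word positions, fraction of the text scanned).

    The index narrows the search to candidate blocks; each candidate is
    then scanned sequentially to recover the exact positions.
    """
    positions = []
    scanned = 0
    for block_id in index.get(word, []):
        block = blocks[block_id]
        scanned += len(block)
        for offset, candidate in enumerate(block):
            if candidate == word:
                positions.append(block_id * block_size + offset)
    total = sum(len(block) for block in blocks)
    return positions, (scanned / total if total else 0.0)


if __name__ == "__main__":
    words = "to be or not to be that is the question".split()
    blocks = split_blocks(words, block_size=4)
    index = build_block_index(blocks)
    print(search("question", index, blocks, block_size=4))
```

Larger blocks shorten the inverted lists but force more text to be scanned per query, which is the trade-off studied here with respect to the block size.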