Software—Practice & Experience
Text compression
Fast and flexible word searching on compressed text
ACM Transactions on Information Systems (TOIS)
Modern Information Retrieval
Flexible pattern matching in strings: practical on-line search algorithms for texts and biological sequences
Adding Compression to Block Addressing Inverted Indexes
Information Retrieval
On Lower Bounds for the Redundancy of Optimal Codes
Designs, Codes and Cryptography
On the implementation of minimum-redundancy prefix codes
DCC '96 Proceedings of the Conference on Data Compression
Efficiently decodable and searchable natural language adaptive compression
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
A general compression algorithm that supports fast searching
Information Processing Letters
New technique for data compression
SEPADS'05 Proceedings of the 4th WSEAS International Conference on Software Engineering, Parallel & Distributed Systems
Improved Variable-to-Fixed Length Codes
SPIRE '08 Proceedings of the 15th International Symposium on String Processing and Information Retrieval
Rank and Select for Succinct Data Structures
Electronic Notes in Theoretical Computer Science (ENTCS)
Simple Random Access Compression
Fundamenta Informaticae
Fast and Flexible Compression for Web Search Engines
Electronic Notes in Theoretical Computer Science (ENTCS)
The strategy design of compression and transmission on cGML spatial data and its application in LBS
WiCOM'09 Proceedings of the 5th International Conference on Wireless communications, networking and mobile computing
Improving semistatic compression via pair-based coding
PSI'06 Proceedings of the 6th international Andrei Ershov memorial conference on Perspectives of systems informatics
Simple compression code supporting random access and fast string matching
WEA'07 Proceedings of the 6th international conference on Experimental algorithms
Training parse trees for efficient VF coding
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Information Processing and Management: an International Journal
Phrase-Based pattern matching in compressed text
SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval
Efficient compression of text attributes of data warehouse dimensions
DaWaK'05 Proceedings of the 7th international conference on Data Warehousing and Knowledge Discovery
Compressing dynamic text collections via phrase-based coding
ECDL'05 Proceedings of the 9th European conference on Research and Advanced Technology for Digital Libraries
Enhanced byte codes with restricted prefix properties
SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
ODC: Frame for definition of Dense codes
European Journal of Combinatorics
Practical fixed length Lempel-Ziv coding
Discrete Applied Mathematics
We present a new compression format for natural language texts that allows both exact and approximate search without decompression. This new code, called End-Tagged Dense Code, has several advantages over other compression techniques with similar features, such as the Tagged Huffman Code of [Moura et al., ACM TOIS 2000]. Our compression method obtains (i) better compression ratios, (ii) a simpler vocabulary representation, and (iii) simpler and faster encoding. At the same time, it retains the most interesting features of the method based on the Tagged Huffman Code: exact search for words and phrases directly on the compressed text using any known sequential pattern matching algorithm, efficient word-based approximate and extended searches without any decoding, and efficient decompression of arbitrary portions of the text. As a side effect, our analytical results give new upper and lower bounds for the redundancy of d-ary Huffman codes.
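
The abstract does not spell out the byte layout of the code, so the following is only a minimal sketch under stated assumptions, based on how End-Tagged Dense Code is commonly described: words are ranked by decreasing frequency, each codeword is a sequence of bytes, and the high bit is set only on the last byte of a codeword (the "end tag"). The function names (`etdc_encode_rank`, `build_codebook`, `compress`, `find_word`) are hypothetical and chosen for illustration. The sketch shows why encoding needs no Huffman tree (the codeword is computed from the rank alone) and how an ordinary byte-string search can run on the compressed text with a one-byte boundary check.

```python
from collections import Counter


def etdc_encode_rank(rank: int) -> bytes:
    """Codeword for the word with 0-based frequency rank `rank`.

    Assumed layout: the last byte is in 128..255 (high bit set),
    all earlier bytes are in 0..127 (high bit clear).
    """
    out = [128 + rank % 128]          # last byte carries the end tag
    rank = rank // 128 - 1
    while rank >= 0:                  # remaining bytes, least significant first
        out.append(rank % 128)
        rank = rank // 128 - 1
    return bytes(reversed(out))


def build_codebook(words):
    """Map each distinct word to its codeword, most frequent word first."""
    ranked = [w for w, _ in Counter(words).most_common()]
    return {w: etdc_encode_rank(i) for i, w in enumerate(ranked)}


def compress(words, codebook):
    """Concatenate the codewords of a word sequence."""
    return b"".join(codebook[w] for w in words)


def find_word(compressed: bytes, codeword: bytes):
    """Byte offsets where `codeword` occurs as a whole codeword.

    Uses a plain substring search on the compressed bytes. A hit is
    accepted only if it starts at a codeword boundary, i.e. at offset 0
    or right after a byte with the high bit set; otherwise it is the
    tail of a longer codeword.
    """
    hits = []
    pos = compressed.find(codeword)
    while pos != -1:
        if pos == 0 or compressed[pos - 1] >= 128:
            hits.append(pos)
        pos = compressed.find(codeword, pos + 1)
    return hits


words = "to be or not to be".split()
codebook = build_codebook(words)
data = compress(words, codebook)
print(find_word(data, codebook["be"]))
```

Under these assumptions the 128 most frequent words receive one-byte codewords, the next 128 × 128 receive two-byte codewords, and so on. The boundary check in `find_word` is what lets any sequential pattern matching algorithm be used without decoding the text.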