Keyword-based search engines are the basic building block of text retrieval systems. Higher-level systems such as content-sensitive search engines and knowledge-based systems still rely on keyword search as the underlying text retrieval mechanism. With the explosive growth in content, Internet and intranet information repositories require efficient mechanisms to store as well as index data. In this paper we discuss the implementation of the Shrink and Search Engine (SASE) framework, which unites text compression and indexing to maximize keyword search performance while reducing storage cost. SASE features the novel capability of searching directly through compressed text without explicit decompression. The implementation includes a search server architecture, which can be accessed from a Java front-end to perform keyword search on the Internet. The performance results show that the compression efficiency of SASE is within 7-17% of GZIP, one of the best lossless compression schemes. The combined size of the compressed file and the inverted indices is only 55-76% of the original database size, while the search performance is comparable to that of a fully inverted index. The framework allows a flexible trade-off between search performance and storage requirements for the search indices.
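To make the idea of searching compressed text without decompressing it more concrete, the following is a minimal sketch and not SASE's actual encoding, which the abstract does not specify. It assumes a simple word-level dictionary code (each distinct word is replaced by an integer) and a block-level inverted index. The class name `TinyCompressedIndex`, its methods, and the `block_size` parameter are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


class TinyCompressedIndex:
    """Illustrative only: word-level dictionary coding plus a block-level index."""

    def __init__(self, text, block_size=4):
        self.block_size = block_size
        self.codes = {}  # word -> integer code (the shared dictionary)
        encoded = []     # the text stored as a sequence of integer codes
        for word in text.lower().split():
            # New words receive the next unused code; known words reuse theirs.
            encoded.append(self.codes.setdefault(word, len(self.codes)))
        # Cut the code stream into fixed-size blocks.
        self.blocks = [encoded[i:i + block_size]
                       for i in range(0, len(encoded), block_size)]
        # Inverted index: code -> set of block numbers containing that code.
        self.index = defaultdict(set)
        for block_no, block in enumerate(self.blocks):
            for code in block:
                self.index[code].add(block_no)

    def search(self, keyword):
        """Return word positions of keyword, comparing codes, never decoded text."""
        code = self.codes.get(keyword.lower())
        if code is None:
            return []  # keyword is not in the dictionary, so it cannot occur
        hits = []
        for block_no in sorted(self.index[code]):
            # Scan only the blocks the index points at, in encoded form.
            for offset, c in enumerate(self.blocks[block_no]):
                if c == code:
                    hits.append(block_no * self.block_size + offset)
        return hits


idx = TinyCompressedIndex("the quick brown fox jumps over the lazy dog the end")
print(idx.search("the"))  # [0, 6, 9]
print(idx.search("cat"))  # []
```

In this sketch the query is encoded once and matched against stored codes, so the text is never expanded back into words. The hypothetical `block_size` parameter illustrates the kind of trade-off the abstract mentions: larger blocks shrink the index but require scanning more encoded text per hit, while a block size of 1 behaves like a fully inverted index.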