An earlier paper developed a procedure for compressing concordances, assuming that all elements occurred independently. The models introduced in that paper are extended here to take the possibility of clustering into account. The concordance is conceptualized as a set of bitmaps, in which the bit locations represent documents and the one-bits represent the occurrence of given terms. Hidden Markov Models (HMMs) are used to describe the clustering of the one-bits. However, for computational reasons, the HMM is approximated by traditional Markov models. A set of criteria is developed to constrain the allowable set of n-state models, and a full inventory is given for n ≤ 4. Graph-theoretic reduction and complementation operations are defined among the various models and are used to provide a structure relating the models studied. Finally, the new methods were tested on the concordances of the English Bible and of two of the world's largest full-text retrieval systems: the Trésor de la Langue Française and the Responsa Project.
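As a rough illustration of why modelling clustering can pay off, the following Python sketch compares the ideal code length of a term's bitmap under an independence assumption with its ideal code length under a simple Markov model. It is not the procedure of the paper: it substitutes a plain first-order, two-state Markov chain for the HMM-derived n-state models, it ignores the cost of storing the model parameters, and the function names `independent_cost` and `markov_cost` are hypothetical.

```python
import math
from collections import Counter


def independent_cost(bits):
    """Ideal code length in bits when every position is treated as an
    independent draw, with P(1) estimated from the bitmap itself."""
    n = len(bits)
    ones = sum(bits)
    cost = 0.0
    for count in (ones, n - ones):
        if count:
            cost -= count * math.log2(count / n)
    return cost


def markov_cost(bits):
    """Ideal code length in bits under a first-order, two-state Markov
    chain, where the probability of each bit depends on the previous bit.
    Assumption: the first bit has no context and is charged one full bit."""
    if not bits:
        return 0.0
    # Count how often each (previous bit, next bit) pair occurs.
    transitions = Counter(zip(bits, bits[1:]))
    cost = 1.0
    for prev in (0, 1):
        total = transitions[(prev, 0)] + transitions[(prev, 1)]
        for nxt in (0, 1):
            count = transitions[(prev, nxt)]
            if count:
                cost -= count * math.log2(count / total)
    return cost


if __name__ == "__main__":
    # Made-up bitmap: the term occurs in runs of 5 consecutive documents.
    clustered = ([0] * 20 + [1] * 5) * 4
    print("independent model:", round(independent_cost(clustered), 2), "bits")
    print("Markov model:     ", round(markov_cost(clustered), 2), "bits")
```

On a bitmap whose one-bits arrive in runs, as in the made-up example, the Markov estimate comes out well below the independent one, because a one-bit becomes much more likely right after another one-bit. On a bitmap with no clustering the two estimates would be expected to be close.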