Document Image Coding for Processing and Retrieval

Authors:
Omid E. Kia; David S. Doermann
Affiliations:
National Institute of Standards and Technology, Mathematical and Computational Sciences Division, Building 820, Room 365, Gaithersburg, MD 20899; Language and Media Processing Laboratory, Center for Automation Research, University of Maryland, College Park, MD 20742
Venue:
Journal of VLSI Signal Processing Systems - special issue on multimedia signal processing
Year:
1998

Abstract

Document images belong to a unique class of images where the information is embedded in the language represented by a series of symbols on the page rather than in the visual objects themselves. Since these symbols tend to appear repeatedly, a domain-specific image coding strategy can be designed to facilitate enhanced compression and retrieval. In this paper we describe a coding methodology that not only exploits component-level redundancy to reduce code length but also supports efficient data access. The approach identifies and organizes symbol patterns which appear repeatedly. Similar components are represented by a single prototype stored in a library and the location of each component instance is coded along with the residual between it and its prototype. A representation is built which provides a natural information index allowing access to individual components. Compression results are competitive and compressed-domain access is superior to competing methods. Applications to network-related problems have been considered, and show promising results.
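
The prototype-plus-residual representation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the greedy first-match clustering, the Hamming-distance threshold `max_mismatch`, the XOR residual, and the function names `encode`, `decode`, and `find_instances` are all assumptions chosen for illustration, and the entropy coding stage is omitted entirely. Components are assumed to be already segmented, binary bitmaps given as tuples of 0/1 rows.

```python
def same_shape(a, b):
    """True if two bitmaps (tuples of 0/1 rows) have identical dimensions."""
    return len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))


def hamming(a, b):
    """Number of pixels that differ between two same-shaped bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def xor(a, b):
    """Pixel-wise XOR of two same-shaped bitmaps."""
    return tuple(tuple(pa ^ pb for pa, pb in zip(ra, rb)) for ra, rb in zip(a, b))


def encode(components, max_mismatch=2):
    """Build a prototype library and one record per component instance.

    components: list of (row, col, bitmap).
    Returns (library, records), where each record is
    (row, col, prototype_index, residual).
    Assumption: a component joins the first prototype within max_mismatch pixels.
    """
    library, records = [], []
    for row, col, bitmap in components:
        match = None
        for idx, proto in enumerate(library):
            if same_shape(proto, bitmap) and hamming(proto, bitmap) <= max_mismatch:
                match = idx
                break
        if match is None:
            # No similar prototype yet: this component becomes a new one.
            library.append(bitmap)
            match = len(library) - 1
        # The residual records exactly which pixels differ from the prototype.
        records.append((row, col, match, xor(library[match], bitmap)))
    return library, records


def decode(library, records, height, width):
    """Losslessly rebuild the page from prototypes, locations, and residuals."""
    page = [[0] * width for _ in range(height)]
    for row, col, idx, residual in records:
        glyph = xor(library[idx], residual)
        for r, glyph_row in enumerate(glyph):
            for c, pixel in enumerate(glyph_row):
                page[row + r][col + c] |= pixel
    return page


def find_instances(records, prototype_index):
    """Compressed-domain lookup: locations of every instance of one prototype."""
    return [(row, col) for row, col, idx, _ in records if idx == prototype_index]
```

In this sketch the record list doubles as the index: `find_instances` answers "where does this symbol occur?" by scanning prototype indices only, without reconstructing any pixels, which is the kind of component-level access the abstract refers to.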