In this paper, a new compression scheme is presented for Indian Language (IL) textual document images. Since OCR technology for IL scripts is not yet mature enough, bringing these documents into the digital domain requires new techniques that achieve a high degree of compression and also support operations such as document indexing and retrieval. The proposed method is based on symbolic compression, realized with an efficient segmentation-based clustering approach. A soft pattern-matching technique has been implemented using two different feature sets that cooperate with each other to build an efficient prototype library. Experiments have been conducted on documents printed in Devnagari (Hindi) and Bangla, two of the most widely used scripts in the Indian subcontinent. Test results show that the proposed technique outperforms several standard methods frequently used for document image compression, such as CCITT Group-4 and JBIG.
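
The general idea behind symbolic compression can be illustrated with a minimal sketch. It is not the method of the paper: the two cooperating feature sets and the segmentation step are not specified here, so the sketch assumes the glyphs are already segmented into equal-sized binary bitmaps and substitutes a plain pixel-mismatch (Hamming) ratio for the soft pattern-matching technique. The names `hamming_fraction` and `build_library` and the `threshold` parameter are hypothetical.

```python
from typing import List, Tuple

# A glyph is a binary bitmap: a tuple of rows, each a tuple of 0/1 pixels.
Glyph = Tuple[Tuple[int, ...], ...]


def hamming_fraction(a: Glyph, b: Glyph) -> float:
    """Fraction of differing pixels; 1.0 when the bitmaps differ in size."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return 1.0
    total = len(a) * len(a[0])
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total


def build_library(
    glyphs: List[Glyph], threshold: float = 0.2
) -> Tuple[List[Glyph], List[int]]:
    """Greedy clustering (an assumption, not the paper's algorithm).

    Each glyph reuses the first prototype whose mismatch ratio is within
    `threshold`; otherwise it is added to the library as a new prototype.
    Returns the prototype library and one library index per input glyph.
    """
    prototypes: List[Glyph] = []
    indices: List[int] = []
    for glyph in glyphs:
        for i, proto in enumerate(prototypes):
            if hamming_fraction(glyph, proto) <= threshold:
                indices.append(i)
                break
        else:
            # No prototype matched: this glyph starts a new cluster.
            prototypes.append(glyph)
            indices.append(len(prototypes) - 1)
    return prototypes, indices
```

In a scheme of this kind, the page is stored as the prototype library plus, for each glyph, its library index and position, so repeated characters cost only an index. A real coder would also store residuals or use lossless refinement, which this sketch omits.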