Farsi and Arabic document images lossy compression based on the mixed raster content model

Authors:
Hadi Grailu;Mojtaba Lotfizad;Hadi Sadoghi-Yazdi
Affiliations:
Tarbiat Modares University, Department of Electrical Engineering, Tehran, Iran;Tarbiat Modares University, Department of Electrical Engineering, Tehran, Iran;Ferdowsi University of Mashhad, Department of Computer Engineering, Mashhad, Iran
Venue:
International Journal on Document Analysis and Recognition
Year:
2009


Abstract

The mixed raster content (MRC) model was recently proposed for compressing compound document images. Most state-of-the-art document image compression methods, such as DjVu, are based on this model, but they have several disadvantages, especially for Farsi and Arabic document images. First, they do not exploit characteristics of the Farsi/Arabic script that could further improve compression performance. Second, existing segmentation methods focus on cleanly separating textual objects from the background and/or optimizing the rate-distortion trade-off, but they do not take text readability or OCR facility into account. Third, these methods usually suffer from undesirable jaggy artifacts and from misclassification of important textual details. This paper proposes an MRC-based document image compression method that achieves a better rate-distortion trade-off than existing state-of-the-art document compression methods. The proposed method performs better in segmentation, bi-level mask layer compression, OCR facility, and overall compression. It compresses the mask layer with a 1D pattern matching technique and uses a segmentation method that is sensitive enough to capture small textual objects. Experimental results show that the proposed method achieves considerably higher compression performance than the state-of-the-art method DjVu, by a factor of as much as 1.75–2.3.
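
For readers unfamiliar with the mixed raster content model, the following minimal Python sketch illustrates the general idea of splitting an image into a bi-level mask plus foreground and background layers and then recomposing it. It is not the method proposed in the paper: the fixed global threshold, the single-color foreground and background layers, and the function names `mrc_decompose` and `mrc_compose` are all assumptions made purely for illustration.

```python
def mrc_decompose(image, threshold=128):
    """Split a grayscale image (rows of ints in 0..255) into MRC-style layers.

    Assumption: a fixed global threshold stands in for segmentation, and each
    of the foreground and background layers is reduced to one mean gray value.
    Real MRC codecs use far richer segmentation and layer coding.
    """
    # Bi-level mask: 1 marks dark (text-like) pixels, 0 marks background.
    mask = [[1 if px < threshold else 0 for px in row] for row in image]

    fg_pixels = [px for row, mrow in zip(image, mask)
                 for px, m in zip(row, mrow) if m]
    bg_pixels = [px for row, mrow in zip(image, mask)
                 for px, m in zip(row, mrow) if not m]

    # Fall back to black text on white paper if a layer is empty.
    fg = sum(fg_pixels) // len(fg_pixels) if fg_pixels else 0
    bg = sum(bg_pixels) // len(bg_pixels) if bg_pixels else 255
    return mask, fg, bg


def mrc_compose(mask, fg, bg):
    """Rebuild an image: take the foreground where mask is 1, else background."""
    return [[fg if m else bg for m in row] for row in mask]


# Tiny example: one dark vertical stroke on a light page.
page = [[250, 10, 250],
        [250, 20, 240]]
mask, fg, bg = mrc_decompose(page)
restored = mrc_compose(mask, fg, bg)
```

In this toy setting the mask carries the shape of the text, which is why the quality of segmentation and of bi-level mask compression matters so much for MRC-based coders.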