Adaptation in natural and artificial systems
The Unicode standard, version 2.0
Data compression via textual substitution. Journal of the ACM (JACM)
Experiments with English-Persian text retrieval. Proceedings of the 2nd ACM Workshop on Improving Non-English Web Searching
Hamshahri: A standard Persian text collection. Knowledge-Based Systems
Fusion of retrieval models at CLEF 2008 ad hoc Persian track. CLEF'08 Proceedings of the 9th Cross-Language Evaluation Forum Conference on Evaluating Systems for Multilingual and Multimodal Information Access
Evolution of human-competitive lossless compression algorithms with GP-zip2. Genetic Programming and Evolvable Machines
The increasing importance of Unicode for text encoding implies a possible doubling of data storage space and data transmission time, with a corresponding need for data compression. The approach presented in this paper aims to reduce storage requirements and transmission time for Persian text files in web-based applications and on the Internet. The basic idea is to compute the most repetitive n-grams in the Persian text and replace each with a single character from the user-defined (private use) sections of Unicode. Compression is performed on the server side once, and decompression as a separate processing step is eliminated entirely: the rendering process in the browser effectively performs the decompression. No additional program or add-in needs to be installed on the browser or client side; the user only needs to download the appropriate Unicode font once. A genetic algorithm is used to select the most appropriate n-grams. In the best case, we achieved a 52.26% reduction in file size. The method is general and applies equally well to English and other languages.
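The core substitution idea can be sketched in a few lines. This is an illustrative minimal sketch only: the `PUA_START` constant, the fixed n-gram length, and the greedy most-frequent-first selection are assumptions for demonstration; the paper instead selects the n-gram set with a genetic algorithm, and actual decompression happens implicitly via font rendering rather than a code path.

```python
from collections import Counter

# First code point of the Unicode Basic Multilingual Plane Private Use Area
# (U+E000..U+F8FF), the "user-defined" range mentioned in the abstract.
PUA_START = 0xE000

def top_ngrams(text, n, k):
    """Return the k most frequent (overlapping) n-grams in text.
    A greedy stand-in for the paper's GA-based selection."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return [gram for gram, _ in counts.most_common(k)]

def compress(text, n=2, k=100):
    """Replace frequent n-grams with single PUA characters.
    Returns the substituted text and the mapping that the custom
    Unicode font would encode as ligature-like glyphs."""
    table = {g: chr(PUA_START + i) for i, g in enumerate(top_ngrams(text, n, k))}
    for gram, ch in table.items():
        text = text.replace(gram, ch)
    return text, table

def decompress(text, table):
    """Invert the substitution. In the paper's setup this step is not
    executed by software at all: the font renders each PUA character
    as the glyphs of its n-gram."""
    for gram, ch in table.items():
        text = text.replace(ch, gram)
    return text
```

Since PUA code points do not occur in ordinary source text, the substitution is trivially invertible; the character count drops by (n − 1) for every replaced occurrence, which is where the reported size reduction comes from.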