Substitution Deciphering Based on HMMs with Applications to Compressed Document Processing

Authors:
Dar-Shyang Lee
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2002

Abstract

It has been shown that simple substitution ciphers can be solved using statistical methods such as probabilistic relaxation. However, the utility of such solutions has been limited by their inability to cope with noise encountered in practical applications. In this paper, we propose a new solution to substitution deciphering based on hidden Markov models. We show that our algorithm is more accurate than relaxation and much more robust in the presence of noise, making it useful for applications in compressed document processing. Recovering character interpretations from the sequence of cluster identifiers in a symbolically compressed document can be treated as a cipher problem. Although a significant amount of noise is present in the cluster sequence, enough information can be recovered with a robust deciphering algorithm to accomplish certain document analysis tasks. The feasibility of this approach is demonstrated in a multilingual document duplicate detection system.
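The abstract does not spell out the model, so the following is only a minimal sketch of one common way to pose substitution deciphering as an HMM problem. It assumes that hidden states are plaintext characters, that transition probabilities are fixed in advance from a character bigram model of the target language, and that only the emission probabilities (plaintext character to cipher symbol or cluster identifier) are re-estimated with EM. The function names `bigram_model`, `fit_emissions`, and `decode` are hypothetical and do not come from the paper.

```python
import random


def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def bigram_model(text, alphabet):
    """Add-one smoothed unigram (start) and bigram (transition) probabilities."""
    idx = {c: i for i, c in enumerate(alphabet)}
    n = len(alphabet)
    start = [1.0] * n
    trans = [[1.0] * n for _ in range(n)]
    for c in text:
        start[idx[c]] += 1
    for a, b in zip(text, text[1:]):
        trans[idx[a]][idx[b]] += 1
    return normalize(start), [normalize(row) for row in trans]


def fit_emissions(obs, start, trans, n_symbols, iters=50, seed=0):
    """EM with transitions held fixed; only emissions B[state][symbol] are updated."""
    n, T = len(start), len(obs)
    rng = random.Random(seed)
    # Near-uniform random initialisation breaks the symmetry between states.
    B = [normalize([1.0 + 0.1 * rng.random() for _ in range(n_symbols)])
         for _ in range(n)]
    for _ in range(iters):
        # Forward pass, rescaled at every step to avoid underflow.
        alpha = [normalize([start[i] * B[i][obs[0]] for i in range(n)])]
        for t in range(1, T):
            prev = alpha[-1]
            alpha.append(normalize([
                B[j][obs[t]] * sum(prev[i] * trans[i][j] for i in range(n))
                for j in range(n)]))
        # Backward pass, also rescaled.
        beta = [None] * T
        beta[T - 1] = [1.0] * n
        for t in range(T - 2, -1, -1):
            nxt = beta[t + 1]
            beta[t] = normalize([
                sum(trans[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(n))
                for i in range(n)])
        # Accumulate state posteriors per observed symbol, then renormalise.
        counts = [[1e-6] * n_symbols for _ in range(n)]
        for t in range(T):
            gamma = normalize([alpha[t][i] * beta[t][i] for i in range(n)])
            for i in range(n):
                counts[i][obs[t]] += gamma[i]
        B = [normalize(row) for row in counts]
    return B


def decode(obs, B, prior, alphabet):
    """Map each cipher symbol to the character maximising P(symbol|char) * P(char)."""
    best = {s: max(range(len(alphabet)), key=lambda i: B[i][s] * prior[i])
            for s in set(obs)}
    return "".join(alphabet[best[s]] for s in obs)
```

Because the emission matrix is probabilistic rather than a hard one-to-one key, a sketch like this can in principle tolerate cipher symbols that sometimes stand for the wrong character, which is the kind of noise the abstract attributes to cluster sequences in compressed documents. How well it recovers a given cipher depends on the amount of text and on the language model.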