Markov and fuzzy models for written language verification

  • Authors:
  • Dat T. Tran; Tuan D. Pham

  • Affiliations:
  • School of Information Sciences and Engineering, University of Canberra, Canberra, ACT, Australia; School of Information Technology, James Cook University, Townsville, QLD, Australia

  • Venue:
  • FS'05 Proceedings of the 6th WSEAS international conference on Fuzzy systems
  • Year:
  • 2005

Abstract

This paper presents a computational algorithm for machine classification of written languages. It uses a Markov chain-based method to build language models and a normalization method based on fuzzy set theory to verify the language of a document. Each word in a document is represented as a Markov chain of alphabetical letters. The initial probabilities and transition probabilities are estimated from the training data, and the resulting set of probabilities is referred to as the model of that language. Given an unknown text document and a claimed language identity, a similarity score based on fuzzy set theory is calculated and compared with a preset threshold. If the score exceeds the threshold, the identity claim is accepted; otherwise it is rejected. The proposed fuzzy normalization method is more effective for machine learning than the non-fuzzy normalization method, which has been widely used for speaker verification. Experimental results on verifying a set of seven closely related languages written in the Roman alphabet show that the proposed method is promising.
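The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the fuzzy similarity score, so the sketch substitutes a plain log-likelihood ratio against competing language models, in the spirit of the non-fuzzy normalization the abstract mentions. The function names, the add-one smoothing, the restriction to the 26 letters a-z, and the default threshold are all assumptions made for illustration.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumption: unaccented letters only


def words_of(text):
    """Lowercase the text and split it into words made of a-z letters only."""
    cleaned = "".join(c if c in ALPHABET else " " for c in text.lower())
    return cleaned.split()


def train_model(text, smoothing=1.0):
    """Estimate initial and transition probabilities of a letter-level
    Markov chain from training text (add-one smoothing is an assumption)."""
    init = {a: smoothing for a in ALPHABET}
    trans = {a: {b: smoothing for b in ALPHABET} for a in ALPHABET}
    for word in words_of(text):
        init[word[0]] += 1                    # first letter of each word
        for a, b in zip(word, word[1:]):      # consecutive letter pairs
            trans[a][b] += 1
    init_total = sum(init.values())
    init = {a: v / init_total for a, v in init.items()}
    for a in ALPHABET:
        row_total = sum(trans[a].values())
        trans[a] = {b: v / row_total for b, v in trans[a].items()}
    return init, trans


def log_likelihood(text, model):
    """Average log-probability per letter of the text under a model."""
    init, trans = model
    total, n = 0.0, 0
    for word in words_of(text):
        total += math.log(init[word[0]])
        n += 1
        for a, b in zip(word, word[1:]):
            total += math.log(trans[a][b])
            n += 1
    if n == 0:
        raise ValueError("text contains no letters from the alphabet")
    return total / n


def verify(text, claimed_model, background_models, threshold=0.0):
    """Accept the claim if the normalized score exceeds the threshold.

    Stand-in normalization (NOT the paper's fuzzy score): log-likelihood
    under the claimed model minus the best competing model's log-likelihood.
    """
    claimed = log_likelihood(text, claimed_model)
    best_other = max(log_likelihood(text, m) for m in background_models)
    score = claimed - best_other
    return score, score > threshold


# Toy usage with tiny, made-up training texts.
english = train_model("the quick brown fox jumps over the lazy dog and "
                      "the cat sat on the mat with the hat")
german = train_model("der schnelle braune fuchs springt ueber den faulen "
                     "hund und die katze sitzt auf der matte")
score, accepted = verify("the dog sat on the mat", english, [german])
```

In practice the models would be trained on far larger corpora, and the threshold would be tuned on held-out data to balance false acceptances against false rejections.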