Malay document analysis and recognition

  • Authors:
  • Norzaidah Md Noh;Mohd Rusydi Abdul Talib;Azlin Ahmad;Shamimi A. Halim;Azlinah Mohamed

  • Affiliations:
  • Faculty of Information Technology and Quantitative Science, Universiti Teknologi MARA, Shah Alam, Selangor, Malaysia;Faculty of Information Technology and Quantitative Science, Universiti Teknologi MARA, Shah Alam, Selangor, Malaysia;Faculty of Information Technology and Quantitative Science, Universiti Teknologi MARA, Shah Alam, Selangor, Malaysia;Faculty of Information Technology and Quantitative Science, Universiti Teknologi MARA, Shah Alam, Selangor, Malaysia;Faculty of Information Technology and Quantitative Science, Universiti Teknologi MARA, Shah Alam, Selangor, Malaysia

  • Venue:
  • WSEAS Transactions on Information Science and Applications
  • Year:
  • 2009

Abstract

Malay Document Analysis and Recognition aims to extract digital Malay documents automatically. The extracted documents take the form of articles, newspapers and magazines. Over the years, the number of Malay digital documents published on the World Wide Web (WWW) has increased, and these documents are consequently used by many organizations, both local and abroad. In this paper, we introduce the implementation of a tool for Malay language document identification in mono- and multi-lingual documents. The tool combines feature extraction with a neural network technique. Feature extraction consists of document filtering, word matching and binary representation of variable-length sentences from many types of documents, including generic text files, MS Word files, Adobe PDF files and HTML web pages. The neural network uses the back-propagation neural network (BPNN) algorithm with an adjustable number of neurons and weights between the input, hidden and output layers. A database of 300 sentences from mono- and multi-lingual documents was constructed. Experiments show an average recognition rate of 90% for Malay language documents in which more than 80% of the words match Malay words. Our tool is able to recognise Malay language documents with reasonable accuracy.
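
The pipeline outlined in the abstract (filtering, word matching, binary representation, then a back-propagation network) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The lexicon `MALAY_WORDS`, the fixed input width `MAX_LEN`, the zero-padding of short sentences, the single hidden layer of four neurons, the learning rate and the four training sentences are all assumptions made for illustration, since the abstract gives none of these details.

```python
import math
import random
import re

# Hypothetical mini-lexicon; the word list used in the paper is not given.
MALAY_WORDS = {"saya", "makan", "nasi", "di", "rumah", "dan",
               "yang", "ini", "itu", "pergi", "ke", "sekolah"}
MAX_LEN = 8  # assumed fixed input width for variable-length sentences


def tokenize(sentence):
    """Filtering step: lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def binary_features(sentence, max_len=MAX_LEN):
    """1 where a token matches the lexicon, else 0; padded or truncated to max_len."""
    bits = [1 if tok in MALAY_WORDS else 0 for tok in tokenize(sentence)][:max_len]
    return bits + [0] * (max_len - len(bits))


def match_ratio(sentence):
    """Fraction of tokens found in the lexicon (0.0 for an empty sentence)."""
    tokens = tokenize(sentence)
    return sum(t in MALAY_WORDS for t in tokens) / len(tokens) if tokens else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyBPNN:
    """One hidden layer, sigmoid units, squared-error loss, online updates."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # The last weight in each row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = x + [1]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return h, y

    def train_step(self, x, target, lr=0.5):
        h, y = self.forward(x)
        xb, hb = x + [1], h + [1.0]
        delta_out = (y - target) * y * (1 - y)
        # Hidden deltas are computed from the output weights before they change.
        delta_h = [delta_out * self.w2[j] * h[j] * (1 - h[j]) for j in range(len(h))]
        self.w2 = [w - lr * delta_out * v for w, v in zip(self.w2, hb)]
        for j, row in enumerate(self.w1):
            self.w1[j] = [w - lr * delta_h[j] * v for w, v in zip(row, xb)]

    def predict(self, x):
        return self.forward(x)[1]


# Toy training set (1 = Malay, 0 = not Malay); purely illustrative.
samples = [("Saya makan nasi di rumah", 1), ("Saya pergi ke sekolah", 1),
           ("I eat rice at home", 0), ("The school is far away", 0)]
net = TinyBPNN(MAX_LEN, 4)
for _ in range(500):
    for sentence, label in samples:
        net.train_step(binary_features(sentence), label)

score = net.predict(binary_features("Saya makan nasi"))  # value in [0, 1]
```

Zero-padding makes a short Malay sentence resemble a sentence with unmatched words, so a real system would need a more careful encoding. The `match_ratio` helper is included only to show how a share of matched words, such as the 80% figure mentioned in the abstract, could be computed.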