Malay language document identification using BPNN

  • Authors:
  • Norzaidah Md Noh;Mohd Rusydi Abdul Talib;Azlin Ahmad;Shamimi A. Halim;Azlinah Mohamed

  • Affiliations:
  • SIG Intelligent Systems, Faculty of Information Technology and Quantitative Science, Universiti Teknologi MARA, Selangor, Malaysia;Faculty of Information Technology and Quantitative Science, Universiti Teknologi MARA, Selangor, Malaysia;Faculty of Information Technology and Quantitative Science, Universiti Teknologi MARA, Selangor, Malaysia;Faculty of Information Technology and Quantitative Science, Universiti Teknologi MARA, Selangor, Malaysia;Faculty of Information Technology and Quantitative Science, Universiti Teknologi MARA, Selangor, Malaysia

  • Venue:
  • NN'09 Proceedings of the 10th WSEAS international conference on Neural networks
  • Year:
  • 2009

Abstract

Document identification aims to extract information presented in digital documents such as articles, newspapers, magazines and e-books. The popularity of the World Wide Web (WWW) for disseminating information at the click of a mouse has also increased the number of Malay documents published online. This has given rise to many language identification systems. However, little research has been done on Malay language document identification. The purpose of this paper is therefore to describe the implementation of a tool for identifying Malay language documents among mono- and multilingual documents. The system combines feature extraction with a neural network technique. The feature extraction consists of document filtering, word matching and binary representation of sentences of varied length, taken from many types of documents including generic text files, MS Word files, Adobe PDF files and HTML web pages. The neural network employs the back propagation neural network (BPNN) algorithm with an adjustable number of neurons and weights between the input, hidden and output layers. Experiments were conducted on 300 sentences from mono- and multilingual documents. The results show 90% accuracy in identifying Malay language documents in which more than 80% of the words matched Malay words. The tool is able to distinguish Malay documents from documents in other languages with reasonable accuracy.
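The abstract does not give implementation details, so the following is only a minimal sketch of what the word-matching and binary-representation steps might look like. The word list `MALAY_WORDS`, the function names, the fixed vector length and the zero-padding scheme are all assumptions made for illustration, not the authors' method. The sketch covers feature extraction only; the BPNN that would consume these vectors is omitted.

```python
import re

# Hypothetical mini-lexicon (an assumption); a real system would load a full Malay word list.
MALAY_WORDS = {"saya", "makan", "nasi", "dan", "yang", "di",
               "ini", "itu", "pergi", "ke", "sekolah"}


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only (a simple filter)."""
    return re.findall(r"[a-z]+", sentence.lower())


def binary_features(sentence, length=10):
    """Encode each token as 1 (found in the lexicon) or 0 (not found).

    Sentences vary in length, so the vector is truncated or zero-padded to
    `length`. Padding with 0 is an assumption of this sketch; it does not
    distinguish padding from unmatched words.
    """
    bits = [1 if tok in MALAY_WORDS else 0 for tok in tokenize(sentence)]
    bits = bits[:length]
    return bits + [0] * (length - len(bits))


def matched_ratio(sentence):
    """Return the fraction of tokens that match the lexicon (0.0 if no tokens)."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return sum(tok in MALAY_WORDS for tok in tokens) / len(tokens)
```

For example, `binary_features("Saya makan nasi di sekolah", length=8)` yields a fixed-length vector that could serve as the input layer of a network, while `matched_ratio` gives the share of matched words that the abstract's 80% figure refers to.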