Malay language document identification using BPNN

Authors:
Norzaidah Md Noh;Mohd Rusydi Abdul Talib;Azlin Ahmad;Shamimi A. Halim;Azlinah Mohamed
Affiliations:
SIG Intelligent Systems, Faculty of Information Technology and Quantitative Science, Universiti Teknologi MARA, Selangor, Malaysia;Faculty of Information Technology and Quantitative Science, Universiti Teknologi MARA, Selangor, Malaysia;Faculty of Information Technology and Quantitative Science, Universiti Teknologi MARA, Selangor, Malaysia;Faculty of Information Technology and Quantitative Science, Universiti Teknologi MARA, Selangor, Malaysia;Faculty of Information Technology and Quantitative Science, Universiti Teknologi MARA, Selangor, Malaysia
Venue:
NN'09 Proceedings of the 10th WSEAS international conference on Neural networks
Year:
2009

Abstract

Document identification aims to extract information from digital documents such as articles, newspapers, magazines and e-books. The popularity of the World Wide Web (WWW) for disseminating information at the click of a mouse has also increased the number of Malay documents published online. This has given rise to many language identification systems. However, little research has been done on language identification for Malay documents. The purpose of this paper is therefore to describe the implementation of a tool for Malay language document identification in mono- and multilingual documents. The system combines a feature extraction stage with a neural network. Feature extraction consists of document filtering, word matching and a binary representation of variable-length sentences taken from many types of documents, including generic text files, MS Word files, Adobe PDF files and HTML web pages. The neural network uses the back-propagation neural network (BPNN) algorithm with an adjustable number of neurons and adjustable weights between the input, hidden and output layers. Experiments were conducted on 300 sentences from mono- and multilingual documents. The results show 90% accuracy in identifying Malay language documents in which more than 80% of the words match Malay words. The tool is able to distinguish Malay documents from documents in other languages with reasonable accuracy.
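The pipeline described in the abstract (word matching, a binary sentence representation, and a back-propagation network) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy lexicon `MALAY_WORDS`, the zero-padding to a fixed length in `encode`, the class name `BPNN`, the layer sizes, the learning rate and the four training sentences are all assumptions chosen for illustration only.

```python
import math
import random

# Hypothetical toy lexicon; a real system would load a full Malay word list.
MALAY_WORDS = {"saya", "makan", "nasi", "dan", "ini", "itu", "yang", "di"}


def encode(sentence, length=4):
    """Binary word-match vector: 1.0 if a word is in the lexicon, else 0.0.

    Truncating or zero-padding to a fixed length is an assumption of this
    sketch, made so that variable-length sentences fit a fixed input layer.
    """
    bits = [1.0 if w.strip(".,!?").lower() in MALAY_WORDS else 0.0
            for w in sentence.split()]
    return (bits + [0.0] * length)[:length]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BPNN:
    """One hidden layer of sigmoid units, trained by plain backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # One row of weights per hidden neuron; the last entry is the bias.
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                    for _ in range(n_hidden)]
        # Output neuron weights; the last entry is the bias.
        self.w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
             for row in self.w_h]
        y = sigmoid(sum(w * v for w, v in zip(self.w_o, h + [1.0])))
        return h, y

    def train(self, data, epochs=2000, lr=0.5):
        for _ in range(epochs):
            for x, target in data:
                h, y = self.forward(x)
                # Error terms for squared error with sigmoid activations.
                d_out = (y - target) * y * (1.0 - y)
                d_hid = [d_out * self.w_o[j] * h[j] * (1.0 - h[j])
                         for j in range(len(h))]
                # Gradient-descent updates, output layer then hidden layer.
                for j, v in enumerate(h + [1.0]):
                    self.w_o[j] -= lr * d_out * v
                for j, row in enumerate(self.w_h):
                    for i, v in enumerate(x + [1.0]):
                        row[i] -= lr * d_hid[j] * v

    def predict(self, x):
        return self.forward(x)[1]


# Made-up training sentences: label 1.0 = Malay, 0.0 = not Malay.
samples = [
    ("saya makan nasi itu", 1.0),
    ("saya makan nasi today", 1.0),
    ("I eat nasi today", 0.0),
    ("this is the rice", 0.0),
]
data = [(encode(text), label) for text, label in samples]

net = BPNN(n_in=4, n_hidden=3)
net.train(data)
```

After training, `net.predict(encode(sentence))` returns a score between 0 and 1, and a threshold of 0.5 could be read as "Malay" versus "not Malay". The number of hidden neurons is a constructor argument here, which loosely mirrors the adjustable number of neurons mentioned in the abstract.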