Document identification aims to extract information from digital documents such as articles, newspapers, magazines and e-books. The popularity of the World Wide Web (WWW) for disseminating information at the click of a mouse has also increased the number of Malay documents published on the WWW. This growth has given rise to many language identification systems. However, little research has addressed the identification of Malay-language documents. The purpose of this paper is therefore to describe the implementation of a tool that identifies Malay-language content in both monolingual and multilingual documents. The system comprises a feature extraction stage and a neural network classifier. Feature extraction consists of document filtering, word matching and a binary representation of variable-length sentences drawn from several document types, including plain text files, MS Word files, Adobe PDF files and HTML web pages. The classifier is a back-propagation neural network (BPNN) with an adjustable number of neurons and adjustable weights between the input, hidden and output layers. Experiments were conducted on 300 sentences taken from monolingual and multilingual documents. The results showed 90% accuracy in identifying Malay-language documents containing more than 80% matched Malay words. The tool is thus able to distinguish Malay documents from documents in other languages with reasonable accuracy.
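The word-matching and binary-representation steps can be illustrated with a minimal sketch. Everything named here is an assumption, not the authors' implementation: `MALAY_WORDS` is a hypothetical toy lexicon (the paper's word list is not given), `INPUT_SIZE` is an assumed fixed network input width to which variable-length sentences are zero-padded or truncated, and `looks_malay` simply treats the abstract's 80% figure as a decision threshold for illustration. In the described system the binary vector would instead be fed to the input layer of the BPNN.

```python
import re

# Hypothetical miniature Malay lexicon; the paper's actual word list is not given.
MALAY_WORDS = {
    "saya", "dan", "yang", "ini", "itu", "adalah", "dengan", "untuk",
    "tidak", "di", "ke", "dari", "pada", "buku", "membaca", "suka",
}

# Assumed fixed input width for the network: longer sentences are truncated,
# shorter ones are zero-padded.
INPUT_SIZE = 16


def tokenize(sentence: str) -> list[str]:
    """Lower-case the sentence and keep only alphabetic word tokens (simple filtering)."""
    return re.findall(r"[a-z]+", sentence.lower())


def binary_features(sentence: str, size: int = INPUT_SIZE) -> list[int]:
    """Map each word to 1 if it matches the lexicon, else 0, then pad/truncate to `size`."""
    bits = [1 if word in MALAY_WORDS else 0 for word in tokenize(sentence)]
    return (bits + [0] * size)[:size]


def malay_match_ratio(sentence: str) -> float:
    """Fraction of word tokens found in the lexicon (0.0 for a sentence with no words)."""
    words = tokenize(sentence)
    if not words:
        return 0.0
    return sum(word in MALAY_WORDS for word in words) / len(words)


def looks_malay(sentence: str, threshold: float = 0.8) -> bool:
    """Illustrative threshold rule: more than 80% of words matched counts as Malay."""
    return malay_match_ratio(sentence) > threshold


if __name__ == "__main__":
    for s in ["Saya suka membaca buku.", "I like reading books.", "saya suka reading books"]:
        print(s, binary_features(s), round(malay_match_ratio(s), 2), looks_malay(s))
```

Padding to a fixed width is one simple way to present variable-length sentences to a network with a fixed number of input neurons; the abstract does not say how the authors handled this, so treat it as a design choice of the sketch.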