Malay Document Analysis and Recognition aims to extract digital Malay documents automatically. These documents take the form of articles, newspapers and magazines. Over the years, the number of Malay digital documents published on the World Wide Web (WWW) has increased, and they are consequently used by many organizations locally and abroad. In this paper, we introduce the implementation of a tool for Malay language document identification in mono- and multilingual documents. The tool combines feature extraction with a neural network technique. Feature extraction consists of document filtering, word matching and binary representation of variable-length sentences from many types of documents, including generic text files, MS Word files, Adobe PDF files and HTML web pages. The neural network uses the back-propagation neural network (BPNN) algorithm with an adjustable number of neurons and weights between the input, hidden and output layers. A database of 300 sentences drawn from mono- and multilingual documents was constructed. Experiments show an average recognition rate of 90% for Malay language documents in which more than 80% of the words are matched as Malay words. Our tool is thus able to recognise Malay language documents with reasonable accuracy.
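To make the pipeline concrete, the following is a minimal sketch of word matching, binary sentence representation and a small back-propagation network. It is not the authors' implementation. The lexicon `MALAY_WORDS`, the fixed input width `MAX_LEN`, the four training sentences in `TRAIN` and all function names are assumptions chosen for illustration; the abstract does not specify the word list, the padding scheme or the network size.

```python
import math
import random
import re

# Hypothetical mini-lexicon; a real tool would load a full Malay word list.
MALAY_WORDS = {"saya", "makan", "nasi", "dan", "yang", "di",
               "ini", "itu", "pergi", "ke", "sekolah"}
MAX_LEN = 8  # assumed fixed input width for the network


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def binary_features(sentence, max_len=MAX_LEN):
    """1 for each word found in the lexicon, 0 otherwise, padded or cut to max_len."""
    bits = [1 if w in MALAY_WORDS else 0 for w in tokenize(sentence)][:max_len]
    return bits + [0] * (max_len - len(bits))


def match_ratio(sentence):
    """Fraction of words in the sentence that match the lexicon."""
    words = tokenize(sentence)
    return sum(w in MALAY_WORDS for w in words) / len(words) if words else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(model, x):
    """Return hidden activations and the single output activation."""
    w1, w2 = model
    xb = x + [1]  # append bias input
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w1]
    y = sigmoid(sum(w * v for w, v in zip(w2, h + [1.0])))
    return h, y


def train_bpnn(samples, n_hidden=4, epochs=2000, lr=0.5, seed=0):
    """Train a one-hidden-layer network with online back-propagation (squared error)."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, target in samples:
            h, y = forward((w1, w2), x)
            xb, hb = x + [1], h + [1.0]
            # Error terms for the output unit and each hidden unit.
            d_out = (y - target) * y * (1 - y)
            d_hid = [d_out * w2[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            # Gradient-descent weight updates.
            w2 = [w - lr * d_out * v for w, v in zip(w2, hb)]
            w1 = [[w - lr * d_hid[j] * v for w, v in zip(w1[j], xb)]
                  for j in range(n_hidden)]
    return w1, w2


def predict(model, x):
    """Network output in (0, 1); values above 0.5 are read as Malay."""
    return forward(model, x)[1]


# Toy training set (illustrative only): label 1 = Malay, 0 = not Malay.
TRAIN = [
    (binary_features("Saya makan nasi di sekolah"), 1),
    (binary_features("Saya pergi ke sekolah"), 1),
    (binary_features("I eat rice at school"), 0),
    (binary_features("This is a book"), 0),
]

print(binary_features("saya pergi to school"))  # [1, 1, 0, 0, 0, 0, 0, 0]
print(match_ratio("saya pergi to school"))      # 0.5
```

One limitation of this sketch is that zero padding is indistinguishable from an unmatched word, so short sentences and non-Malay sentences can look alike to the network. A fuller version might add the match ratio or the sentence length as extra inputs.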