In this paper, we propose a hybrid approach to language identification for Arabic-script web pages, combining decision tree and ARTMAP approaches. We first use the decision tree to determine the general identity of a web document, that is, whether it is Arabic script-based or not. We then feed selected representations of the pages identified by the decision tree into the ARTMAP neural network, which further verifies the languages detected. In our initial experiments, we found that although the decision tree approach may achieve higher accuracy than ARTMAP, it may not be as reliable as ARTMAP when the set of languages is extended to other Arabic-script web documents (e.g., Urdu, Arabic, Persian). We therefore propose the hybrid decision tree-ARTMAP approach to improve the performance of Arabic-script language identification on web documents in a variety of languages. The results show that the proposed approach outperforms both the decision tree and the default ARTMAP approaches.
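The two-stage control flow described above (a coarse Arabic-script filter followed by a finer language step) can be illustrated with a minimal Python sketch. This is not the method of the paper: neither a learned decision tree nor an ARTMAP network is implemented here. Stage one is replaced by a single hand-set threshold on the share of Arabic-block letters, and stage two by a lookup of language-specific letters. All function names, the threshold, and the marker letter sets are assumptions chosen for illustration only.

```python
# Illustrative two-stage pipeline; both stages are simplified stand-ins.

def arabic_ratio(text):
    """Share of alphabetic characters in the basic Arabic block (U+0600 to U+06FF)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06ff")
    return arabic / len(letters)


def stage_one_is_arabic_script(text, threshold=0.5):
    """Stand-in for the decision tree: one threshold split (threshold is assumed)."""
    return arabic_ratio(text) >= threshold


# Assumed marker letters. Urdu also uses the Persian letters, so Urdu is checked first.
URDU_MARKERS = set("\u0679\u0688\u0691\u06ba\u06be\u06d2")  # tteh, ddal, rreh, noon ghunna, heh doachashmee, yeh barree
PERSIAN_MARKERS = set("\u067e\u0686\u0698\u06af")           # peh, tcheh, jeh, gaf


def stage_two_language(text):
    """Stand-in for ARTMAP: pick a language from characteristic letters."""
    chars = set(text)
    if chars & URDU_MARKERS:
        return "Urdu"
    if chars & PERSIAN_MARKERS:
        return "Persian"
    return "Arabic"


def identify(text):
    """Run stage one, and run stage two only on pages judged to be Arabic script."""
    if not stage_one_is_arabic_script(text):
        return "non-Arabic script"
    return stage_two_language(text)


if __name__ == "__main__":
    for sample in ["hello world", "\u0645\u0631\u062d\u0628\u0627", "\u0686\u0637\u0648\u0631"]:
        print(repr(sample), "->", identify(sample))
```

The point of the sketch is only the structure: the second stage sees just the documents that pass the first, which is where a trained ARTMAP network would take over from the decision tree in the proposed approach.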