With the massive growth in the volume of information available on the World Wide Web, and the emerging need for better techniques to access it, there has been a strong resurgence of interest in web mining research. Web mining applies data mining and other information processing techniques to the World Wide Web to discover useful patterns, which people can use to access the Web more efficiently. Web mining can be divided into three categories: content mining, usage mining, and structure mining. In this paper we apply web content mining to extract non-English knowledge from the web. We investigate and evaluate some common methods, using web mining systems that must deal with issues in language-specific text processing. A language-independent algorithm, applied here to Arabic, serves as the machine learning system. The algorithm uses a set of features, represented as a vector of keywords, in the learning process to perform text classification. Such an algorithm is typically used to classify documents written in a non-English language. It categorizes and classifies documents with two classifiers: K-Nearest Neighbor (CK-NN) and Naïve Bayes (CNB). However, these algorithms usually depend on phrase segmentation and extraction programs to generate the set of features or keywords that represent the retrieved web documents. The proposed Arabic text classification system is called the Arabic Text Classifier (ATC). The main goal of ATC is to compare the results of the two classifiers (CK-NN and CNB) and select the one with the best average accuracy rate to start the retrieval process. This paper introduces the theory behind ATC without demonstrating a practical implementation of the system.
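The comparison between the two classifiers can be illustrated with a minimal sketch. It is not the ATC implementation, which the abstract does not show. It assumes whitespace tokenization in place of the phrase segmentation and extraction step, cosine similarity over keyword counts for k-nearest neighbor, and a multinomial Naïve Bayes with add-one smoothing. All function names, the toy Arabic documents, and the labels are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: plain whitespace splitting stands in for the
    # segmentation/extraction programs mentioned in the abstract.
    return text.split()

def cosine(a, b):
    # Cosine similarity between two keyword-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(train, text, k=3):
    # Majority vote among the k training documents most similar to the query.
    query = Counter(tokenize(text))
    ranked = sorted(train,
                    key=lambda d: cosine(query, Counter(tokenize(d[0]))),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def nb_train(train):
    # Collect per-class document counts, per-class word counts, and the vocabulary.
    class_docs, word_counts, vocab = Counter(), {}, set()
    for text, label in train:
        class_docs[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab

def nb_predict(model, text):
    # Pick the class with the highest log posterior (add-one smoothing).
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(predict, test):
    # Fraction of test documents whose predicted label matches the true label.
    return sum(predict(text) == label for text, label in test) / len(test)

# Hypothetical toy corpus: (document, label) pairs.
TRAIN = [
    ("كرة قدم مباراة فريق", "sports"),
    ("فريق لاعب هدف مباراة", "sports"),
    ("لاعب كرة بطولة", "sports"),
    ("بنك سوق أسهم", "economy"),
    ("اقتصاد سوق تضخم بنك", "economy"),
    ("أسهم استثمار اقتصاد", "economy"),
]
TEST = [
    ("مباراة كرة هدف", "sports"),
    ("سوق استثمار بنك", "economy"),
]

NB_MODEL = nb_train(TRAIN)
RESULTS = {
    "CK-NN": accuracy(lambda t: knn_predict(TRAIN, t, k=3), TEST),
    "CNB": accuracy(lambda t: nb_predict(NB_MODEL, t), TEST),
}
# Keep the classifier with the higher accuracy; ties go to the first entry.
BEST = max(RESULTS, key=RESULTS.get)
print(RESULTS, BEST)
```

On a realistic corpus the accuracy would be averaged over several train/test splits before a classifier is chosen; the single split here only keeps the sketch short.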