Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
Data mining: concepts and techniques
Data mining: concepts and techniques
An Evaluation of Statistical Approaches to Text Categorization
Information Retrieval
Machine Learning
Modern Information Retrieval
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Pattern Classification (2nd Edition)
Pattern Classification (2nd Edition)
The Anatomy of the Grid: Enabling Scalable Virtual Organizations
International Journal of High Performance Computing Applications
Natural Language Processing and Text Mining
Natural Language Processing and Text Mining
HAND: Highly Available Dynamic Deployment Infrastructure for Globus Toolkit 4
PDP '07 Proceedings of the 15th Euromicro International Conference on Parallel, Distributed and Network-Based Processing
Text Mining Grid Services for Multiple Environments
High Performance Computing for Computational Science - VECPAR 2008
Hi-index | 0.00 |
The enormous amount of information stored in unstructured texts cannot simply be used for further processing by computers, which typically handle text as simple sequences of character strings. Text mining is the process of extracting interesting information and knowledge from unstructured text. One key difficulty with text classification learning algorithms is that they require many hand-labeled documents to learn accurately. In the text mining pattern discovery phase, the text classification step aims at automatically attribute one or more predefined classes to text documents. In this research, we propose to use an algorithm for learning from labeled and unlabeled documents based on the combination of Expectation-Maximization (EM) and a naïve Bayes classifier on a grid environment, this combination is based on a mixture of multinomials, which is commonly used in text classification. Naïve Bayes is a probabilistic approach to inductive learning. It estimates the a posteriori probability that a document belongs to a class given the observed feature values of the documents, assuming independence of the features. The class with the maximum a posteriori probability is assigned to the document. Expectation-Maximization (EM) is a class of iterative algorithms for maximum likelihood or maximum a posteriori estimation in problems with unlabeled data. The grid environment is a geographically distributed computation infrastructure composed of a set of heterogeneous resources. The semi-supervised learning classifier in the grid is available as a grid service, expanding the functionality of Aîuri Portal, which is a framework for a cooperative academic environment for education and research. Text classification mining methods are time-consuming by using the grid infrastructure can bring significant benefits in learning and the classification process.