With the increasing number of digital documents, the ability to classify those documents automatically, both efficiently and accurately, is becoming more critical and more difficult. One of the major problems in text classification is the high dimensionality of the feature space. We present the ambiguity measure (AM) feature-selection algorithm, which selects the most unambiguous features from the feature set. Unambiguous features are those whose presence in a document indicates with a strong degree of confidence that the document belongs to only one specific category. We apply AM feature selection to a naïve Bayes text classifier and demonstrate the effectiveness of our approach, which outperforms eight existing feature-selection methods on five benchmark datasets with a statistical significance of at least 95% confidence. The support vector machine (SVM) text classifier consistently performs better than the naïve Bayes text classifier; its drawback, however, is the time complexity of training a model. We therefore further explore the effect of AM feature selection on an SVM text classifier. Our results indicate that the training time for the SVM algorithm can be reduced by more than 50% while still improving the accuracy of the text classifier. On four standard benchmark datasets, our approach outperforms eight existing feature-selection methods with statistical significance at the 99% confidence level. © 2009 Wiley Periodicals, Inc.
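To make the idea concrete, the following is a minimal, illustrative sketch of how an ambiguity-style feature score could be computed. It assumes AM for a term is the largest fraction of that term's occurrences that fall within a single category, so a score near 1 marks an unambiguous term; the function names, threshold value, and toy data are our own illustrations, and the paper itself should be consulted for the exact formulation and the integration with naïve Bayes or SVM training.

```python
from collections import Counter, defaultdict

def ambiguity_measure(docs):
    """Score each term by the maximum, over categories, of the fraction of
    the term's occurrences appearing in that category.

    docs: iterable of (tokens, category) pairs.
    Returns: dict mapping term -> score in (0, 1]; 1.0 means the term
    occurs in exactly one category (unambiguous).
    """
    term_by_cat = defaultdict(Counter)  # term -> Counter(category -> count)
    term_total = Counter()              # term -> total occurrences
    for tokens, category in docs:
        for term in tokens:
            term_by_cat[term][category] += 1
            term_total[term] += 1
    return {term: max(term_by_cat[term].values()) / term_total[term]
            for term in term_total}

def select_unambiguous_features(docs, threshold=0.9):
    """Keep only terms whose score meets the (illustrative) threshold."""
    scores = ambiguity_measure(docs)
    return {term for term, s in scores.items() if s >= threshold}
```

For example, a term such as "goal" occurring only in sports documents would score 1.0 and be retained, while a function word spread evenly across categories would score near 1/|categories| and be discarded, shrinking the feature space before classifier training.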