To meet the growing qualitative and quantitative demands for information from the WWW, efficient automatic Web page classifiers are urgently needed. However, a classifier applied to the WWW faces a huge dimensionality problem, since it must handle millions of Web pages, tens of thousands of features, and hundreds of categories. In practical implementations, reducing the dimensionality is a critically important challenge. In this paper, we propose a fuzzy ranking analysis paradigm together with a novel relevance measure, the discriminating power measure (DPM), to reduce the input dimensionality from tens of thousands to a few hundred with zero rejection rate and a small decrease in accuracy. A two-level promotion method based on fuzzy ranking analysis is proposed to improve the behavior of each relevance measure and to combine those measures into a better evaluation of features. The DPM has low computation cost and emphasizes both positive and negative discriminating features. It also emphasizes classification in parallel order rather than in serial order. In our experiments, the fuzzy ranking analysis is useful for validating the uncertain behavior of each relevance measure. Moreover, the DPM reduces the input dimensionality from 10,427 to 200 with zero rejection rate and a decline of less than 5% in test accuracy (from 84.5% to 80.4%). Furthermore, experimental results on the China Time and Reuters-21578 datasets demonstrate that the DPM provides a major benefit in improving document classification accuracy. The results also show that the DPM can remove both redundant and noisy features to build a better classifier.
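The reduction step summarized above (score every term, rank the terms, keep only the top few hundred) can be illustrated with a minimal sketch. The abstract does not give the DPM formula, so the score below is an assumed stand-in and should not be read as the authors' measure: `spread_score` is a hypothetical function that takes the gap between a term's highest and lowest per-category document-frequency rate. The function names, the toy documents, and the category labels are all illustrative assumptions.

```python
from collections import Counter


def document_frequency_rates(docs, labels):
    """Map each category to {term: fraction of that category's docs containing the term}."""
    totals = Counter(labels)
    counts = {category: Counter() for category in totals}
    for tokens, label in zip(docs, labels):
        counts[label].update(set(tokens))  # count each term once per document
    return {
        category: {term: n / totals[category] for term, n in counts[category].items()}
        for category in totals
    }


def spread_score(term, rates):
    """Hypothetical stand-in relevance score, NOT the paper's DPM.

    A term that appears at the same rate in every category scores 0;
    a term concentrated in one category scores close to 1.
    """
    values = [rates[category].get(term, 0.0) for category in rates]
    return max(values) - min(values)


def select_top_k(docs, labels, k):
    """Rank the vocabulary by the stand-in score and keep the k best terms."""
    rates = document_frequency_rates(docs, labels)
    vocabulary = {term for tokens in docs for term in tokens}
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(vocabulary, key=lambda t: (-spread_score(t, rates), t))
    return ranked[:k]


if __name__ == "__main__":
    # Toy corpus (assumed data, for illustration only).
    docs = [
        ["goal", "match", "the"],
        ["match", "team", "the"],
        ["stock", "market", "the"],
        ["market", "bank", "the"],
    ]
    labels = ["sport", "sport", "finance", "finance"]
    print(select_top_k(docs, labels, 2))
```

In this toy corpus the term "the" occurs in every document of both categories, so its stand-in score is zero and it is ranked last, which is the kind of noise feature the selection step is meant to discard. Any other relevance measure could be substituted for `spread_score` without changing the ranking and truncation logic.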