Text categorization algorithms usually represent documents as bags of words and consequently have to deal with huge numbers of features. Most previous studies found that the majority of these features are relevant for classification, and that the performance of text categorization with support vector machines (SVMs) peaks when no feature selection is performed. We describe a class of text categorization problems that are characterized by many redundant features. Even though most of these features are relevant, the underlying concepts can be concisely captured using only a few features, while keeping all of them has a substantially detrimental effect on categorization accuracy. We develop a novel measure that captures feature redundancy, and use it to analyze a large collection of datasets. We show that for problems plagued with numerous redundant features the performance of C4.5 is significantly superior to that of SVM, while aggressive feature selection allows SVM to beat C4.5 by a narrow margin.
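To make the idea of aggressive feature selection over a bag-of-words representation concrete, the following is a minimal sketch. It assumes information gain as the scoring function, which is one common choice and is not stated in the abstract. The function names, the toy documents, and the cutoff `k` are all hypothetical, and the sketch does not attempt to reproduce the redundancy measure mentioned above.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'.

    docs is a list of sets of words (a binary bag-of-words view);
    labels is the parallel list of class labels.
    """
    n = len(docs)
    with_term = Counter(y for d, y in zip(docs, labels) if term in d)
    without_term = Counter(y for d, y in zip(docs, labels) if term not in d)
    n_with = sum(with_term.values())
    n_without = n - n_with
    h_prior = entropy(list(Counter(labels).values()))
    h_cond = (n_with / n) * entropy(list(with_term.values())) \
        + (n_without / n) * entropy(list(without_term.values()))
    return h_prior - h_cond


def select_top_features(docs, labels, k):
    """Keep only the k highest-scoring terms (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Hypothetical toy corpus: two classes, a handful of words.
docs = [
    {"goal", "match", "the"},
    {"goal", "team", "the"},
    {"vote", "party", "the"},
    {"vote", "election", "the"},
]
labels = ["sport", "sport", "politics", "politics"]

top = select_top_features(docs, labels, k=2)
print(top)
```

In this toy corpus a word such as "the" occurs in every document and scores zero, while "goal" and "vote" each separate the classes on their own, so a very small `k` already captures the concept. A classifier would then be trained on documents restricted to the selected terms.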