A number of feature selection metrics have been explored in text categorization, among which information gain (IG), chi-square (CHI), correlation coefficient (CC), and odds ratio (OR) are considered the most effective. CC and OR are one-sided metrics, while IG and CHI are two-sided. Feature selection with a one-sided metric selects only the features most indicative of membership. Feature selection with a two-sided metric ignores the signs of the features, and so implicitly combines the features most indicative of membership (positive features) with those most indicative of non-membership (negative features). One-sided metrics never consider the negative features, which are quite valuable, while two-sided metrics cannot ensure the optimal combination of the two kinds of features, especially on imbalanced data. In this work, we investigate the usefulness of explicitly controlling that combination within a proposed feature selection framework. Using multinomial naïve Bayes and regularized logistic regression as classifiers, our experiments show both the great potential and the actual merits of explicitly combining positive and negative features in a nearly optimal fashion suited to the imbalanced data.
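The distinction between one-sided and two-sided metrics, and the idea of an explicit combination, can be illustrated with a minimal sketch. It assumes the common contingency-table definitions of CC and CHI (under which CHI is the square of CC); the abstract itself does not give formulas. The function `select_combined`, its `pos_ratio` parameter, and the toy document counts are hypothetical and chosen only for illustration; they are not the framework proposed in the paper.

```python
import math

def correlation_coefficient(a, b, c, d):
    """Signed, one-sided score for a term and a category.

    a: in-category documents containing the term
    b: out-of-category documents containing the term
    c: in-category documents without the term
    d: out-of-category documents without the term
    """
    n = a + b + c + d
    denom = math.sqrt((a + c) * (b + d) * (a + b) * (c + d))
    if denom == 0:
        return 0.0  # term is uninformative (e.g. occurs in every document)
    return math.sqrt(n) * (a * d - c * b) / denom

def chi_square(a, b, c, d):
    """Two-sided score: squaring CC discards the sign of the association."""
    return correlation_coefficient(a, b, c, d) ** 2

def select_combined(signed_scores, k, pos_ratio):
    """Hypothetical explicit combination: pick k features, a fixed share
    of them positive and the remainder negative."""
    n_pos = round(k * pos_ratio)
    n_neg = k - n_pos
    ranked = sorted(signed_scores, key=signed_scores.get, reverse=True)
    positives = [t for t in ranked if signed_scores[t] > 0][:n_pos]
    negatives = [t for t in reversed(ranked) if signed_scores[t] < 0][:n_neg]
    return positives, negatives

# Toy counts (assumed): 10 in-category and 90 out-of-category documents.
counts = {
    "goal":   (8, 2, 2, 88),
    "match":  (6, 4, 4, 86),
    "stock":  (0, 40, 10, 50),
    "market": (1, 30, 9, 60),
    "the":    (10, 90, 0, 0),
}
cc = {t: correlation_coefficient(*v) for t, v in counts.items()}
chi = {t: chi_square(*v) for t, v in counts.items()}

# Two-sided selection: rank by CHI, sign ignored.
top_two_sided = sorted(chi, key=chi.get, reverse=True)[:2]
# Explicit combination: one positive and one negative feature.
positives, negatives = select_combined(cc, k=2, pos_ratio=0.5)
```

On this imbalanced toy data, ranking by CHI fills both slots with positive features, whereas the explicit split guarantees that one negative feature is kept. How the share of positive features should be chosen is precisely what the abstract says the framework investigates, so `pos_ratio` is left here as a free parameter.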