The construction of a text classifier usually involves (i) a phase of term selection, in which the most relevant terms for the classification task are identified, (ii) a phase of term weighting, in which document weights for the selected terms are computed, and (iii) a phase of classifier learning, in which a classifier is generated from the weighted representations of the training documents. This process is an instance of supervised learning, since it uses information on the membership of training documents in categories. Traditionally, supervised learning enters only phases (i) and (iii). In this paper we propose instead that learning from training data should also affect phase (ii), i.e. that information on the membership of training documents in categories be used to determine term weights. We call this idea supervised term weighting (STW). As an example, we propose a number of "supervised variants" of tfidf weighting, obtained by replacing the idf function with the function that has been used in phase (i) for term selection. We present experimental results obtained on the standard Reuters-21578 benchmark with one classifier learning method (support vector machines), three term selection functions (information gain, chi-square, and gain ratio), and both local and global term selection and weighting.
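To make the idea of a "supervised variant" of tfidf concrete, the following is a minimal sketch of local weighting in which the idf factor is replaced by the chi-square score of the term with respect to one category. It is an illustration under stated assumptions, not the exact formulation used in the experiments: the logarithmic tf factor, the cosine normalization, the 2x2 contingency-table form of chi-square, and the names `chi_square` and `supervised_weights` are all choices made here for the example.

```python
import math
from collections import Counter


def chi_square(term, doc_sets, labels, category):
    """Chi-square of `term` vs. `category` from a 2x2 contingency table.

    `doc_sets` holds one set of tokens per training document; `labels`
    holds the category of each document. Both names are hypothetical.
    """
    n = len(doc_sets)
    pairs = list(zip(doc_sets, labels))
    a = sum(1 for d, y in pairs if term in d and y == category)      # term present, in category
    b = sum(1 for d, y in pairs if term in d and y != category)      # term present, not in category
    c = sum(1 for d, y in pairs if term not in d and y == category)  # term absent, in category
    d_ = n - a - b - c                                               # term absent, not in category
    denom = (a + b) * (c + d_) * (a + c) * (b + d_)
    if denom == 0:
        return 0.0
    return n * (a * d_ - b * c) ** 2 / denom


def supervised_weights(doc_tokens, category, train_docs, train_labels):
    """Weight each term of one document as tf * chi-square, then cosine-normalize.

    Assumption: tf = 1 + log(count); plain tfidf would multiply tf by idf instead.
    """
    doc_sets = [set(d) for d in train_docs]
    counts = Counter(doc_tokens)
    raw = {
        t: (1.0 + math.log(k)) * chi_square(t, doc_sets, train_labels, category)
        for t, k in counts.items()
    }
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: (w / norm if norm > 0 else 0.0) for t, w in raw.items()}


# Toy training set (invented for illustration only).
train_docs = [
    ["wheat", "price", "wheat"],
    ["wheat", "farm"],
    ["stock", "price"],
    ["stock", "market"],
]
train_labels = ["grain", "grain", "finance", "finance"]

weights = supervised_weights(train_docs[0], "grain", train_docs, train_labels)
print(weights)
```

In this toy set "price" occurs equally often inside and outside the "grain" category, so its chi-square score, and hence its supervised weight, is zero, whereas idf would still give it a positive weight. Because the score is computed per category, this corresponds to the local policy; a global policy would need some aggregation of the per-category scores, which is not shown here.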