Supervised term weighting for automated text categorization

Authors:
Franca Debole;Fabrizio Sebastiani
Affiliations:
Istituto di Scienza e Tecnologie dell'Informazione, Pisa (Italy);Istituto di Scienza e Tecnologie dell'Informazione, Pisa (Italy)
Venue:
Proceedings of the 2003 ACM symposium on Applied computing
Year:
2003



Abstract

The construction of a text classifier usually involves (i) a phase of term selection, in which the most relevant terms for the classification task are identified, (ii) a phase of term weighting, in which document weights for the selected terms are computed, and (iii) a phase of classifier learning, in which a classifier is generated from the weighted representations of the training documents. This process involves an activity of supervised learning, in which information on the membership of training documents in categories is used. Traditionally, supervised learning enters only phases (i) and (iii). In this paper we propose instead that learning from training data should also affect phase (ii), i.e. that information on the membership of training documents in categories be used to determine term weights. We call this idea supervised term weighting (STW). As an example, we propose a number of "supervised variants" of tfidf weighting, obtained by replacing the idf function with the function that has been used in phase (i) for term selection. We present experimental results obtained on the standard Reuters-21578 benchmark with one classifier learning method (support vector machines), three term selection functions (information gain, chi-square, and gain ratio), and both local and global term selection and weighting.
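
The "supervised variant" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it uses chi-square as the term selection function, a logarithmic tf factor (1 + log of the raw count), cosine normalization, a single binary category, and a toy corpus. The function names `chi_square` and `supervised_weights` are hypothetical.

```python
import math
from collections import Counter


def chi_square(docs, labels, term):
    """Chi-square statistic between term presence and category membership.

    docs   -- list of token lists (training documents)
    labels -- list of booleans, True if the document belongs to the category
    """
    n = len(docs)
    # 2x2 contingency table: term present/absent vs. in/out of the category.
    a = sum(1 for d, y in zip(docs, labels) if term in d and y)
    b = sum(1 for d, y in zip(docs, labels) if term in d and not y)
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y)
    d_ = n - a - b - c
    denom = (a + b) * (c + d_) * (a + c) * (b + d_)
    return n * (a * d_ - b * c) ** 2 / denom if denom else 0.0


def supervised_weights(doc, docs, labels):
    """Weights for one document: tf factor times chi-square in place of idf.

    Assumption: tf = 1 + log(count), followed by cosine normalization.
    """
    counts = Counter(doc)
    raw = {t: (1 + math.log(k)) * chi_square(docs, labels, t)
           for t, k in counts.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: w / norm for t, w in raw.items()} if norm else raw


# Toy training set (made up for illustration): two documents in the category.
docs = [["wheat", "price"], ["wheat", "farm"],
        ["stock", "price"], ["stock", "market"]]
labels = [True, True, False, False]

weights = supervised_weights(docs[0], docs, labels)
print(weights)
```

In this toy corpus "wheat" occurs only in the category's documents while "price" is spread evenly across both groups, so the category-aware factor gives all of the first document's weight to "wheat". An unsupervised idf would treat the two terms identically, since each appears in two of the four documents.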