Context-sensitive learning methods for text categorization

Authors:
William W. Cohen; Yoram Singer
Affiliations:
AT&T Labs; AT&T Labs
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
1999

Abstract

Two recently implemented machine-learning algorithms, RIPPER and sleeping-experts for phrases, are evaluated on a number of large text categorization problems. These algorithms both construct classifiers that allow the “context” of a word w to affect how (or even whether) the presence or absence of w will contribute to a classification. However, RIPPER and sleeping-experts differ radically in many other respects: differences include different notions as to what constitutes a context, different ways of combining contexts to construct a classifier, different methods to search for a combination of contexts, and different criteria as to what contexts should be included in such a combination. In spite of these differences, both RIPPER and sleeping-experts perform extremely well across a wide variety of categorization problems, generally outperforming previously applied learning methods. We view this result as a confirmation of the usefulness of classifiers that represent contextual information.
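
To make the notion of "context" concrete, the following is a minimal illustrative sketch, not the authors' implementation of either algorithm. It assumes a rule-set classifier in which each rule is a conjunction of word-presence tests, so the other words in a rule act as the context that decides whether a given word contributes. The category, the rules, and the names `RULES`, `tokenize`, and `classify` are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, List

# Hypothetical rule set for a made-up "finance" category (not taken from the paper).
# Each rule is a conjunction: it fires only if every word in it occurs in the document.
RULES: List[FrozenSet[str]] = [
    frozenset({"bank", "loan"}),
    frozenset({"bank", "interest", "rate"}),
]


def tokenize(text: str) -> FrozenSet[str]:
    """Lowercase the text and split on whitespace into a set of words."""
    return frozenset(text.lower().split())


def classify(text: str, rules: List[FrozenSet[str]] = RULES) -> bool:
    """Return True if at least one rule has all of its words in the document."""
    words = tokenize(text)
    return any(rule <= words for rule in rules)


if __name__ == "__main__":
    # "bank" counts toward the category only when its context words are also present.
    print(classify("The bank approved the loan"))
    print(classify("We sat on the river bank"))
```

In this sketch the word "bank" alone never triggers the category; it matters only alongside "loan" or alongside "interest" and "rate". A context-free classifier that assigns each word a fixed weight cannot express that dependence directly.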