Tackling concept drift by temporal inductive transfer

Authors:
George Forman
Affiliations:
Hewlett-Packard Labs, Palo Alto, CA
Venue:
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2006

Abstract

Machine learning is the mainstay for text classification. However, even the most successful techniques are defeated by many real-world applications that have a strong time-varying component. To advance research on this challenging but important problem, we promote a natural experimental framework, the Daily Classification Task, which can be applied to large time-based datasets such as Reuters RCV1. In this paper we dissect concept drift into three main subtypes. We demonstrate via a novel visualization that the recurrent-themes subtype is present in RCV1. This understanding led us to develop a new learning model that transfers induced knowledge through time to benefit future classifier learning tasks. The method avoids two main problems with existing work in inductive transfer: scalability and the risk of negative transfer. In empirical tests, it consistently improved F-measure by more than 10 points for each of the four Reuters categories tested.
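The abstract describes the Daily Classification Task and the temporal transfer idea only at a high level. The Python sketch below illustrates one plausible reading under assumptions that are ours, not the paper's: each day's labeled documents train a classifier that is scored by F-measure on the next day's documents, and knowledge is carried forward by appending, as extra features, the +1/-1 votes of word-only classifiers trained on up to `max_prior` earlier days. The perceptron base learner, the next-day evaluation, the sliding window, and every name (`Perceptron`, `augment`, `daily_classification`, `max_prior`) are hypothetical choices for illustration, not the author's implementation.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Doc = List[str]                    # a document as a list of tokens
Day = Tuple[List[Doc], List[int]]  # one day's documents and 0/1 labels
Features = Dict[str, float]


class Perceptron:
    """Hypothetical base learner: a plain sparse perceptron."""

    def __init__(self, epochs: int = 10) -> None:
        self.w: Features = {}
        self.b = 0.0
        self.epochs = epochs

    def score(self, x: Features) -> float:
        return self.b + sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def fit(self, xs: Sequence[Features], ys: Sequence[int]) -> "Perceptron":
        for _ in range(self.epochs):
            for x, y in zip(xs, ys):
                t = 1 if y else -1
                if t * self.score(x) <= 0:  # misclassified or on the boundary
                    for f, v in x.items():
                        self.w[f] = self.w.get(f, 0.0) + t * v
                    self.b += t
        return self

    def predict(self, x: Features) -> bool:
        return self.score(x) > 0


def words(doc: Doc) -> Features:
    """Bag-of-words term counts."""
    return {term: float(n) for term, n in Counter(doc).items()}


def augment(doc: Doc, prior: Sequence[Perceptron]) -> Features:
    """Words plus one transfer feature per earlier-day classifier (its +1/-1 vote)."""
    x = words(doc)
    for i, model in enumerate(prior):
        x[f"__prior_{i}"] = 1.0 if model.predict(words(doc)) else -1.0
    return x


def f_measure(y_true: Sequence[int], y_pred: Sequence[bool]) -> float:
    """F1 = 2TP / (2TP + FP + FN); 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def daily_classification(days: Sequence[Day], max_prior: int = 3) -> List[float]:
    """Train on day t, test on day t+1; max_prior=0 disables transfer."""
    base_models: List[Perceptron] = []  # word-only models, one per past day
    scores = []
    for t in range(len(days) - 1):
        docs, labels = days[t]
        # Use only models from days before t, so no training document gets
        # a vote from a model that was fit on that same document.
        prior = base_models[max(0, len(base_models) - max_prior):]
        model = Perceptron().fit([augment(d, prior) for d in docs], labels)
        test_docs, test_labels = days[t + 1]
        preds = [model.predict(augment(d, prior)) for d in test_docs]
        scores.append(f_measure(test_labels, preds))
        # Store today's word-only model for use as a transfer feature later.
        base_models.append(Perceptron().fit([words(d) for d in docs], labels))
    return scores
```

Calling `daily_classification(days, max_prior=0)` on the same stream gives a no-transfer baseline for comparison. Capping the window at `max_prior` models keeps the number of added features bounded; whether and how the paper bounds its transferred knowledge is not stated in the abstract.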