We're not in Kansas anymore: detecting domain changes in streams

Authors:
Mark Dredze; Tim Oates; Christine Piatko
Affiliations:
Human Language Technology Center of Excellence and University of Maryland, Baltimore County; Human Language Technology Center of Excellence and University of Maryland, Baltimore County; Human Language Technology Center of Excellence and Johns Hopkins University
Venue:
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Year:
2010

Citing 23
Cited 0

Abstract

Domain adaptation, the problem of adapting a natural language processing system trained in one domain to perform well in a different domain, has received significant attention. This paper addresses an important problem for deployed systems that has received little attention: detecting when such adaptation is needed by a system operating in the wild, i.e., one performing classification over a stream of unlabeled examples. Our method uses A-distance, a metric for detecting shifts in data streams, combined with classification margins to detect domain shifts. We empirically show effective domain shift detection on a variety of data sets and shift conditions.
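The detection idea in the abstract, monitoring classifier margins on an unlabeled stream and flagging a shift when their distribution moves, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes margins are the absolute scores |w·x| of a linear classifier. The A-distance is computed as 2 · sup |P(A) − P'(A)|, with the sets A restricted to intervals made of contiguous histogram bins, and a fixed reference window is compared against a sliding window of the same size. The names `margin`, `a_distance`, and `MarginShiftDetector`, along with the bin edges, window size, and threshold, are hypothetical choices made for this example.

```python
from collections import deque
from typing import Deque, List, Sequence


def margin(w: Sequence[float], x: Sequence[float]) -> float:
    """Hypothetical helper: absolute score |w . x| of a linear classifier."""
    return abs(sum(wi * xi for wi, xi in zip(w, x)))


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values in each bin [edges[i], edges[i+1]).

    Values below edges[0] land in the first bin; values at or above
    the last edge land in the last bin.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = 0
        while i < len(counts) - 1 and v >= edges[i + 1]:
            i += 1
        counts[i] += 1
    n = max(len(values), 1)
    return [c / n for c in counts]


def a_distance(ref: Sequence[float], cur: Sequence[float], edges: Sequence[float]) -> float:
    """2 * max over bin-aligned intervals A of |P_ref(A) - P_cur(A)|."""
    p = bin_fractions(ref, edges)
    q = bin_fractions(cur, edges)
    best = 0.0
    for i in range(len(p)):          # interval start bin
        diff = 0.0
        for j in range(i, len(p)):   # interval end bin (inclusive)
            diff += p[j] - q[j]
            best = max(best, abs(diff))
    return 2.0 * best


class MarginShiftDetector:
    """Hypothetical detector: fixed reference window vs. sliding current window."""

    def __init__(self, window: int, threshold: float, edges: Sequence[float]):
        self.window = window
        self.threshold = threshold
        self.edges = list(edges)
        self.reference: List[float] = []
        self.current: Deque[float] = deque(maxlen=window)  # drops oldest margin

    def update(self, m: float) -> bool:
        """Feed one margin; return True when the A-distance exceeds the threshold."""
        if len(self.reference) < self.window:
            self.reference.append(m)  # first `window` margins define "in domain"
            return False
        self.current.append(m)
        if len(self.current) < self.window:
            return False
        return a_distance(self.reference, self.current, self.edges) > self.threshold


# Usage with synthetic margins (illustrative values, not data from the paper).
ref_stream = [1.2, 1.5, 1.8] * 20   # confident margins from the training domain
new_stream = [0.2, 0.3, 0.4] * 10   # low-confidence margins after a shift
det = MarginShiftDetector(window=30, threshold=0.5,
                          edges=[0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
before = [det.update(m) for m in ref_stream]
after = [det.update(m) for m in new_stream]
print(any(before), after[-1])
```

The synthetic new-domain margins are smaller, which reflects the intuition assumed here: a classifier tends to be less confident on unfamiliar text. That shift moves probability mass into the lowest bin and pushes the A-distance above the threshold. The threshold and window size trade false alarms against detection delay and would need tuning on real streams.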