Exploiting unannotated corpora for tagging and chunking

Authors:
Rie Kubota Ando
Affiliations:
IBM T.J. Watson Research Center, Hawthorne, NY
Venue:
ACLdemo '04 Proceedings of the ACL 2004 on Interactive poster and demonstration sessions
Year:
2004

Abstract

We present a method that exploits unannotated corpora to compensate for the paucity of annotated training data in chunking and tagging tasks. It collects and compresses feature frequencies from a large unannotated corpus for use by linear classifiers. Experiments on two tasks show that it consistently produces significant performance improvements.
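
The abstract does not specify which features are counted or how the frequencies are compressed. As a rough illustration of the general idea only, the following Python sketch counts neighboring-word features for each word in a toy unannotated corpus and folds the sparse counts into a short dense vector that a linear classifier could take as extra input. The function names, the window size, the toy corpus, and the use of feature hashing as the compression step are all assumptions made for this sketch, not the paper's method.

```python
from collections import Counter, defaultdict
import zlib


def collect_context_counts(sentences, window=1):
    """Count, for each word, how often each neighboring-word feature occurs.

    A feature is a signed offset plus the neighbor token, e.g. "-1:the".
    (Hypothetical feature design, chosen only for illustration.)
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for offset in range(-window, window + 1):
                j = i + offset
                if offset == 0 or j < 0 or j >= len(tokens):
                    continue
                counts[word][f"{offset:+d}:{tokens[j]}"] += 1
    return counts


def compress(counter, dim=8):
    """Fold sparse counts into `dim` buckets and normalize them to sum to 1.

    Feature hashing stands in for the compression step here; it is an
    assumption of this sketch, not the technique used in the paper.
    """
    vec = [0.0] * dim
    for feature, freq in counter.items():
        bucket = zlib.crc32(feature.encode("utf-8")) % dim
        vec[bucket] += freq
    total = sum(vec)
    return [v / total for v in vec] if total else vec


# Toy stand-in for a large unannotated corpus (tokenized sentences).
unannotated = [
    ["the", "dog", "barks"],
    ["the", "cat", "sleeps"],
    ["a", "dog", "sleeps"],
]

counts = collect_context_counts(unannotated)
compressed = {word: compress(c) for word, c in counts.items()}
```

Each entry of `compressed` is a fixed-length vector, so it can be appended to the usual token features of a tagger or chunker without growing with the size of the unannotated corpus.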