Cascading use of soft and hard matching pattern rules for weakly supervised information extraction

Authors:
Jing Xiao; Tat-Seng Chua; Hang Cui
Affiliations:
National University of Singapore; National University of Singapore; National University of Singapore
Venue:
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Year:
2004

Abstract

Current rule induction techniques based on hard matching (i.e., strict slot-by-slot matching) tend to fare poorly when extracting information from natural language texts, which often exhibit great variation. The reason is that hard matching yields relatively high precision but low recall. To tackle this problem, we take advantage of the newly proposed soft pattern rules, which offer high recall through probabilistic matching. We propose a bootstrapping framework in which soft and hard matching pattern rules are combined in a cascading manner to realize a weakly supervised rule induction scheme. The system starts with a small set of hand-tagged instances. At each iteration, we first generate soft pattern rules and use them to tag new training instances automatically. We then apply hard pattern rule induction to the overall tagged data to generate more precise rules, which are used to tag the data again. The process can be repeated until satisfactory results are obtained. Our experimental results show that our bootstrapping scheme with two cascaded learners approaches the performance of a fully supervised information extraction system while using far fewer hand-tagged instances.
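
The iteration described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the soft or hard rule learners, so both are replaced here by toy stand-ins. The instance layout (left context, candidate, right context), the functions `learn_soft`, `soft_score`, `learn_hard`, `hard_match`, and `bootstrap`, the `threshold` of 0.6, and the example sentences are all assumptions made for illustration.

```python
from collections import Counter

# ASSUMPTION: an instance is (left_context, candidate, right_context),
# where both contexts are token tuples of the same length.

def learn_soft(positives):
    """Toy soft pattern: per-slot token counts over the positive instances."""
    width = len(positives[0][0])
    left = [Counter(p[0][i] for p in positives) for i in range(width)]
    right = [Counter(p[2][i] for p in positives) for i in range(width)]
    return left, right, len(positives)

def soft_score(model, inst):
    """Mean per-slot token probability; a partial context match still scores above zero."""
    left, right, n = model
    probs = [slot[tok] / n for slot, tok in zip(left, inst[0])]
    probs += [slot[tok] / n for slot, tok in zip(right, inst[2])]
    return sum(probs) / len(probs)

def learn_hard(positives):
    """Toy hard rules: exact (left neighbour, right neighbour) token pairs."""
    return {(p[0][-1], p[2][0]) for p in positives}

def hard_match(rules, inst):
    """Strict matching: both slots of a rule must agree exactly."""
    return (inst[0][-1], inst[2][0]) in rules

def bootstrap(seeds, unlabeled, threshold=0.6, max_iter=10):
    """Cascade the two toy learners until the tagged set stops changing."""
    tagged = list(seeds)
    rules = learn_hard(tagged)
    for _ in range(max_iter):
        # Step 1: soft patterns propose new training instances.
        soft = learn_soft(tagged)
        candidates = [x for x in unlabeled if soft_score(soft, x) >= threshold]
        # Step 2: hard rules are induced on the overall tagged data ...
        rules = learn_hard(tagged + candidates)
        # ... and used to tag the data again.
        retagged = list(seeds) + [x for x in unlabeled if hard_match(rules, x)]
        if retagged == tagged:
            break
        tagged = retagged
    return rules, tagged

# Hypothetical seminar-announcement data: two hand-tagged seeds, three untagged candidates.
seeds = [
    (("speaker", ":"), "Smith", ("will", "talk")),
    (("speaker", ":"), "Jones", ("will", "present")),
]
unlabeled = [
    (("speaker", "is"), "Lee", ("will", "talk")),
    (("lecturer", "is"), "Kim", ("will", "talk")),
    (("in", "room"), "101", ("at", "noon")),
]
rules, tagged = bootstrap(seeds, unlabeled)
```

In this toy run the soft scorer first proposes only the "Lee" instance; the hard rule induced from it then also matches the "Kim" instance, while the unrelated "101" instance is never tagged. A realistic system would replace both stand-ins with proper learners and would filter hard rules by their precision on the tagged data.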