Automatic rule refinement for information extraction

Authors:
Bin Liu; Laura Chiticariu; Vivian Chu; H. V. Jagadish; Frederick R. Reiss
Affiliations:
University of Michigan; IBM Research - Almaden; IBM Research - Almaden; University of Michigan; IBM Research - Almaden
Venue:
Proceedings of the VLDB Endowment
Year:
2010

Abstract

Rule-based information extraction from text is increasingly used to populate databases and to support structured queries on unstructured text. Specifying suitable extraction rules requires considerable skill, and standard practice is to refine rules iteratively, with substantial effort. In this paper, we show that techniques developed in the context of data provenance, which determine the lineage of a tuple in a database, can be leveraged to assist in rule refinement. Specifically, given a set of extraction rules and examples of correct and incorrect extracted data, we have developed a technique that suggests a ranked list of rule modifications for an expert rule specifier to consider. We implemented our technique in the SystemT information extraction system developed at IBM Research - Almaden and experimentally demonstrate its effectiveness.
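
To make the idea in the abstract concrete, here is a minimal Python sketch of provenance-guided refinement. It is an illustration under stated assumptions, not the paper's algorithm and not SystemT's rule language: the two regex rules, the function names `extract_with_provenance` and `rank_rule_removals`, the sample sentence, and the labels are all hypothetical. The only candidate modification considered is removing a whole rule, and candidates are ranked by incorrect results eliminated minus correct results lost.

```python
import re
from collections import defaultdict

# Hypothetical toy rules (plain regexes, NOT SystemT/AQL syntax).
RULES = {
    "cap_pair": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
    "title_name": re.compile(r"\b(?:Dr|Mr|Ms)\. [A-Z][a-z]+\b"),
}


def extract_with_provenance(text, rules):
    """Map each extracted string to the set of rule names that produced it."""
    provenance = defaultdict(set)
    for name, pattern in rules.items():
        for match in pattern.finditer(text):
            provenance[match.group()].add(name)
    return dict(provenance)


def rank_rule_removals(provenance, correct, incorrect):
    """Rank the candidate change "remove rule r" for every rule r.

    A result disappears only when every rule in its provenance is gone,
    so removing r alone drops exactly the results produced solely by r.
    Returns (rule, incorrect_removed, correct_lost) tuples, best first.
    """
    all_rules = set()
    for sources in provenance.values():
        all_rules |= sources

    scored = []
    for rule in sorted(all_rules):
        dropped = {t for t, sources in provenance.items() if sources == {rule}}
        fixed = len(dropped & incorrect)  # false positives eliminated
        lost = len(dropped & correct)     # true positives sacrificed
        scored.append((rule, fixed, lost))

    # Highest net benefit first; ties broken by rule name for determinism.
    scored.sort(key=lambda s: (-(s[1] - s[2]), s[0]))
    return scored


# Hypothetical labeled example: person names are the intended output.
text = "Dr. Smith met Alice Jones in New York near Central Park."
correct = {"Dr. Smith", "Alice Jones"}
incorrect = {"New York", "Central Park"}

provenance = extract_with_provenance(text, RULES)
ranking = rank_rule_removals(provenance, correct, incorrect)
for rule, fixed, lost in ranking:
    print(f"remove {rule}: fixes {fixed} incorrect, loses {lost} correct")
```

In this toy setting, provenance is just the set of rules that matched each string. A realistic system would track lineage through a multi-operator rule plan and would propose finer-grained changes than dropping an entire rule; those details are beyond what the abstract states.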