Snowball: extracting relations from large plain-text collections
DL '00 Proceedings of the fifth ACM conference on Digital libraries
Programming pearls: algorithm design techniques
Communications of the ACM
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Multistrategy Learning for Information Extraction
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Active Learning for Natural Language Parsing and Information Extraction
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Learning Text Analysis Rules for Domain-specific Natural Language Processing
Learning Text Analysis Rules for Domain-specific Natural Language Processing
UMass/Hughes: description of the CIRCUS system used for MUC-5
MUC5 '93 Proceedings of the 5th conference on Message understanding
FASTUS: a system for extracting information from text
HLT '93 Proceedings of the workshop on Human Language Technology
The common pattern specification language
TIPSTER '98 Proceedings of a workshop on held at Baltimore, Maryland: October 13-15, 1998
Introduction to the CoNLL-2003 shared task: language-independent named entity recognition
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Declarative Querying for Biological Sequences
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Extracting relations with integrated information using kernel methods
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Declarative information extraction using datalog with embedded extraction predicates
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Toward best-effort information extraction
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
On the provenance of non-answers to queries over extracted data
Proceedings of the VLDB Endowment
SystemT: a system for declarative information extraction
ACM SIGMOD Record
An Algebraic Approach to Rule-Based Information Extraction
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Perm: Processing Provenance and Data on the Same Data Model through Query Rewriting
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Join Optimization of Information Extraction Output: Quality Matters!
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Provenance in Databases: Why, How, and Where
Foundations and Trends in Databases
XAR: An Integrated Framework for Information Extraction
CSIE '09 Proceedings of the 2009 WRI World Congress on Computer Science and Information Engineering - Volume 04
Regular expression learning for information extraction
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
TextRunner: open information extraction on the web
NAACL-Demonstrations '07 Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations
I4E: interactive investigation of iterative information extraction
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
SystemT: an algebraic approach to declarative information extraction
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Explaining missing answers to SPJUA queries
Proceedings of the VLDB Endowment
Domain adaptation of rule-based annotators for named-entity recognition tasks
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Maximizing conjunctive views in deletion propagation
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
The SystemT IDE: an integrated development environment for information extraction rules
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Maximizing Conjunctive Views in Deletion Propagation
ACM Transactions on Database Systems (TODS)
WizIE: a best practices guided development environment for information extraction
ACL '12 Proceedings of the ACL 2012 System Demonstrations
Towards efficient named-entity rule induction for customizability
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
User feedback based query refinement by exploiting skyline operator
ER'12 Proceedings of the 31st international conference on Conceptual Modeling
Spanners: a formal framework for information extraction
Proceedings of the 32nd symposium on Principles of database systems
Provenance-based dictionary refinement in information extraction
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
I can do text analytics!: designing development tools for novice developers
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
A framework for query refinement with user feedback
Journal of Systems and Software
On modeling query refinement by capturing user intent through feedback
ADC '12 Proceedings of the Twenty-Third Australasian Database Conference - Volume 124
Hi-index | 0.00 |
Rule-based information extraction from text is increasingly being used to populate databases and to support structured queries on unstructured text. Specification of suitable information extraction rules requires considerable skill and standard practice is to refine rules iteratively, with substantial effort. In this paper, we show that techniques developed in the context of data provenance, to determine the lineage of a tuple in a database, can be leveraged to assist in rule refinement. Specifically, given a set of extraction rules and correct and incorrect extracted data, we have developed a technique to suggest a ranked list of rule modifications that an expert rule specifier can consider. We implemented our technique in the SystemT information extraction system developed at IBM Research -- Almaden and experimentally demonstrate its effectiveness.