Uncertainty management in rule-based information extraction systems

Authors:
Eirinaios Michelakis;Rajasekar Krishnamurthy;Peter J. Haas;Shivakumar Vaithyanathan
Affiliations:
University of California at Berkeley, Berkeley, CA, USA;IBM Almaden Research Center, San Jose, CA, USA;IBM Almaden Research Center, San Jose, CA, USA;IBM Almaden Research Center, San Jose, CA, USA
Venue:
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Year:
2009

Abstract

Rule-based information extraction is a process by which structured objects are extracted from text based on user-defined rules. The compositional nature of rule-based information extraction also allows rules to be expressed over previously extracted objects. Such extraction is inherently uncertain, due to the varying precision associated with the rules used in a specific extraction task. Quantifying this uncertainty is crucial for querying the extracted objects in probabilistic databases, and for improving the recall of extraction tasks that use compositional rules. In this paper, we provide a probabilistic framework for handling the uncertainty in rule-based information extraction. Specifically, for each extraction task, we build a parametric exponential model of uncertainty that captures the interaction between the different rules, as well as the compositional nature of the rules; the exponential form of our model follows from maximum-entropy considerations. We also give model-decomposition techniques that make the learning algorithms scalable to large numbers of rules and constraints. Experiments over multiple real-world extraction tasks confirm that our approach yields accurate probability estimates with only a small performance overhead. Moreover, our framework supports incremental pay-as-you-go improvements in the accuracy of probability estimates as new rules, data, or constraints are added.
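
The maximum-entropy idea mentioned in the abstract can be illustrated with a toy sketch. It is not the paper's algorithm: the two rules, their precisions (0.9 and 0.6), and all function names are hypothetical, chosen only for illustration. The sketch treats each rule's precision as a linear constraint, E[r_i * (y - q_i)] = 0, and fits the exponential-form distribution p(s) ∝ exp(Σ θ_i f_i(s)) that satisfies those constraints by plain gradient ascent on the dual.

```python
import itertools
import math

# Hypothetical precisions for two extraction rules (illustrative values only).
PRECISION = {"r1": 0.9, "r2": 0.6}

# A state is (r1, r2, y): whether each rule fired on a candidate span,
# and whether the candidate is a true entity.
STATES = list(itertools.product([0, 1], repeat=3))

# P(y=1 | r_i=1) = q_i is equivalent to the linear constraint
# E[r_i * (y - q_i)] = 0, so each rule contributes one feature.
FEATURES = [
    lambda s: s[0] * (s[2] - PRECISION["r1"]),
    lambda s: s[1] * (s[2] - PRECISION["r2"]),
]
TARGETS = [0.0, 0.0]


def distribution(theta):
    """Exponential-form distribution p(s) proportional to exp(theta . f(s))."""
    weights = [
        math.exp(sum(t * f(s) for t, f in zip(theta, FEATURES))) for s in STATES
    ]
    z = sum(weights)
    return [w / z for w in weights]


def fit(steps=5000, lr=0.5):
    """Gradient ascent on the concave dual: gradient = target - E_theta[f]."""
    theta = [0.0] * len(FEATURES)
    for _ in range(steps):
        probs = distribution(theta)
        expected = [
            sum(p * f(s) for p, s in zip(probs, STATES)) for f in FEATURES
        ]
        theta = [
            t + lr * (target - e)
            for t, target, e in zip(theta, TARGETS, expected)
        ]
    return theta


def prob_true(probs, condition):
    """P(y=1 | condition) under the fitted distribution."""
    mass = sum(p for p, s in zip(probs, STATES) if condition(s))
    hit = sum(p for p, s in zip(probs, STATES) if condition(s) and s[2] == 1)
    return hit / mass


theta = fit()
probs = distribution(theta)
p_r1 = prob_true(probs, lambda s: s[0] == 1)
p_r2 = prob_true(probs, lambda s: s[1] == 1)
p_both = prob_true(probs, lambda s: s[0] == 1 and s[1] == 1)
print(f"P(y=1 | r1) = {p_r1:.3f}")
print(f"P(y=1 | r2) = {p_r2:.3f}")
print(f"P(y=1 | r1, r2) = {p_both:.3f}")
```

After fitting, the conditional probabilities given each single rule reproduce the assumed precisions, while the probability for a span matched by both rules comes out of the model rather than being specified by hand. Enumerating all states is only feasible for a handful of rules, which is presumably why the paper turns to model decomposition for scalability.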