Inducing Features of Random Fields
IEEE Transactions on Pattern Analysis and Machine Intelligence
Relational learning of pattern-match rules for information extraction
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
The maximum entropy approach and probabilistic IR models
ACM Transactions on Information Systems (TOIS)
Converting numerical classification into text classification
Artificial Intelligence
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Improving machine learning approaches to coreference resolution
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Thumbs up?: sentiment classification using machine learning techniques
EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
A comparison of algorithms for maximum entropy parameter estimation
COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
The common pattern specification language
TIPSTER '98 Proceedings of a workshop on held at Baltimore, Maryland: October 13-15, 1998
Creating probabilistic databases from information extraction models
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Consistent selectivity estimation via maximum entropy
The VLDB Journal — The International Journal on Very Large Data Bases
Introduction to information extraction
AI Communications
ACLdemo '04 Proceedings of the ACL 2004 on Interactive poster and demonstration sessions
Extracting personal names from email: applying named entity recognition to informal text
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Piecewise pseudolikelihood for efficient training of conditional random fields
Proceedings of the 24th international conference on Machine learning
Efficient query evaluation on probabilistic databases
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Declarative information extraction using datalog with embedded extraction predicates
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Databases with uncertainty and lineage
The VLDB Journal — The International Journal on Very Large Data Bases
MCDB: a monte carlo approach to managing uncertain data
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Open information extraction from the web
Communications of the ACM - Surviving the data deluge
BayesStore: managing large, uncertain data repositories with probabilistic graphical models
Proceedings of the VLDB Endowment
Exploiting shared correlations in probabilistic databases
Proceedings of the VLDB Endowment
SystemT: a system for declarative information extraction
ACM SIGMOD Record
An Algebraic Approach to Rule-Based Information Extraction
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Fast and Simple Relational Processing of Uncertain Data
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Joint unsupervised coreference resolution with Markov logic
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Joint inference in information extraction
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 1
Bayesian information extraction network
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Hierarchical hidden Markov models for information extraction
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Using decision trees for conference resolution
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2
Incremental information extraction using tree-based context representations
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
An overview and classification of adaptive approaches to information extraction
Journal on Data Semantics IV
From information to knowledge: harvesting entities and relationships from web sources
Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Lineage processing over correlated probabilistic databases
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Querying probabilistic information extraction
Proceedings of the VLDB Endowment
Service-oriented information extraction
Proceedings of the 2011 Joint EDBT/ICDT Ph.D. Workshop
Hybrid in-database inference for declarative information extraction
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Querying uncertain data with aggregate constraints
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
The monte carlo database system: Stochastic analysis close to the data
ACM Transactions on Database Systems (TODS)
Database foundations for scalable RDF processing
RW'11 Proceedings of the 7th international conference on Reasoning web: semantic technologies for the web of data
Expert Systems with Applications: An International Journal
Aggregating semantic annotators
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
Rule-based information extraction is a process by which structured objects are extracted from text based on user-defined rules. The compositional nature of rule-based information extraction also allows rules to be expressed over previously extracted objects. Such extraction is inherently uncertain, due to the varying precision associated with the rules used in a specific extraction task. Quantifying this uncertainty is crucial for querying the extracted objects in probabilistic databases, and for improving the recall of extraction tasks that use compositional rules. In this paper, we provide a probabilistic framework for handling the uncertainty in rule-based information extraction. Specifically, for each extraction task, we build a parametric exponential model of uncertainty that captures the interaction between the different rules, as well as the compositional nature of the rules; the exponential form of our model follows from maximum-entropy considerations. We also give model-decomposition techniques that make the learning algorithms scalable to large numbers of rules and constraints. Experiments over multiple real-world extraction tasks confirm that our approach yields accurate probability estimates with only a small performance overhead. Moreover, our framework supports incremental pay-as-you-go improvements in the accuracy of probability estimates as new rules, data, or constraints are added.