Confidence estimation for information extraction

Authors: Aron Culotta; Andrew McCallum
Affiliations: University of Massachusetts, Amherst, MA (both authors)
Venue: HLT-NAACL-Short '04, Proceedings of HLT-NAACL 2004: Short Papers
Year: 2004

Abstract

Information extraction techniques automatically create structured databases from unstructured data sources, such as the Web or newswire documents. Despite the successes of these systems, their accuracy will always be imperfect. For many reasons, it is therefore highly desirable to estimate accurately how confident the system is that each extracted field is correct. The information extraction system we evaluate is based on a linear-chain conditional random field (CRF), a probabilistic model that has performed well on information extraction tasks because it can capture arbitrary, overlapping features of the input in a Markov model. We implement several techniques to estimate the confidence of both individual extracted fields and entire multi-field records, obtaining an average precision of 98% for retrieving correct fields and 87% for retrieving correct multi-field records.
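The abstract does not say which estimators were implemented, so the following is only a hedged illustration of one standard way to score a field extracted by a linear-chain CRF: clamp the field's tokens to their predicted labels, rerun the forward algorithm, and divide the constrained partition function by the unconstrained one (a technique often called constrained forward-backward). Everything here is an assumption for illustration: the hypothetical function names (`constrained_log_z`, `field_confidence`, `average_precision`), the toy log-potentials (a real CRF would compute them from features of the input), and the choice to measure average precision in its usual ranking form.

```python
import itertools
import math
from typing import List, Optional, Sequence, Set, Tuple

NEG_INF = -math.inf


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def constrained_log_z(
    unary: List[List[float]],
    trans: List[List[float]],
    allowed: Optional[List[Optional[Set[int]]]] = None,
) -> float:
    """Log-space forward algorithm for a (hypothetical) linear-chain CRF.

    unary[t][k]: log-potential of label k at position t.
    trans[j][k]: log-potential of moving from label j to label k.
    allowed[t]:  labels permitted at position t (None means unconstrained).
    Without constraints this is log Z(x); with constraints it is the log of
    the summed scores of only those label sequences that satisfy them.
    """
    n_pos, n_lab = len(unary), len(unary[0])

    def ok(t: int, k: int) -> bool:
        return allowed is None or allowed[t] is None or k in allowed[t]

    alpha = [unary[0][k] if ok(0, k) else NEG_INF for k in range(n_lab)]
    for t in range(1, n_pos):
        alpha = [
            unary[t][k] + logsumexp([alpha[j] + trans[j][k] for j in range(n_lab)])
            if ok(t, k) else NEG_INF
            for k in range(n_lab)
        ]
    return logsumexp(alpha)


def field_confidence(unary, trans, start: int, labels: List[int]) -> float:
    """P(y[start:start+len(labels)] == labels | x) = Z_constrained / Z."""
    allowed: List[Optional[Set[int]]] = [None] * len(unary)
    for offset, lab in enumerate(labels):
        allowed[start + offset] = {lab}  # clamp each field token to its label
    return math.exp(constrained_log_z(unary, trans, allowed)
                    - constrained_log_z(unary, trans))


def brute_force_confidence(unary, trans, start: int, labels: List[int]) -> float:
    """Same probability by enumerating every sequence (tiny inputs only)."""
    n_pos, n_lab = len(unary), len(unary[0])
    total = match = 0.0
    for ys in itertools.product(range(n_lab), repeat=n_pos):
        s = sum(unary[t][y] for t, y in enumerate(ys))
        s += sum(trans[ys[t - 1]][ys[t]] for t in range(1, n_pos))
        w = math.exp(s)
        total += w
        if list(ys[start:start + len(labels)]) == list(labels):
            match += w
    return match / total


def average_precision(scored: List[Tuple[float, bool]]) -> float:
    """Rank items by confidence (descending); average precision at each correct item."""
    ranked = sorted(scored, key=lambda p: p[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, correct) in enumerate(ranked, start=1):
        if correct:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


if __name__ == "__main__":
    # Toy potentials: 4 tokens, 3 labels (values are arbitrary).
    unary = [[0.5, -0.2, 0.1], [0.0, 1.0, -0.5], [0.3, 0.3, 0.9], [-1.0, 0.2, 0.4]]
    trans = [[0.2, -0.1, 0.0], [0.4, 0.1, -0.3], [-0.2, 0.5, 0.1]]
    print(field_confidence(unary, trans, 1, [1, 2]),
          brute_force_confidence(unary, trans, 1, [1, 2]))
    # approximately 0.833, i.e. (1/1 + 2/3) / 2
    print(average_precision([(0.9, True), (0.8, False), (0.7, True)]))
```

Clamping every position to the predicted labels turns the same ratio into a confidence for a whole record, again under these assumptions. The brute-force enumerator exists only to check the forward pass on tiny inputs, since its cost grows exponentially with sequence length.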