A probabilistic model of redundancy in information extraction

Authors:
Doug Downey; Oren Etzioni; Stephen Soderland
Affiliations:
Department of Computer Science and Engineering, University of Washington, Seattle, WA (all three authors)
Venue:
IJCAI'05: Proceedings of the 19th International Joint Conference on Artificial Intelligence
Year:
2005

Abstract

Unsupervised Information Extraction (UIE) is the task of extracting knowledge from text without using hand-tagged training examples. A fundamental problem for both UIE and supervised IE is assessing the probability that extracted information is correct. In massive corpora such as the Web, the same extraction is found repeatedly in different documents. How does this redundancy affect the probability of correctness? This paper introduces a combinatorial "balls-and-urns" model that computes the impact of sample size, redundancy, and corroboration from multiple distinct extraction rules on the probability that an extraction is correct. We describe methods for estimating the model's parameters in practice and demonstrate experimentally that for UIE the model's log likelihoods are, on average, 15 times better than those obtained by Pointwise Mutual Information (PMI) and by the noisy-or model used in previous work. For supervised IE, the model's performance is comparable to that of Support Vector Machines and Logistic Regression.
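The abstract does not state the model's equations, so the following is only a hedged illustration of how a single-urn computation of this kind can work, not the paper's exact formulation. It assumes an urn whose balls are labeled with candidate extractions, where each correct label and each erroneous label owns a known number of balls (the inputs `correct_counts` and `error_counts` are hypothetical and taken as given rather than estimated). An extraction observed k times in n draws is then scored by the binomial likelihood of that observation under the correct labels, divided by the likelihood under all labels. A noisy-or baseline is included for comparison; its per-sighting precision `p` is likewise an assumed parameter.

```python
import math
from typing import Sequence


def _log_likelihood(r: int, s: int, k: int, n: int) -> float:
    """Log of (r/s)^k * (1 - r/s)^(n - k): the chance that a label owning r of
    the s balls is drawn exactly k times in n draws. The binomial coefficient
    C(n, k) is omitted because it is identical for every label and cancels."""
    p = r / s  # assumes 0 < r < s
    return k * math.log(p) + (n - k) * math.log1p(-p)


def _logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def urn_probability_correct(k: int, n: int,
                            correct_counts: Sequence[int],
                            error_counts: Sequence[int]) -> float:
    """P(extraction is correct | seen k times in n draws), assuming every label
    in the urn is a priori equally likely to be the one observed."""
    s = sum(correct_counts) + sum(error_counts)
    numerator = _logsumexp([_log_likelihood(r, s, k, n) for r in correct_counts])
    all_counts = list(correct_counts) + list(error_counts)
    denominator = _logsumexp([_log_likelihood(r, s, k, n) for r in all_counts])
    return math.exp(numerator - denominator)


def noisy_or(k: int, p: float) -> float:
    """Noisy-or baseline: each of k sightings is independently correct with
    probability p, so the extraction is wrong only if every sighting is wrong."""
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    # Hypothetical urn with invented numbers (not from the paper): 50 correct
    # labels with 20 balls each and 500 erroneous labels with 2 balls each.
    correct = [20] * 50
    errors = [2] * 500
    for k in (1, 3, 10):
        print(k, urn_probability_correct(k, 1000, correct, errors), noisy_or(k, 0.9))
```

Working in log space keeps (r/s)^k from underflowing when k is large. In this invented urn, a single sighting leaves the extraction very likely wrong because erroneous labels far outnumber correct ones, whereas noisy-or with p = 0.9 already assigns 0.9; after ten sightings both scores approach 1. This illustrates how a model that accounts for sample size and label frequencies can diverge from noisy-or on rarely seen extractions.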