Find your advisor: robust knowledge gathering from the web

Authors:
Ndapandula Nakashole;Martin Theobald;Gerhard Weikum
Affiliations:
Max-Planck-Institut für Informatik, Saarbrücken, Germany;Max-Planck-Institut für Informatik, Saarbrücken, Germany;Max-Planck-Institut für Informatik, Saarbrücken, Germany
Venue:
Proceedings of the 13th International Workshop on the Web and Databases
Year:
2010

Abstract

We present a robust method for gathering relational facts from the Web. The method matches generalized patterns that are automatically learned from seed facts for the relations of interest. Our approach combines these generalized patterns, which provide high-recall information extraction, with rule-based, declarative reasoning that ensures high precision. Newly extracted candidate facts are assigned statistical weights that reflect the strength of the patterns used to extract them. To check the plausibility of candidate facts against existing knowledge and competing hypotheses, we use an efficient algorithm for weighted Max-Sat over propositional-logic clauses. In contrast to prior work on reasoning-based information extraction, we employ richer statistics and smart pruning to bound the number of grounded rules passed to the Max-Sat solver.
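
To make the plausibility check concrete, the following is a minimal sketch of weighted Max-Sat over candidate facts. It is not the algorithm from the paper: it enumerates all truth assignments, which is only feasible for a handful of variables, whereas the abstract describes an efficient solver with pruning. The relation name hasAdvisor, the entity names, the weights, the "at most one advisor" constraint, and the function solve_weighted_maxsat are all assumptions made for illustration.

```python
from itertools import product


def solve_weighted_maxsat(variables, clauses):
    """Brute-force weighted Max-Sat (illustrative only, exponential time).

    clauses is a list of (weight, literals) pairs; each literal is a
    (variable, polarity) pair, and a clause is satisfied when at least
    one of its literals matches the assignment.
    """
    best_score, best_assignment = float("-inf"), None
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        score = sum(
            weight
            for weight, literals in clauses
            if any(assignment[var] == polarity for var, polarity in literals)
        )
        if score > best_score:
            best_score, best_assignment = score, assignment
    return best_assignment, best_score


# Hypothetical candidate facts; the weights stand in for pattern strengths.
candidates = {
    "hasAdvisor(Alice, Smith)": 0.9,
    "hasAdvisor(Alice, Jones)": 0.4,
    "hasAdvisor(Bob, Jones)": 0.7,
}
variables = list(candidates)

# One soft unit clause per candidate: accepting the fact earns its weight.
clauses = [(weight, [(fact, True)]) for fact, weight in candidates.items()]

# Assumed consistency rule: a student has at most one advisor. It is encoded
# as a heavily weighted clause (NOT a OR NOT b) over the competing hypotheses.
HARD_WEIGHT = 100.0
clauses.append(
    (
        HARD_WEIGHT,
        [("hasAdvisor(Alice, Smith)", False), ("hasAdvisor(Alice, Jones)", False)],
    )
)

assignment, score = solve_weighted_maxsat(variables, clauses)
accepted = sorted(fact for fact, value in assignment.items() if value)
print(accepted)
```

In this toy instance the two competing hypotheses about Alice cannot both be accepted without violating the heavily weighted rule, so the lower-weight candidate is rejected. Because the number of assignments doubles with every candidate fact, bounding the number of grounded rules, as the abstract describes, matters at Web scale.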