The YAGO-NAGA approach to knowledge discovery

Authors:
Gjergji Kasneci;Maya Ramanath;Fabian Suchanek;Gerhard Weikum
Affiliations:
Max Planck Institute for Informatics, Saarbruecken, Germany;Max Planck Institute for Informatics, Saarbruecken, Germany;Max Planck Institute for Informatics, Saarbruecken, Germany;Max Planck Institute for Informatics, Saarbruecken, Germany
Venue:
ACM SIGMOD Record
Year:
2009



Abstract

This paper gives an overview of the YAGO-NAGA approach to information extraction for building a conveniently searchable, large-scale, highly accurate knowledge base of common facts. YAGO harvests infoboxes and category names of Wikipedia for facts about individual entities, and it reconciles these with the taxonomic backbone of WordNet in order to ensure that all entities have proper classes and the class system is consistent. Currently, the YAGO knowledge base contains about 19 million instances of binary relations for about 1.95 million entities. Based on intensive sampling, its accuracy is estimated to be above 95 percent. The paper presents the architecture of the YAGO extractor toolkit, its distinctive approach to consistency checking, its provisions for maintenance and further growth, and the query engine for YAGO, coined NAGA. It also discusses ongoing work on extensions towards integrating fact candidates extracted from natural-language text sources.