Knowledge harvesting in the big-data era

Authors:
Fabian Suchanek; Gerhard Weikum
Affiliations:
Max Planck Institute for Informatics, Saarbruecken, Germany; Max Planck Institute for Informatics, Saarbruecken, Germany
Venue:
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Year:
2013

Abstract

The proliferation of knowledge-sharing communities such as Wikipedia and the progress in scalable information extraction from Web and text sources have enabled the automatic construction of very large knowledge bases. Projects of this kind include DBpedia, Freebase, KnowItAll, ReadTheWeb, and YAGO. They provide automatically constructed knowledge bases of facts about named entities, their semantic classes, and their mutual relationships, covering millions of entities and hundreds of millions of facts about them. Such world knowledge in turn enables cognitive applications and knowledge-centric services, such as disambiguating natural-language text, semantic search for entities and relations in Web and enterprise data, and entity-oriented analytics over unstructured content. Prominent examples of how knowledge bases can be harnessed include the Google Knowledge Graph and the IBM Watson question-answering system. This tutorial presents state-of-the-art methods, recent advances, research opportunities, and open challenges in knowledge harvesting and its applications. Particular emphasis is placed on the twofold role of knowledge bases for big-data analytics: using scalable distributed algorithms to harvest knowledge from Web and text sources, and leveraging entity-centric knowledge for deeper interpretation of, and better intelligence from, big data.
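To make the data model described in the abstract concrete (facts about named entities, their semantic classes, and their relationships), the following is a minimal sketch of an in-memory store of subject-predicate-object triples with a transitive class lookup. It is an illustrative assumption, not the storage model of YAGO, DBpedia, or Freebase: the `TripleStore` class, the relation names `type`, `subclassOf`, `bornIn`, and `locatedIn`, and the sample facts are all hypothetical choices made for this example.

```python
from collections import defaultdict


class TripleStore:
    """Hypothetical minimal store of (subject, predicate, object) facts."""

    def __init__(self):
        self.facts = set()
        # Secondary indexes so lookups by subject or predicate avoid full scans.
        self.by_subject = defaultdict(set)
        self.by_predicate = defaultdict(set)

    def add(self, s, p, o):
        # Adding the same fact twice leaves the store unchanged.
        t = (s, p, o)
        if t not in self.facts:
            self.facts.add(t)
            self.by_subject[s].add(t)
            self.by_predicate[p].add(t)

    def query(self, s=None, p=None, o=None):
        # None acts as a wildcard; results are sorted for deterministic output.
        if s is not None:
            candidates = self.by_subject.get(s, set())
        elif p is not None:
            candidates = self.by_predicate.get(p, set())
        else:
            candidates = self.facts
        return sorted(t for t in candidates
                      if (p is None or t[1] == p) and (o is None or t[2] == o))

    def classes_of(self, entity):
        # Direct classes via "type", then all superclasses via "subclassOf".
        result = set()
        frontier = [o for _, _, o in self.query(entity, "type")]
        while frontier:
            c = frontier.pop()
            if c in result:
                continue  # guards against cycles in the class hierarchy
            result.add(c)
            frontier.extend(o for _, _, o in self.query(c, "subclassOf"))
        return result


kb = TripleStore()
kb.add("Max_Planck", "type", "physicist")
kb.add("physicist", "subclassOf", "scientist")
kb.add("scientist", "subclassOf", "person")
kb.add("Max_Planck", "bornIn", "Kiel")
kb.add("Kiel", "locatedIn", "Germany")
kb.add("Max_Planck", "type", "physicist")  # duplicate, ignored

print(sorted(kb.classes_of("Max_Planck")))  # ['person', 'physicist', 'scientist']

# A simple join: birthplaces of all entities that are (transitively) scientists.
scientist_births = [(s, o) for s, _, o in kb.query(p="bornIn")
                    if "scientist" in kb.classes_of(s)]
```

Separating instance facts (`type`) from the class hierarchy (`subclassOf`) is what lets a query about a general class such as "scientist" reach entities that are only typed with a more specific class.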