From information to knowledge: harvesting entities and relationships from web sources

Authors:
Gerhard Weikum; Martin Theobald
Affiliations:
Max Planck Institute for Informatics, Saarbruecken, Germany; Max Planck Institute for Informatics, Saarbruecken, Germany
Venue:
Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2010

References:
111
Cited by:
14

Abstract

Major trends are pushing the functionality of search engines toward a more expressive, semantic level. This shift is enabled by the advent of knowledge-sharing communities such as Wikipedia and by progress in automatically extracting entities and relationships from semistructured as well as natural-language Web sources. Recent endeavors of this kind include DBpedia, EntityCube, KnowItAll, ReadTheWeb, and our own YAGO-NAGA project, among others. The goal is to automatically construct and maintain, with high precision and high recall, a comprehensive knowledge base of facts about named entities, their semantic classes, and their mutual relations, as well as the temporal contexts of these facts. This tutorial discusses state-of-the-art methods, research opportunities, and open challenges in this area of knowledge harvesting.
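To make the kind of knowledge base described above concrete, the following is a minimal Python sketch, under stated assumptions, of storing facts as subject-predicate-object triples with an optional validity interval, and of populating class-membership facts from natural-language text with a single Hearst-style "such as" pattern. It is not the YAGO-NAGA implementation: the names `Fact`, `KnowledgeBase` and `extract_type_facts`, the relation names `type` and `hasWonPrize`, year-granularity time intervals, the simplified regular expression, and the naive plural-stripping heuristic are all assumptions made for this illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """A subject-predicate-object triple with an optional validity interval in years."""
    subject: str
    predicate: str
    obj: str
    valid_from: Optional[int] = None   # None means "no known start"
    valid_until: Optional[int] = None  # None means "no known end"

    def holds_in(self, year: int) -> bool:
        """True if the fact is valid in the given year; an unbounded side always passes."""
        if self.valid_from is not None and year < self.valid_from:
            return False
        if self.valid_until is not None and year > self.valid_until:
            return False
        return True


class KnowledgeBase:
    """Hypothetical in-memory fact store; duplicates collapse because Fact is hashable."""

    def __init__(self) -> None:
        self.facts: set[Fact] = set()

    def add(self, fact: Fact) -> None:
        self.facts.add(fact)

    def query(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None, year: Optional[int] = None) -> list[Fact]:
        """Return facts matching every given field (None is a wildcard), sorted for stable output."""
        matches = [
            f for f in self.facts
            if (subject is None or f.subject == subject)
            and (predicate is None or f.predicate == predicate)
            and (obj is None or f.obj == obj)
            and (year is None or f.holds_in(year))
        ]
        return sorted(matches, key=lambda f: (f.subject, f.predicate, f.obj))


# Simplified regex (an assumption for this sketch) for one Hearst-style pattern:
# "<class noun> such as <Name>, <Name> and <Name>", where a name is one or more
# capitalized words such as "Marie Curie".
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
SEPARATOR = r", (?:and )?| and "
SUCH_AS = re.compile(rf"\b([A-Za-z]+) such as ({NAME}(?:(?:{SEPARATOR}){NAME})*)")


def extract_type_facts(text: str) -> list[Fact]:
    """Turn each 'X such as A, B and C' match into (A, type, x) facts."""
    facts = []
    for match in SUCH_AS.finditer(text):
        cls = match.group(1).lower()
        if cls.endswith("s"):
            cls = cls[:-1]  # naive singularization: "physicists" -> "physicist"
        for name in re.split(SEPARATOR, match.group(2)):
            facts.append(Fact(name, "type", cls))
    return facts


if __name__ == "__main__":
    kb = KnowledgeBase()
    sentence = "Physicists such as Albert Einstein, Max Planck and Marie Curie shaped modern physics."
    for fact in extract_type_facts(sentence):
        kb.add(fact)
    # Temporally scoped facts, entered by hand for illustration.
    kb.add(Fact("Albert Einstein", "hasWonPrize", "Nobel Prize in Physics", 1921, 1921))
    kb.add(Fact("Marie Curie", "hasWonPrize", "Nobel Prize in Chemistry", 1911, 1911))

    print([f.subject for f in kb.query(predicate="type", obj="physicist")])
    # ['Albert Einstein', 'Marie Curie', 'Max Planck']
    print([f.subject for f in kb.query(predicate="hasWonPrize", year=1921)])
    # ['Albert Einstein']
```

A single lexico-syntactic pattern like this tends to be precise on sentences that fit it but misses most facts expressed in other ways; knowledge-harvesting systems go well beyond it, for example by using many patterns, exploiting semistructured sources such as Wikipedia infoboxes and categories, and checking extracted facts for consistency.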