SOFIE: a self-organizing framework for information extraction

Authors:
Fabian M. Suchanek;Mauro Sozio;Gerhard Weikum
Affiliations:
Max-Planck Institute for Informatics, Saarbruecken, Germany;Max-Planck Institute for Informatics, Saarbruecken, Germany;Max-Planck Institute for Informatics, Saarbruecken, Germany
Venue:
Proceedings of the 18th international conference on World wide web
Year:
2009



Abstract

This paper presents SOFIE, a system for automated ontology extension. SOFIE can parse natural language documents, extract ontological facts from them, and link the facts into an ontology. SOFIE uses logical reasoning on the existing knowledge and on the new knowledge in order to disambiguate words to their most probable meaning, to reason on the meaning of text patterns, and to take into account world knowledge axioms. This allows SOFIE to check the plausibility of hypotheses and to avoid inconsistencies with the ontology. The framework of SOFIE combines the paradigms of pattern matching, word sense disambiguation, and ontological reasoning in one unified model. Our experiments show that SOFIE delivers high-quality output, even from unstructured Internet documents.