Ontology-driven, unsupervised instance population

Authors:
Luke K. McDowell; Michael Cafarella
Affiliations:
Computer Science Department, U.S. Naval Academy, 572M Holloway Road Stop 9F, Annapolis, MD 21402, USA; Department of Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA
Venue:
Web Semantics: Science, Services and Agents on the World Wide Web
Year:
2008

Abstract

The Semantic Web's need for machine-understandable content has led researchers to attempt to acquire such content automatically from a number of sources, including the web. To date, such research has focused on "document-driven" systems that individually process a small set of documents, annotating each with respect to a given ontology. This article introduces OntoSyphon, an alternative that strives to leverage existing ontological content more fully while scaling to extract comparatively shallow content from millions of documents. OntoSyphon operates in an "ontology-driven" manner: taking any ontology as input, it uses the ontology to specify web searches that identify possible semantic instances, relations, and taxonomic information. Redundancy in the web, together with information from the ontology, is then used to verify these candidate instances and relations automatically, enabling OntoSyphon to operate in a fully automated, unsupervised manner. A prototype of OntoSyphon is fully implemented, and we present experimental results that demonstrate substantial instance population in three domains based on independently constructed ontologies. We show that using the whole web as a corpus for verification yields the best results, but that a much smaller web corpus can also yield strong performance. In addition, we consider two further problems: selecting the best class for each discovered candidate instance, and ranking the final results. For both problems we introduce new solutions and demonstrate that, on both the small and large corpora, they consistently improve upon previously known techniques.
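
The ontology-driven idea described in the abstract can be illustrated with a toy sketch. It is not OntoSyphon's implementation: the phrase templates, the function names (`find_candidates`, `verify`), the in-memory `DOCS` corpus, and the `min_hits` threshold are all assumptions made for illustration. The sketch turns a class name from an ontology into search phrases, collects the capitalized words that follow those phrases as candidate instances, and keeps only the candidates that are seen often enough, a crude stand-in for verification by redundancy.

```python
import re
from collections import Counter

# Hypothetical phrase templates (an assumption, not taken from the article).
TEMPLATES = ["{cls} such as", "{cls} including"]

# Tiny stand-in corpus; a real system would query a web-scale index instead.
DOCS = [
    "Zoos keep mammals such as Lions and tigers.",
    "Large mammals including Lions need space.",
    "Some say mammals such as Tuesday are rare.",  # deliberate noise
]


def plural(noun: str) -> str:
    """Naive pluralization, sufficient for this toy example only."""
    return noun + "s"


def find_candidates(class_name: str, documents: list[str]) -> Counter:
    """Count capitalized words that directly follow a class-derived phrase."""
    counts: Counter = Counter()
    for template in TEMPLATES:
        phrase = template.format(cls=plural(class_name))
        # Match the phrase case-insensitively, but require the candidate
        # itself to start with an uppercase letter.
        pattern = re.compile(r"(?i:" + re.escape(phrase) + r")\s+([A-Z][a-z]+)")
        for doc in documents:
            counts.update(pattern.findall(doc))
    return counts


def verify(counts: Counter, min_hits: int = 2) -> dict[str, int]:
    """Keep candidates seen at least min_hits times (redundancy as evidence)."""
    return {cand: n for cand, n in counts.items() if n >= min_hits}


candidates = find_candidates("mammal", DOCS)
accepted = verify(candidates, min_hits=2)
```

In this sketch the class name drives the search, rather than each document being annotated in turn. Raising `min_hits` trades recall for precision; the article's own verification and ranking methods are more elaborate than a raw count threshold.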