Ontology-driven information extraction with OntoSyphon

Authors:
Luke K. McDowell; Michael Cafarella
Affiliations:
Computer Science Department, U.S. Naval Academy, Annapolis, MD; Dept. of Computer Science and Engineering, University of Washington, Seattle, WA
Venue:
ISWC'06 Proceedings of the 5th international conference on The Semantic Web
Year:
2006

Abstract

The Semantic Web’s need for machine-understandable content has led researchers to attempt to automatically acquire such content from a number of sources, including the web. To date, such research has focused on “document-driven” systems that individually process a small set of documents, annotating each with respect to a given ontology. This paper introduces OntoSyphon, an alternative that strives to more fully leverage existing ontological content while scaling to extract comparatively shallow content from millions of documents. OntoSyphon operates in an “ontology-driven” manner: taking any ontology as input, OntoSyphon uses the ontology to specify web searches that identify possible semantic instances, relations, and taxonomic information. Redundancy in the web, together with information from the ontology, is then used to automatically verify these candidate instances and relations, enabling OntoSyphon to operate in a fully automated, unsupervised manner. A prototype of OntoSyphon is fully implemented, and we present experimental results that demonstrate substantial instance learning in a variety of domains based on independently constructed ontologies. We also introduce new methods for improving instance verification, and demonstrate that they improve upon previously known techniques.
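
The ontology-driven loop described in the abstract (class names drive the searches, and repeated hits serve as evidence) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the cue-phrase templates, the naive pluralizer, the capitalized-phrase heuristic for candidates, the in-memory `corpus` standing in for web search, and the simple hit-count threshold in `verify`. The abstract does not specify OntoSyphon's actual patterns or verification measures, so this should not be read as the authors' implementation.

```python
import re
from collections import Counter

# Hypothetical cue phrases; the abstract does not list the real patterns.
TEMPLATES = ["{plural} such as ", "{plural} including ", "{plural} like "]

# Assumption: a run of capitalized words after the cue is a candidate instance.
CANDIDATE = r"((?:[A-Z][a-z]+)(?: [A-Z][a-z]+)*)"


def pluralize(noun):
    """Naive pluralization, adequate only for this toy example."""
    return noun + "es" if noun.endswith(("s", "x", "ch", "sh")) else noun + "s"


def build_patterns(class_name):
    """Turn one ontology class name into compiled search patterns."""
    plural = pluralize(class_name.lower())
    patterns = []
    for template in TEMPLATES:
        cue = re.escape(template.format(plural=plural))
        # The cue matches case-insensitively; the candidate stays case-sensitive.
        patterns.append(re.compile("(?i:" + cue + ")" + CANDIDATE))
    return patterns


def extract_candidates(ontology_classes, corpus):
    """Count how often each candidate appears after a cue for each class."""
    counts = {cls: Counter() for cls in ontology_classes}
    for cls in ontology_classes:
        for pattern in build_patterns(cls):
            for sentence in corpus:
                for match in pattern.finditer(sentence):
                    counts[cls][match.group(1)] += 1
    return counts


def verify(counts, min_hits=2):
    """Keep candidates seen at least min_hits times (a stand-in for redundancy)."""
    return {
        cls: sorted(name for name, hits in counter.items() if hits >= min_hits)
        for cls, counter in counts.items()
    }


if __name__ == "__main__":
    corpus = [
        "Famous scientists such as Marie Curie changed physics.",
        "Many scientists such as Marie Curie won prizes.",
        "Scientists including Alan Turing worked on codes.",
        "Rocky planets such as Mars have thin air.",
        "Inner planets like Mars are small.",
        "Some planets such as Pluto were reclassified.",
    ]
    counts = extract_candidates(["Scientist", "Planet"], corpus)
    print(verify(counts))
```

In this sketch the ontology contributes only class names. A fuller treatment would presumably also use the class hierarchy, for example to compare a candidate's evidence across sibling or parent classes, but how OntoSyphon does that is not stated in the abstract.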