The Semantic Web's need for machine-understandable content has led researchers to attempt to acquire such content automatically from a number of sources, including the web. To date, such research has focused on "document-driven" systems that individually process a small set of documents, annotating each with respect to a given ontology. This article introduces OntoSyphon, an alternative that strives to leverage existing ontological content more fully while scaling to extract comparatively shallow content from millions of documents. OntoSyphon operates in an "ontology-driven" manner: taking any ontology as input, it uses the ontology to specify web searches that identify possible semantic instances, relations, and taxonomic information. Redundancy in the web, together with information from the ontology, is then used to automatically verify these candidate instances and relations, enabling OntoSyphon to operate in a fully automated, unsupervised manner. A prototype of OntoSyphon is fully implemented, and we present experimental results that demonstrate substantial instance population in three domains based on independently constructed ontologies. We show that using the whole web as a corpus for verification yields the best results, but that a much smaller web corpus can also yield strong performance. In addition, we consider the problem of selecting the best class for each discovered candidate instance and the problem of ranking the final results. For both problems we introduce new solutions and demonstrate that, for both the small and large corpora, they consistently improve upon previously known techniques.
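The ontology-driven loop described above can be illustrated with a minimal Python sketch. It is not OntoSyphon's implementation: the phrase templates, the function names (`build_queries`, `score_candidates`, `best_class`), the naive pluralization, and the summed-hit-count score are all assumptions made for illustration, and a hypothetical in-memory table of counts stands in for a real search engine or corpus. The article's own class-selection and ranking methods are not reproduced here.

```python
from collections import defaultdict


def pluralize(word):
    # Naive pluralization, adequate only for this toy example (assumption).
    return word + "s"


def build_queries(cls, candidate):
    """Turn one ontology class and one candidate instance into search phrases.

    The three templates are illustrative assumptions, not the patterns
    used by the system described in the abstract.
    """
    plural = pluralize(cls)
    return [
        f"{plural} such as {candidate}",
        f"{candidate} and other {plural}",
        f"{candidate} is a {cls}",
    ]


def score_candidates(classes, candidates, hit_count):
    """Sum hit counts over all phrases for each (candidate, class) pair.

    Summed counts are a simple stand-in for redundancy-based verification.
    """
    scores = defaultdict(dict)
    for cand in candidates:
        for cls in classes:
            scores[cand][cls] = sum(hit_count(q) for q in build_queries(cls, cand))
    return scores


def best_class(scores, min_hits=1):
    """Pick the highest-scoring class per candidate; drop weakly supported ones."""
    result = {}
    for cand, per_class in scores.items():
        cls, hits = max(per_class.items(), key=lambda kv: kv[1])
        if hits >= min_hits:
            result[cand] = (cls, hits)
    return result


# Hypothetical hit counts standing in for a web search engine.
TOY_HITS = {
    "mammals such as dolphin": 40,
    "dolphin and other mammals": 25,
    "birds such as dolphin": 2,
    "birds such as sparrow": 30,
    "sparrow is a bird": 12,
}


def toy_hit_count(query):
    return TOY_HITS.get(query, 0)


scores = score_candidates(["mammal", "bird"], ["dolphin", "sparrow", "table"], toy_hit_count)
chosen = best_class(scores, min_hits=5)
# Rank accepted instances by their supporting hit count, strongest first.
ranked = sorted(chosen.items(), key=lambda kv: kv[1][1], reverse=True)
print(ranked)
```

In this toy run, "table" is rejected because no phrase supports it, while "dolphin" and "sparrow" are each assigned the class with the most supporting hits. Passing `hit_count` in as a function keeps the sketch independent of any particular corpus, which loosely mirrors the abstract's comparison of whole-web and smaller-corpus verification.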