Populating the Semantic Web by Macro-reading Internet Text

Authors:
Tom M. Mitchell;Justin Betteridge;Andrew Carlson;Estevam Hruschka;Richard Wang
Affiliations:
Carnegie Mellon University, Pittsburgh, USA 15213;Carnegie Mellon University, Pittsburgh, USA 15213;Carnegie Mellon University, Pittsburgh, USA 15213;Carnegie Mellon University, Pittsburgh, USA 15213 and Federal University of Sao Carlos, Brazil;Carnegie Mellon University, Pittsburgh, USA 15213
Venue:
ISWC '09 Proceedings of the 8th International Semantic Web Conference
Year:
2009

Abstract

A key question regarding the future of the semantic web is "how will we acquire structured information to populate the semantic web on a vast scale?" One approach is to enter this information manually. A second approach is to take advantage of pre-existing databases, and to develop common ontologies, publishing standards, and reward systems to make this data widely accessible. We consider here a third approach: developing software that automatically extracts structured information from unstructured text present on the web. We also describe preliminary results demonstrating that machine learning algorithms can learn to extract tens of thousands of facts to populate a diverse ontology, with imperfect but reasonably good accuracy.