Coupled semi-supervised learning for information extraction

Authors:
Andrew Carlson;Justin Betteridge;Richard C. Wang;Estevam R. Hruschka, Jr.;Tom M. Mitchell
Affiliations:
Carnegie Mellon University, Pittsburgh, PA, USA;Carnegie Mellon University, Pittsburgh, PA, USA;Carnegie Mellon University, Pittsburgh, PA, USA;Federal University of Sao Carlos, Sao Carlos, Brazil;Carnegie Mellon University, Pittsburgh, PA, USA
Venue:
Proceedings of the third ACM international conference on Web search and data mining
Year:
2010

Abstract

We consider the problem of semi-supervised learning to extract categories (e.g., academic fields, athletes) and relations (e.g., PlaysSport(athlete, sport)) from web pages, starting with a handful of labeled training examples of each category or relation, plus hundreds of millions of unlabeled web documents. Semi-supervised training using only a few labeled examples is typically unreliable because the learning task is underconstrained. This paper pursues the thesis that much greater accuracy can be achieved by further constraining the learning task, by coupling the semi-supervised training of many extractors for different categories and relations. We characterize several ways in which the training of category and relation extractors can be coupled, and present experimental results demonstrating significantly improved accuracy as a result.
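
To make the idea of coupling concrete, here is a minimal sketch of how two kinds of constraint could filter the candidates proposed during one bootstrapping round. The abstract does not name the specific couplings, so the two used here (mutual exclusion between categories, and type-checking of relation arguments against category membership) are illustrative assumptions. The function names, data layout, and example instances are all hypothetical and are not the authors' implementation.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical constraint set: an instance may not belong to both categories.
EXCLUSIVE: Set[FrozenSet[str]] = {frozenset(("athlete", "sport"))}


def couple_categories(
    known: Dict[str, Set[str]],
    candidates: Dict[str, Set[str]],
    exclusive: Set[FrozenSet[str]],
) -> Dict[str, Set[str]]:
    """Drop candidates that are known or proposed for a mutually exclusive category."""
    all_categories = set(known) | set(candidates)
    accepted: Dict[str, Set[str]] = {}
    for category, proposed in candidates.items():
        blocked: Set[str] = set()
        for other in all_categories:
            if other != category and frozenset((category, other)) in exclusive:
                # Anything the rival category already holds or also proposes
                # is treated as a conflict and rejected for this category.
                blocked |= known.get(other, set()) | candidates.get(other, set())
        accepted[category] = proposed - blocked
    return accepted


def type_check_relation(
    pairs: Set[Tuple[str, str]],
    signature: Tuple[str, str],
    known: Dict[str, Set[str]],
) -> Set[Tuple[str, str]]:
    """Keep a relation instance only if each argument belongs to the expected category."""
    first_type, second_type = signature
    return {
        (a, b)
        for a, b in pairs
        if a in known.get(first_type, set()) and b in known.get(second_type, set())
    }


def bootstrap_round(
    known: Dict[str, Set[str]],
    candidates: Dict[str, Set[str]],
    relation_pairs: Set[Tuple[str, str]],
    signature: Tuple[str, str],
) -> Tuple[Dict[str, Set[str]], Set[Tuple[str, str]]]:
    """Run one coupled round: filter categories first, then type-check relations."""
    accepted = couple_categories(known, candidates, EXCLUSIVE)
    updated = {c: set(v) for c, v in known.items()}
    for category, instances in accepted.items():
        updated.setdefault(category, set()).update(instances)
    return updated, type_check_relation(relation_pairs, signature, updated)
```

With seed instances for `athlete` and `sport`, a candidate proposed for both categories is rejected by each, and a PlaysSport(athlete, sport) pair survives only when its first argument is a known athlete and its second a known sport. An uncoupled learner would accept every candidate, which is the underconstrained setting the abstract argues against.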