Combining labelled and unlabelled data: a case study on Fisher kernels and transductive inference for biological entity recognition

Authors:
Cyril Goutte;Hervé Déjean;Eric Gaussier;Nicola Cancedda;Jean-Michel Renders
Affiliations:
Xerox Research Center Europe, Meylan, France;Xerox Research Center Europe, Meylan, France;Xerox Research Center Europe, Meylan, France;Xerox Research Center Europe, Meylan, France;Xerox Research Center Europe, Meylan, France
Venue:
COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Year:
2002

Abstract

We address the problem of using partially labelled data, e.g. large collections where only a small fraction of the data is annotated, for extracting biological entities. Our approach relies on a combination of probabilistic models, which we use to model the generation of entities and their context, and kernel machines, which implement powerful categorisers based on a similarity measure and some labelled data. This combination takes the form of the so-called Fisher kernels, which implement a similarity based on an underlying probabilistic model. Such kernels are compared with transductive inference, an alternative approach to combining labelled and unlabelled data, again coupled with Support Vector Machines. Experiments are performed on a database of abstracts extracted from Medline.
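
To illustrate how a similarity can be derived from a probabilistic model, here is a minimal sketch of a Fisher kernel in Python. It rests on assumptions that are not in the abstract: a smoothed unigram model stands in for the probabilistic model (the abstract does not specify one), the Fisher information matrix is approximated by the identity, and the simplex constraint on the parameters is ignored when taking gradients. The function names `fit_unigram`, `fisher_score` and `fisher_kernel` are hypothetical.

```python
from collections import Counter


def fit_unigram(docs, vocab, alpha=1.0):
    """Estimate smoothed unigram probabilities from tokenised documents.

    No labels are needed here, so labelled and unlabelled documents
    can both be used to fit the model (an illustrative assumption).
    """
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def fisher_score(doc, theta):
    """Gradient of log P(doc | theta) with respect to each theta_w.

    For a unigram model, log P = sum_w count_w * log(theta_w), so the
    partial derivative for word w is count_w / theta_w. Words with a
    zero count have a zero gradient and are left out of the dict.
    """
    counts = Counter(doc)
    return {w: c / theta[w] for w, c in counts.items() if w in theta}


def fisher_kernel(doc_a, doc_b, theta):
    """Inner product of the two Fisher scores.

    The Fisher information matrix is taken to be the identity, a
    simplification made for this sketch.
    """
    score_a = fisher_score(doc_a, theta)
    score_b = fisher_score(doc_b, theta)
    return sum(v * score_b[w] for w, v in score_a.items() if w in score_b)
```

Under these assumptions, two documents that share rare words (small `theta[w]`) receive a larger kernel value than two documents that share only frequent words. The resulting kernel values could then be supplied to a kernel machine such as a Support Vector Machine trained on the labelled subset.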