Cross-training: learning probabilistic mappings between topics

Authors: Sunita Sarawagi (IIT Bombay); Soumen Chakrabarti (IIT Bombay); Shantanu Godbole (IIT Bombay)
Venue: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Year: 2003

Abstract

Classification is a well-established operation in text mining. Given a set of labels A and a set D_A of training documents tagged with these labels, a classifier learns to assign labels to unlabeled test documents. Suppose we also have a different set of labels B, together with a set of documents D_B marked with labels from B. If A and B have some semantic overlap, can the availability of D_B help us build a better classifier for A, and vice versa? We answer this question in the affirmative by proposing cross-training: a new approach to semi-supervised learning in the presence of multiple label sets. We give distributional and discriminative algorithms for cross-training and show, through extensive experiments, that cross-training can discover and exploit probabilistic relations between two taxonomies for more accurate classification.
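
The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a tiny multinomial naive Bayes classifier, toy documents, and a hypothetical feature scheme in which the label predicted by a B-classifier is appended to each A-labeled document as a pseudo-token. The helper names (`NaiveBayes`, `estimate_mapping`, `augment`) and all data are illustrative assumptions. The sketch also tabulates an empirical estimate of P(b | a), one simple way to read "probabilistic mappings between topics".

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes over whitespace tokens (Laplace smoothing)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict_proba(self, doc):
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for w in doc.split():
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            log_scores[label] = score
        top = max(log_scores.values())  # subtract max for numerical stability
        exp = {l: math.exp(s - top) for l, s in log_scores.items()}
        z = sum(exp.values())
        return {l: v / z for l, v in exp.items()}

    def predict(self, doc):
        probs = self.predict_proba(doc)
        return max(probs, key=probs.get)


def estimate_mapping(clf_b, docs_a, labels_a):
    """Estimate P(b | a) by averaging the B-classifier's posteriors over A-labeled docs."""
    sums = defaultdict(Counter)
    n = Counter(labels_a)
    for doc, a in zip(docs_a, labels_a):
        for b, p in clf_b.predict_proba(doc).items():
            sums[a][b] += p
    return {a: {b: s / n[a] for b, s in row.items()} for a, row in sums.items()}


def augment(clf_b, doc):
    """Append the predicted B label as a pseudo-token (hypothetical feature scheme)."""
    return f"{doc} __B_{clf_b.predict(doc)}"


# Toy data (invented for illustration): two label sets with semantic overlap.
docs_b = ["goal striker penalty league",
          "serve racket volley court",
          "loan interest deposit bank"]
labels_b = ["football", "tennis", "banking"]

docs_a = ["league match goal score", "racket match serve",
          "bank stock interest market", "deposit market loan"]
labels_a = ["sports", "sports", "finance", "finance"]

clf_b = NaiveBayes().fit(docs_b, labels_b)
mapping = estimate_mapping(clf_b, docs_a, labels_a)

# Train the A-classifier on documents augmented with predicted B labels.
clf_a = NaiveBayes().fit([augment(clf_b, d) for d in docs_a], labels_a)

# This query shares no words with the A-labeled documents; only the
# B-derived pseudo-token carries signal for the A-classifier.
pred = clf_a.predict(augment(clf_b, "striker penalty"))

print(mapping)
print(pred)
```

In this toy setup the query has no vocabulary in common with the A-labeled documents, so any signal the A-classifier gets comes from the B-derived pseudo-token. The symmetric direction (using A-labeled data to help a B-classifier) would follow the same pattern with the roles swapped.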