Email classification with co-training

Authors:
Svetlana Kiritchenko;Stan Matwin
Affiliations:
School of Information Technology and Engineering, University of Ottawa, Ottawa, ON, Canada;School of Information Technology and Engineering, University of Ottawa, Ottawa, ON, Canada
Venue:
CASCON '01 Proceedings of the 2001 conference of the Centre for Advanced Studies on Collaborative research
Year:
2001

Abstract

The main problems in text classification are the lack of labeled data and the cost of labeling unlabeled data. We address these problems by exploring co-training, an algorithm that uses unlabeled data along with a few labeled examples to boost the performance of a classifier. We experiment with co-training on the email domain. Our results show that the performance of co-training depends on the learning algorithm it uses. In particular, Support Vector Machines significantly outperform Naive Bayes on email classification.
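
As an illustration of the co-training idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes each email can be split into two views (here, hypothetically, subject words and body words), trains one classifier per view on the few labeled examples, and lets each classifier move its most confident unlabeled email into the labeled set. The names `NaiveBayes`, `train_pair`, and `co_train`, the toy emails, and the "top-k most confident" selection rule are all assumptions made for this example. Naive Bayes is used only because it fits in a few lines of standard-library Python; the abstract reports that Support Vector Machines performed better in the authors' experiments.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial Naive Bayes over bags of words (Laplace smoothing)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for words, label in zip(docs, labels):
            self.counts[label].update(words)
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict_proba(self, words):
        # Score each class in log space, then normalise to probabilities.
        total = sum(self.prior.values())
        scores = {}
        for c in self.classes:
            denom = sum(self.counts[c].values()) + len(self.vocab)
            s = math.log(self.prior[c] / total)
            for w in words:
                if w in self.vocab:  # words never seen in training are ignored
                    s += math.log((self.counts[c][w] + 1) / denom)
            scores[c] = s
        m = max(scores.values())
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


def train_pair(labeled):
    """Train one classifier per view from ((view_a, view_b), label) pairs."""
    labels = [y for _, y in labeled]
    clf_a = NaiveBayes().fit([x[0] for x, _ in labeled], labels)
    clf_b = NaiveBayes().fit([x[1] for x, _ in labeled], labels)
    return clf_a, clf_b


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """Simplified co-training: each view's classifier labels its most
    confident pool examples, which then become training data for both."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        classifiers = train_pair(labeled)
        for view, clf in enumerate(classifiers):
            scored = []
            for doc in pool:
                proba = clf.predict_proba(doc[view])
                label = max(proba, key=proba.get)
                scored.append((proba[label], label, doc))
            scored.sort(key=lambda t: t[0], reverse=True)
            for _conf, label, doc in scored[:per_round]:
                labeled.append((doc, label))
                pool.remove(doc)
    return train_pair(labeled)


# Toy data (hypothetical): each email is (subject_words, body_words).
labeled = [
    ((["meeting", "agenda"], ["project", "deadline", "report"]), "work"),
    ((["party", "weekend"], ["beer", "friends", "fun"]), "personal"),
]
unlabeled = [
    (["meeting"], ["report", "budget"]),
    (["weekend"], ["friends", "hiking"]),
    (["agenda"], ["budget", "deadline"]),
    (["party"], ["hiking", "fun"]),
]

clf_a, clf_b = co_train(labeled, unlabeled)
print(clf_b.predict_proba(["budget"]))
```

In this toy run the word "budget" never occurs in the labeled emails, so a body classifier trained on the labeled set alone cannot tell the classes apart on it. After co-training, the subject-view classifier has labeled the emails containing "budget", and the body-view classifier can use that word. Picking the top-k confident examples regardless of class is a simplification; other variants add a fixed number of examples per class in each round.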