Semi-supervised co-training and active learning based approach for multi-view intrusion detection

  • Authors:
  • Ching-Hao Mao;Hahn-Ming Lee;Devi Parikh;Tsuhan Chen;Si-Yu Huang

  • Affiliations:
  • National Taiwan University of Science and Technology, Taipei, Taiwan;National Taiwan University of Science and Technology, Taipei, Taiwan and Academia Sinica, Taipei, Taiwan;Carnegie Mellon University, Pittsburgh, Pennsylvania;Carnegie Mellon University, Pittsburgh, Pennsylvania;National Taiwan University of Science and Technology, Taipei, Taiwan

  • Venue:
  • Proceedings of the 2009 ACM symposium on Applied Computing
  • Year:
  • 2009


Abstract

Although immense amounts of data are available from networks and hosts, only a very small proportion is labeled, due to the cost of obtaining expert labels. This is a significant bottleneck for supervised intrusion detection systems, which rely solely on labeled data: even though the remaining unlabeled data are collected from real network environments and hence potentially hold valuable information for intrusion detection, such systems cannot exploit them. In this work, we intelligently leverage both labeled and unlabeled data. Moreover, intrusion detection tasks naturally lend themselves to a multi-view scenario and can benefit significantly if these multiple views are combined meaningfully. In this paper, we propose a co-training framework for intrusion detection: a semi-supervised learning method that not only utilizes unlabeled data but also combines multi-view data. We also employ an active learning framework in which statistically ambiguous parts of the unlabeled data are identified and then labeled by an expert. This keeps expert labeling minimal while ensuring that the labels obtained from the expert are the most informative. In our experiments, we demonstrate that leveraging unlabeled data with our proposed method significantly reduces the error rate compared to using the labeled data alone. In addition, our proposed multi-view method has a lower error rate than using a single view.
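The abstract's combination of co-training and active learning can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the per-view nearest-centroid classifier, the margin-based confidence score, and the rule "query the expert when the two views disagree" are all assumptions chosen for brevity, standing in for whatever base learners and ambiguity measure the paper actually uses.

```python
import math

def centroid_classifier(labeled):
    """Fit per-class centroids on one view; return a predictor giving (label, margin)."""
    sums, counts = {}, {}
    for x, y in labeled:
        if y not in sums:
            sums[y], counts[y] = [0.0] * len(x), 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x):
        dists = {y: math.dist(x, c) for y, c in centroids.items()}
        label = min(dists, key=dists.get)
        rest = [d for y, d in dists.items() if y != label]
        # Confidence = distance margin between the best and second-best class.
        margin = (min(rest) - dists[label]) if rest else 0.0
        return label, margin

    return predict

def co_train(labeled_v1, labeled_v2, unlabeled, rounds=5, add_per_round=2):
    """Co-training over two feature views with a disagreement-based query set.

    labeled_v1 / labeled_v2: lists of (view_features, label), aligned by index.
    unlabeled: list of (view1_features, view2_features) pairs.
    Returns the grown labeled sets and the pairs queued for expert labeling.
    """
    pool, queries = list(unlabeled), []
    for _ in range(rounds):
        if not pool:
            break
        c1 = centroid_classifier(labeled_v1)
        c2 = centroid_classifier(labeled_v2)
        # scored item: (pair, label_v1, margin_v1, label_v2, margin_v2)
        scored = [(pair, *c1(pair[0]), *c2(pair[1])) for pair in pool]
        # Active learning step: the views disagree, so the example is
        # statistically ambiguous -- route it to a human expert, don't guess.
        ambiguous = [s for s in scored if s[1] != s[3]]
        queries.extend(s[0] for s in ambiguous)
        # Co-training step: pseudo-label the most confidently agreed examples
        # and add them to BOTH views' labeled sets.
        agreed = sorted((s for s in scored if s[1] == s[3]),
                        key=lambda s: -(s[2] + s[4]))[:add_per_round]
        for pair, label, _, _, _ in agreed:
            labeled_v1.append((pair[0], label))
            labeled_v2.append((pair[1], label))
        used = {id(s[0]) for s in ambiguous} | {id(s[0]) for s in agreed}
        pool = [p for p in pool if id(p) not in used]
    return labeled_v1, labeled_v2, queries
```

Using disagreement between the two view classifiers as the ambiguity signal is one common proxy (in the spirit of query-by-committee): examples on which independent views conflict are exactly the ones where an expert label is most informative, which matches the abstract's goal of minimal but maximally useful expert labeling.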