A sequential algorithm for training text classifiers
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Combining labeled and unlabeled data with co-training
COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Active + Semi-supervised Learning = Robust Multi-View Learning
ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Email classification with co-training
CASCON '01 Proceedings of the 2001 conference of the Centre for Advanced Studies on Collaborative research
Intrusion detection using an ensemble of intelligent paradigms
Journal of Network and Computer Applications - Special issue on computational intelligence on the internet
Applying co-training methods to statistical parsing
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Weakly supervised natural language learning without redundant views
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Bootstrapping POS taggers using unlabelled data
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Comparative Study of Supervised Machine Learning Techniques for Intrusion Detection
CNSR '07 Proceedings of the Fifth Annual Conference on Communication Networks and Services Research
An overview of anomaly detection techniques: Existing solutions and latest technological trends
Computer Networks: The International Journal of Computer and Telecommunications Networking
Offline/realtime traffic classification using semi-supervised learning
Performance Evaluation
Semi-supervised learning for false alarm reduction
ICDM'10 Proceedings of the 10th industrial conference on Advances in data mining: applications and theoretical aspects
A misleading attack against semi-supervised learning for intrusion detection
MDAI'10 Proceedings of the 7th international conference on Modeling decisions for artificial intelligence
An effective procedure exploiting unlabeled data to build monitoring system
Expert Systems with Applications: An International Journal
Proceedings of the 5th ACM workshop on Security and artificial intelligence
CSS'12 Proceedings of the 4th international conference on Cyberspace Safety and Security
Toward supervised anomaly detection
Journal of Artificial Intelligence Research
Hi-index | 0.00 |
Although there is immense data available from networks and hosts, a very small proportion of this data is labeled due to the cost of obtaining expert labels. This proves to be a significant bottle-neck for developing supervised intrusion detection systems that rely solely on labeled data. In spite of the data being collected from real network environments and hence potentially holding valuable information for intrusion detection, such systems can not exploit the remaining unlabeled data. In this work, we intelligently leverage both labeled and unlabeled data. Also, intrusion detection tasks naturally lend themselves into a multi-view scenario, and can benefit significantly if these multiple views are combined meaningfully. In this paper, we propose a co-training method framework for intrusion detection, which is a semi-supervised learning method and can not only utilize unlabeled data, but can also combine multi-view data. We also employ an active learning framework where statistically ambiguous parts of the unlabeled data are identified, which can then be labeled by an expert. This allows for minimal expert labeling while ensuring that the labels obtained from the expert are most informative. In our experiments, we demonstrate that leveraging the unlabeled data using our proposed method significantly reduces the error rate as compared to using the labeled data alone. In addition, our proposed multi-view method has a lower error rate than using a single view.