Data mining: practical machine learning tools and techniques with Java implementations
Data mining: practical machine learning tools and techniques with Java implementations
BoosTexter: A Boosting-based Systemfor Text Categorization
Machine Learning - Special issue on information retrieval
A decision-theoretic generalization of on-line learning and an application to boosting
EuroCOLT '95 Proceedings of the Second European Conference on Computational Learning Theory
Pattern Classification (2nd Edition)
Pattern Classification (2nd Edition)
Accurate, scalable in-network identification of p2p traffic using application signatures
Proceedings of the 13th international conference on World Wide Web
In Defense of One-Vs-All Classification
The Journal of Machine Learning Research
A maximum entropy approach to species distribution modeling
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Transport layer identification of P2P traffic
Proceedings of the 4th ACM SIGCOMM conference on Internet measurement
Internet traffic classification using bayesian analysis techniques
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Profiling internet backbone traffic: behavior models and applications
Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications
BLINC: multilevel traffic classification in the dark
Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications
ACAS: automated construction of application signatures
Proceedings of the 2005 ACM SIGCOMM workshop on Mining network data
ACM SIGCOMM Computer Communication Review
Unexpected means of protocol inference
Proceedings of the 6th ACM SIGCOMM conference on Internet measurement
Traffic classification through simple statistical fingerprinting
ACM SIGCOMM Computer Communication Review
Offline/realtime traffic classification using semi-supervised learning
Performance Evaluation
Network monitoring using traffic dispersion graphs (tdgs)
Proceedings of the 7th ACM SIGCOMM conference on Internet measurement
Lightweight application classification for network management
Proceedings of the 2007 SIGCOMM workshop on Internet network management
Performance analysis of the ANGEL system for automated control of game traffic prioritisation
Proceedings of the 6th ACM SIGCOMM workshop on Network and system support for games
An Examination of Experimental Methodology for Classifiers of Relational Data
ICDMW '07 Proceedings of the Seventh IEEE International Conference on Data Mining Workshops
Adaptive mixtures of local experts
Neural Computation
Early application identification
CoNEXT '06 Proceedings of the 2006 ACM CoNEXT conference
Unconstrained endpoint profiling (googling the internet)
Proceedings of the ACM SIGCOMM 2008 conference on Data communication
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Exploiting dynamicity in graph-based traffic analysis: techniques and applications
Proceedings of the 5th international conference on Emerging networking experiments and technologies
Graph-based P2P traffic classification at the internet backbone
INFOCOM'09 Proceedings of the 28th IEEE international conference on Computer Communications Workshops
Inferring applications at the network layer using collective traffic statistics
Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Tracking long duration flows in network traffic
INFOCOM'10 Proceedings of the 29th conference on Information communications
Characterizing data usage patterns in a large cellular network
Proceedings of the 2012 ACM SIGCOMM workshop on Cellular networks: operations, challenges, and future design
Application traffic classification at the early stage by characterizing application rounds
Information Sciences: an International Journal
Robust network traffic identification with unknown applications
Proceedings of the 8th ACM SIGSAC symposium on Information, computer and communications security
Traffic classification combining flow correlation and ensemble classifier
International Journal of Wireless and Mobile Computing
Hi-index | 0.00 |
The ability to accurately and scalably classify network traffic is of critical importance to a wide range of management tasks of large networks, such as tier-1 ISP networks and global enterprise networks. Guided by the practical constraints and requirements of traffic classification in large networks, in this article, we explore the design of an accurate and scalable machine learning based flow-level traffic classification system, which is trained on a dataset of flow-level data that has been annotated with application protocol labels by a packet-level classifier. Our system employs a lightweight modular architecture, which combines a series of simple linear binary classifiers, each of which can be efficiently implemented and trained on vast amounts of flow data in parallel, and embraces three key innovative mechanisms, weighted threshold sampling, logistic calibration, and intelligent data partitioning, to achieve scalability while attaining high accuracy. Evaluations using real traffic data from multiple locations in a large ISP show that our system accurately reproduces the labels of the packet level classifier when runs on (unlabeled) flow records, while meeting the scalability and stability requirements of large ISP networks. Using training and test datasets that are two months apart and collected from two different locations, the flow error rates are only 3% for TCP flows and 0.4% for UDP flows. We further show that such error rates can be reduced by combining the information of spatial distributions of flows, or collective traffic statistics, during classification. We propose a novel two-step model, which seamlessly integrates these collective traffic statistics into the existing traffic classification system. Experimental results display performance improvement on all traffic classes and an overall error rate reduction by 15%. In addition to a high accuracy, at runtime, our implementation easily scales to classify traffic on 10Gbps links.