Journal of Network and Computer Applications
Network traffic monitoring is difficult because even small networks generate large volumes of traffic. One way to ease the task is network traffic summarization, an application of data summarization, which is a key concept in data mining. However, there are currently no established measures for evaluating summaries. This paper presents four metrics for characterizing data summarization results. Conciseness and Information Loss have been defined previously; we modify Information Loss because it was biased towards attributes that recur across individual summaries. We also propose two additional metrics, Interestingness and Intelligibility. Using the proposed metrics, we evaluate existing summarization techniques on well-known network traffic datasets. We also propose a summarization technique that is based on an existing one but uses the proposed metrics as its objective function. To further demonstrate the usability of the metrics, we perform classification on summarized datasets, showing that the metrics can guide the selection of summaries for data mining. With summarized datasets of reasonable conciseness, we achieve similar accuracy at a fraction of the running time, which is proportional to the conciseness of the summarized dataset.