Rigorous experiments and empirical studies hold the promise of empowering researchers and practitioners to develop better approaches to cyber security. For example, understanding the provenance and lineage of polymorphic malware strains can lead to new techniques for detecting and classifying unknown attacks. Unfortunately, many challenges stand in the way: the lack of sufficient field data (e.g., malware samples and contextual information about their impact in the real world), the lack of metadata about how existing data sets were collected, the lack of ground truth, and the difficulty of developing tools and methods for rigorous data analysis. As a first step towards rigorous experimental methods, we introduce two techniques for reconstructing the phylogenetic trees and dynamic control-flow graphs of unknown binaries, inspired by research in software evolution, bioinformatics, and time series analysis. Our approach is based on the observation that the long evolution histories of open source projects provide an opportunity for creating precise models of lineage and provenance, which can also be used for detecting and clustering malware. As a second step, we present experimental methods that combine a representative corpus of malware and contextual information (gathered from end hosts rather than from network traces or honeypots) with sound data collection and analysis techniques. While our experimental methods serve a concrete purpose, namely understanding lineage and provenance, they also provide a general blueprint for addressing threats to the validity of cyber security studies.