The retrieval and analysis of malicious content is an essential task for security researchers. At the same time, distributors of malicious files deploy countermeasures to evade researchers' scrutiny. This paper investigates two techniques used by malware download centers: frequently updating the malicious payload, and blacklisting (i.e., refusing HTTP requests from suspected researchers based on their IP addresses). To this end, we sent HTTP requests to malware download centers over a period of four months. The requests were distributed across two pools of IPs, one exhibiting high-volume research behaviour and the other exhibiting semi-random, low-volume behaviour. We identify several distinct update patterns: sites that never update the binary; sites that serve a new binary to each new client but then repeatedly serve that same binary to the same client; sites that update the binary periodically, with periods ranging from one hour to 84 days; and server-side polymorphic sites that deliver a new binary for every HTTP request. From this classification we derive several guidelines for crawlers that re-query malware download centers looking for binary updates. We propose a scheduling algorithm that incorporates these guidelines and perform a limited evaluation of it using the data we collected. Finally, we analyze our data for evidence of blacklisting: we find strong evidence that a small minority of URLs blacklisted our high-volume IPs, but for the majority of malicious URLs studied there was no observable blacklisting response, despite our issuing over 1.5 million requests to 5,001 different malware download centers.
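The abstract's idea of a re-query scheduler that adapts to a site's observed update pattern can be sketched in a simple form. The following is a minimal illustration, not the paper's actual algorithm: it hashes each fetched payload and backs off exponentially while the binary is unchanged, resetting to aggressive polling when an update is observed. The class name, interval bounds (taken from the one-hour and 84-day periods mentioned above), and the backoff factor are all assumptions for illustration.

```python
import hashlib


class RequerySchedule:
    """Hypothetical adaptive re-query scheduler (illustrative only, not the
    paper's algorithm). Doubles the polling interval while the served binary
    is unchanged, and shrinks it when an update is seen, so requests to
    never-updating sites are bounded while frequently-updating sites are
    polled often.
    """

    MIN_INTERVAL = 3600          # 1 hour: shortest update period cited above
    MAX_INTERVAL = 84 * 86400    # 84 days: longest update period cited above

    def __init__(self) -> None:
        self.interval = self.MIN_INTERVAL
        self.last_digest = None

    def observe(self, payload: bytes) -> int:
        """Record a fetched payload; return the next polling interval (seconds)."""
        digest = hashlib.sha256(payload).hexdigest()
        if digest == self.last_digest:
            # No update observed: back off exponentially, capped at the maximum.
            self.interval = min(self.interval * 2, self.MAX_INTERVAL)
        else:
            # New binary served: resume aggressive polling.
            self.interval = self.MIN_INTERVAL
        self.last_digest = digest
        return self.interval
```

A server-side polymorphic site (a new binary per request) would keep such a scheduler pinned at the minimum interval, so a real crawler would additionally need to detect that pattern and cap its request rate separately.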