November 15, 2010
Another trick that is becoming more and more common in malicious PDF files consists of storing the actual malicious content (for example, JavaScript code that exploits some vulnerability) in XFA forms. If you remember the getPageNthWord, getAnnots, and info tricks that have been documented earlier, you will recognize the technique being used here.
So, what is an XFA form? XFA stands for XML Forms Architecture, a specification used to create form templates (forms that can be filled in by a user) and to process them (for example, to validate their contents). Adobe introduced support for XFA forms in PDF files with PDF 1.5. If you want all the gory details, you can refer to the original XFA proposal or to Adobe's XFA specification, which, at 1123 pages, may be a hard read.
Let's see how it is abused in practice (the MD5 of the sample I'm analyzing is 1f26dcd4520a6965a42cefa4c7641334). The PDF first defines an XFA template, which describes the appearance and interactive characteristics of the form.
obj 10 0
<<
/Type /EmbeddedFile
/Length 618
/Filter /FlateDecode
>>
stream
<template xmlns="http://www.xfa.org/schema/xfa-template/2.5/">
<subform layout="tb" locale="en_US" name="artsLei">
<pageSet>
<pageArea id="leiArts" name="leiArts">
<contentArea h="756pt" w="576pt" x="0.25in" y="0.25in"/>
<medium long="792pt" short="612pt" stock="default"/>
</pageArea>
</pageSet>
<subform h="756pt" w="576pt" name="docTaut">
<field h="65mm" name="docArts" w="85mm" x="53.6501mm" y="88.6499mm">
<event activity="initialize" name="tautDoc">
<script contentType="application/x-javascript">
var nil = (function(){return this;}).call(null);
...
eval_ref(decode(docArts[\'ra\'+ue+\'wVa\'+ue+\'lue\'].substring(50),eval_ref));
</script>
</event>
<ui><imageEdit/></ui>
</field>
</subform>
</subform>
</template>
endstream
endobj
A couple of interesting parts: the template defines a field, named
docArts. Note that a reference to this field will be available through
an object named docArts in the global scope of JavaScript (i.e.,
this.docArts is a Field object that represents this field).
The field also has an event handler that runs when the field is
initialized. The handler is written in JavaScript and has the familiar
look of obfuscated code.
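Two small JavaScript idioms carry most of the obfuscation in that handler, and they can be tried outside of a PDF reader. The sketch below is only an illustration, not the sample's code: mockField is a hypothetical stand-in for the XFA Field object, ue is assumed to be an empty string (its definition is elided in the listing), and new Function is used instead of the sample's function expression only so that the snippet behaves the same even when loaded in strict mode.

```javascript
// A sloppy-mode function called with a null `this` receives the global object.
// (new Function always creates a sloppy-mode function; the sample uses a
// plain function expression, which works the same way in non-strict code.)
var getGlobal = new Function('return this;');
var nil = getGlobal.call(null);

// Looking eval up as a property of the global object avoids a literal
// "eval(" call in the script text.
var evalRef = nil['eval'];

// Assumption: `ue` is an empty string, so the pieces join into a property name.
var ue = '';
var prop = 'ra' + ue + 'wVa' + ue + 'lue';

// Hypothetical stand-in for the Field object exposed by the XFA form.
var mockField = { rawValue: 'example' };

var fieldValue = mockField[prop]; // same as mockField.rawValue
var evaluated = evalRef('1 + 2'); // evaluated through the indirect reference
```

Neither the string "eval(" nor the property name rawValue appears literally in such code, which is presumably the point: simple signature or keyword matching on the script text will miss both.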
Let's see what this code does:
var nil = (function(){return this;}).call(null);
var eval_ref = nil['eval'];
function decode(str, ev){
var ret = '';
var cvc = [];
var fcc = String.fromCharCode;
var k = docArts['rawValue'].substring(0, 50);
...
return ret;
}
eval_ref(decode(docArts['rawValue'].substring(50), eval_ref));
The interesting bits here are the references to the docArts object.
Notice that its rawValue property is retrieved. So, where is the value
of the field stored? In an XFA dataset:
obj 12 0
<<
/Filter /FlateDecode
/Length 3388
/Type /EmbeddedFile
>>
stream
<xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
<xfa:data>
<artsLei>
<docArts>
[[32,48],[65,97],[48,64],[10,11],[13,14],[97,126]]
[80,87,70,83,71,77,80,88,16,
...
78,66,74,79,21,86,79,68,8,9,59]
</docArts>
</artsLei>
</xfa:data>
</xfa:datasets>
endstream
endobj
Therefore, the obfuscated JavaScript extracts the data stored for the docArts field (precisely, everything after the initial 50 characters) and passes it to the decoding routine. The decoding routine also uses the docArts data (the first 50 characters) as a key to recover the malicious code in the clear, ready to be evaluated. Execution finally results in exploitation of the CVE-2010-0188 vulnerability (libTiff overflow).
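The 50-character split is easy to check by hand: the table of number pairs that opens the docArts data is exactly 50 characters long. Here is a minimal sketch of the split, assuming the field value starts directly with that table (no leading whitespace). The encodedBody value is a made-up placeholder, since the real array is elided above, and the decoding step itself is not reproduced.

```javascript
// First line of the docArts data, as it appears in the dataset dump.
var rangeTable = '[[32,48],[65,97],[48,64],[10,11],[13,14],[97,126]]';

// Hypothetical placeholder for the encoded body (the real one is elided).
var encodedBody = '[1,2,3]';

// Assumption: rawValue is the table immediately followed by the body.
var rawValue = rangeTable + encodedBody;

// The same split the malicious handler performs.
var key = rawValue.substring(0, 50);  // what decode() reads as `k`
var payload = rawValue.substring(50); // what is passed to decode()

// The key happens to be a well-formed array of six [low, high] pairs.
var ranges = JSON.parse(key);
```

How decode() actually uses these pairs is not shown in the listing, so I will not guess at it here; the sketch only confirms where the key ends and the payload begins.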