As browser-based exploits, and JavaScript malware in particular, have shouldered their way to the top of the list of threats, browser vendors have been scrambling to find effective defenses to protect users. Few have been forthcoming, but Microsoft Research has developed a new tool called Zozzle that can be deployed in the browser and, according to its authors, detects JavaScript-based malware with a false-positive rate of less than one percent.

Zozzle is designed to perform static analysis of the JavaScript code on a given site and quickly determine whether that code is malicious and includes an exploit. To be effective, the tool must be trained to recognize the elements that are common to malicious JavaScript, and the researchers behind it stress that it works best on de-obfuscated code. In a paper describing the tool, the researchers say that they trained Zozzle by crawling millions of Web sites and using a related tool, called Nozzle, to process the URLs and see whether malware was present.

"ZOZZLE makes use of a statistical classifier to efficiently identify malicious JavaScript. The classifier needs training data to accurately classify JavaScript source, and we describe the process we use to get that training data here. We start by augmenting the JavaScript engine in a browser with a “deobfuscator” that extracts and collects individual fragments of JavaScript. As discussed above, exploits are frequently buried under multiple levels of JavaScript eval. Unlike Nozzle, which observes the behavior of running JavaScript code, ZOZZLE must be run on an unobfuscated exploit to reliably detect malicious code," the researchers wrote in the paper, which was authored by Benjamin Livshits and Benjamin Zorn of Microsoft Research, Christian Seifert of Microsoft and Charles Curtsinger of the University of Massachusetts at Amherst.
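
To make the idea of collecting code fragments at the point of eval more concrete, here is a minimal Python sketch. It is not Zozzle's deobfuscator: the ToyEngine class, the LAYER pattern and the eval(atob("...")) wrapping scheme are all hypothetical stand-ins for a real JavaScript engine and real obfuscation. It only shows that a hook on eval sees every layer, including the final, unwrapped code.

```python
import base64
import re

# Hypothetical obfuscation layer: the whole script is eval(atob("<base64>")).
LAYER = re.compile(r'^\s*eval\(atob\("([A-Za-z0-9+/=]+)"\)\);?\s*$')


class ToyEngine:
    """Stand-in for a JavaScript engine whose eval has been instrumented."""

    def __init__(self):
        self.fragments = []  # every string that reached eval, outermost first

    def eval(self, code):
        # The "hook": record the code exactly as the engine is about to run it.
        self.fragments.append(code)
        match = LAYER.match(code)
        if match:
            # The layer decodes its payload and passes it to eval again.
            inner = base64.b64decode(match.group(1)).decode("utf-8")
            self.eval(inner)


def wrap(code):
    """Add one hypothetical obfuscation layer around a piece of code."""
    encoded = base64.b64encode(code.encode("utf-8")).decode("ascii")
    return 'eval(atob("' + encoded + '"));'


payload = 'var s = unescape("%u9090");'
engine = ToyEngine()
engine.eval(wrap(wrap(payload)))

# The last fragment collected is the code with no wrapping left.
print(engine.fragments[-1])
```

In this sketch the classifier would be given the last collected fragment rather than the outer, wrapped script.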

The researchers say that Zozzle is specifically designed to detect and defend against heap-spraying exploits launched by malicious JavaScript found on Web sites. In many cases these days, that kind of exploit is hosted on a legitimate site that's been compromised and is being used as part of a drive-by download attack. Often, the code is hosted on a specific page for a day or even a few hours and then is taken down, either by the attacker or the site owner. The Microsoft researchers say that this, along with the multiple layers of obfuscation that attackers use to cloak JavaScript exploits, can make it difficult for automated tools to identify such malware with a high degree of accuracy.

Zozzle takes a multi-stage approach: once the training samples have been labeled, the tool extracts features from them.

"Once we have labeled JavaScript contexts, we need to extract features from them that are predictive of malicious or benign intent. For ZOZZLE, we create features based on the hierarchical structure of the JavaScript abstract syntax tree (AST). Specifically, a feature consists of two parts: a context in which it appears (such as a loop, conditional, try/catch block, etc.) and the text (or some substring) of the AST node," the paper says. "For a given JavaScript context, we only track whether a feature appears or not, and not the number of occurrences. To efficiently extract features from the AST, we traverse the tree from the root, pushing AST contexts onto a stack as we descend and popping them as we ascend."
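
The traversal the researchers describe can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's code: the Node class, the CONTEXT_KINDS set and the "toplevel" label are hypothetical, and a real implementation would work on the browser's own AST. The sketch pushes a context onto a stack on the way down, pops it on the way up, and stores features in a set so that only presence is tracked, not the number of occurrences.

```python
from dataclasses import dataclass, field

# Hypothetical node kinds that count as a "context" for a feature.
CONTEXT_KINDS = {"function", "loop", "if", "try"}


@dataclass
class Node:
    kind: str                      # e.g. "loop", "call", "string"
    text: str = ""                 # text of the node, empty for pure structure
    children: list = field(default_factory=list)


def extract_features(root):
    """Return the set of (context, text) pairs found in the tree."""
    features = set()   # a set records presence only, never a count
    stack = []         # contexts enclosing the node being visited

    def visit(node):
        if node.text:
            context = stack[-1] if stack else "toplevel"
            features.add((context, node.text))
        is_context = node.kind in CONTEXT_KINDS
        if is_context:
            stack.append(node.kind)   # push while descending
        for child in node.children:
            visit(child)
        if is_context:
            stack.pop()               # pop while ascending

    visit(root)
    return features


tree = Node("program", children=[
    Node("loop", children=[
        Node("call", "unescape"),
        Node("string", "%u9090"),
        Node("call", "unescape"),   # repeated, but still a single feature
    ]),
    Node("try", children=[Node("call", "unescape")]),
])

features = extract_features(tree)
print(sorted(features))
```

Note that the same text, unescape, yields two different features here because it appears in two different contexts, a loop and a try block.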

The new tool is still in the research phase and it's not clear whether or when Microsoft Research might release Zozzle. But the researchers say that Zozzle has an extremely low overhead when deployed in a browser, on the order of 2-5 milliseconds per JavaScript file, and has a false-positive rate of less than one percent.

"Much of the novelty of ZOZZLE comes from its hooking into the JavaScript engine of a browser to get the final, expanded version of JavaScript code to address the issue of deobfuscation. Compared to other classifier-based tools, ZOZZLE uses contextual information available in the program Abstract Syntax Tree (AST) to perform fast, scalable, yet precise malware detection," the researchers write in the paper. "We see tools like ZOZZLE deployed both in the browser to provide “first response” for users affected by JavaScript malware and used for offline dynamic crawling, to contribute to the creation and maintenance of various blacklists."
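
For readers who want a feel for how a classifier over such features might work, the following Python sketch trains a small Bernoulli naive Bayes model on feature sets that record only presence or absence. The choice of naive Bayes is an assumption made for illustration, since the passages quoted here say only that Zozzle uses a statistical classifier, and the feature strings and four training samples are invented.

```python
import math


def train(samples):
    """samples: list of (feature_set, is_malicious) pairs."""
    counts = {True: {}, False: {}}   # per class: feature -> number of samples
    totals = {True: 0, False: 0}     # per class: number of samples
    vocab = set()
    for feats, label in samples:
        totals[label] += 1
        for f in feats:
            counts[label][f] = counts[label].get(f, 0) + 1
            vocab.add(f)
    return counts, totals, vocab


def log_score(feats, label, model):
    """Log of prior times likelihood, with add-one smoothing."""
    counts, totals, vocab = model
    n = totals[label]
    score = math.log((n + 1) / (sum(totals.values()) + 2))
    for f in vocab:
        p = (counts[label].get(f, 0) + 1) / (n + 2)
        # Absence of a known feature is evidence too.
        score += math.log(p if f in feats else 1 - p)
    return score


def is_malicious(feats, model):
    return log_score(feats, True, model) > log_score(feats, False, model)


# Invented training data: "context:text" strings stand in for AST features.
training = [
    ({"loop:unescape", "string:%u9090"}, True),
    ({"loop:unescape", "try:ActiveXObject"}, True),
    ({"function:getElementById"}, False),
    ({"if:addEventListener", "function:getElementById"}, False),
]
model = train(training)

print(is_malicious({"loop:unescape"}, model))
print(is_malicious({"function:getElementById"}, model))
```

Because each feature is a simple yes-or-no value, scoring a script amounts to one pass over the known features, which is consistent with the low per-file overhead the researchers report, though the sketch makes no claim about real-world performance.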