Tuesday, May 10, 2016

The common thread: Fuzzing, Bug Triage, and Attacker Automation


People think of conferences as singular events, but INFILTRATE, which is Immunity's open access offensive information security conference, is a chain of research. The goal of a talk is to help illuminate the room we're in, not just scatter dots of light around the night sky.

I'll start the discussion of this chain by looking at some of the technical work presented at INFILTRATE that I think highlights the continuity of our efforts as an offensive community, and then end with the policy implications, as befits this blog. 

Finding Bugs in OS X using AFL

The first talk I want to discuss is a very entertaining introduction to fuzzing that Ben Nagy gave last year. You can watch the video here. You SHOULD watch the video. It's great. And one thing you take away from multiple viewings of it (I've seen this talk three times now) is that fuzzing is now all about file formats, not programs or protocols. People spend all their time treating the PDF format to the pain it rightfully deserves. If we want to improve security as a whole, we might be able to do so by simplifying what we ask of our file formats, instead of, say, banning hacking tools.

Finding Bugs in PHP (and other interpreters, such as say, Javascript)

In 2014 Sean gave a talk at INFILTRATE about his work on fuzzing language interpreters by feeding them their own regression tests as input. This is necessary because fuzzing languages has gotten medium-hard now. You're not going to find simple bugs by sending long strings of A's into the third parameter of some random API. First of all: FINDING the available APIs reachable by an interpreted program is hard. Then finding what complex structures and sets of structures and values need to be sent into those APIs to make them parse the input is itself hard.

Then you have to find ways to mutate the code you generate without completely breaking the syntax of the files you are creating, run that code through the interpreter, and trap resulting memory access exceptions.
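
To make the mutate, run, and trap loop concrete, here is a minimal sketch. It is not Sean's tooling. The mutation rule (swap one integer literal for a boundary value), the INTERESTING list, and all function names are my assumptions for illustration, and the crash check relies on POSIX behavior where a negative return code means the process died from a signal.

```python
import random
import re
import subprocess
import sys

# Hypothetical boundary values; a real campaign would tune these per target.
INTERESTING = ["0", "-1", "255", "65536", "2147483647", "4294967296"]
INT_LITERAL = re.compile(r"\b\d+\b")


def mutate(source, rng):
    """Swap one integer literal for a boundary value, leaving the rest of the code alone."""
    matches = list(INT_LITERAL.finditer(source))
    if not matches:
        return source
    m = rng.choice(matches)
    return source[:m.start()] + rng.choice(INTERESTING) + source[m.end():]


def run_case(interpreter_argv, source, timeout=5):
    """Feed one test case on stdin; return the signal number if the interpreter died from one."""
    try:
        proc = subprocess.run(interpreter_argv, input=source.encode(),
                              capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    # On POSIX, returncode -N means "killed by signal N" (11 is SIGSEGV on Linux).
    return -proc.returncode if proc.returncode < 0 else None


def fuzz(interpreter_argv, seeds, iterations, seed=0):
    """Mutate regression-test seeds and collect (signal, test case) pairs for crashes."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        case = mutate(rng.choice(seeds), rng)
        sig = run_case(interpreter_argv, case)
        if sig is not None:
            crashes.append((sig, case))
    return crashes
```

As a stand-in target, fuzz([sys.executable, "-"], seeds, 100) runs the loop against the Python interpreter itself. Against a real target the seeds would be that interpreter's own regression tests, and you would want far smarter mutations than this one.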

Using Fuzzed Bugs for Actual Exploitation

The first step: Automatically generating hacker-friendly reports from crash data. Because as Sean Heelan says "The result of succeeding at fuzzing is more pain, not more happiness." This is something Ben Nagy has pointed out over the years as well. "Anyone want 500000 WinWord.exe crashes with no symbols or seeming utility of any kind?"

That brings us to 2016 and Sean's INFILTRATE talk, which focused on "What do you do with all those crashes?" Everyone is going to say "You triage them!" But you don't do that by hand. Bugs become "bug primitives" which get you some entry into the "weird machine programming language". Some are more powerful and some are less powerful, but you don't want to look at the source code and try to derive what the bug really is by hand every time.
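
The crudest form of automated triage, and only the first step before anything like what Sean is doing, is bucketing: collapse thousands of crashes into a handful of groups by hashing the innermost stack frames. Here is a minimal sketch of that idea. The frame names in the usage note and the depth of three frames are assumptions for illustration, not anything from the talk.

```python
import hashlib
from collections import defaultdict


def bucket_key(frames, depth=3):
    """Hash the innermost `depth` frames so crashes with the same faulting path collide."""
    top = "|".join(frames[:depth])
    return hashlib.sha1(top.encode()).hexdigest()[:12]


def bucket_crashes(crashes, depth=3):
    """Group (input_name, frames) pairs by bucket key; frames are innermost first."""
    buckets = defaultdict(list)
    for input_name, frames in crashes:
        buckets[bucket_key(frames, depth)].append(input_name)
    return dict(buckets)
```

Given crashes such as ("a.php", ["memcpy", "zend_copy", "unserialize", "main"]) and ("b.php", ["memcpy", "zend_copy", "unserialize", "run_tests"]), both land in one bucket because their three innermost frames match. Note that this tells you which crashes look alike, not what the bug actually is, which is exactly the gap the talk is about.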

The point here is that for some reason people (especially in policy, which this blog is usually aimed at) believe bugs occur on "one line of code", but more realistically bugs derive from a chain of coding mistakes that culminate in some sort of write4 climax.

What would a sample output from this report look like? It would look like any professional hacking team's Phrack article! Usually these look like a section of code where certain lines are annotated as [1], [2], and so forth. For example, let's look at this classic paper from MaXX which introduced heap overflows to the public discussion.

This is the hacker-traditional way to do footnotes on code which are then used to describe a bug.

Ok, so Sean wants to take all the crashes he is finding in PHP and have the computer annotate what the bugs really are. But in order to even think about how to automate that, you need a corpus of bugs which you have hand-annotated so you can judge how well your automation is working, along with sample inputs you would get when a fuzzer runs against them. I'm not sure how academics (Sean is soon to be a PhD student but will never be academic) try to do research without something like this. In fact, in the dry run Sean went on and on about it.

Ok, so once he has that a bunch more work happens and the algorithm below actually works and produces, from arbitrary PHP fuzz data + a decent program instrumentation library, sample triage reports that match what a hacker wants to see from crashes. 

Right now Sean is manually inserting variable tracing into targeted functions because to do that automatically he has to write a Clang plugin, which he says is "easy, no problem!" but I think is reasonably complex. Nobody asked him "WHY NOT JUST SOLVE FOR ALL OF THE PROGRAM STATE USING SMT SOLVERS?" (Example for Windows ANI). Probably because that question is mildly stupid, but also because input crafting is one of those things academics love to say they do very well, but when you make the problem "A real C program" lots of excuses start coming out, such as loop handling.

Loop handling is a problem with Sean's method too, btw. Just not as BIG a problem.

Which brings us to DARPA and the DARPA Cyber Grand Challenge, which I've been following with some interest, as have many of you. One of the competitors gave a talk at INFILTRATE which I was really excited to hear!

Automated BugFinding is Important but Still Mostly Fuzzing

To put this talk in context feel free to go read the blog from the winning team, which was led by David Brumley.

"We combined massive hubris with an SMT Solver, and a fuzzer, to win the cyber grand challenge!"

Artem's team came in second when it comes to what we actually care about - finding new bugs.

And they did it in, and I think this is something interesting by itself, a similar way to the other teams, including "For All Secure", as detailed in his next slide:

Because DARPA has released their test corpus (!!!!) you can of course read the Driller paper, which uses the same methodology (although not as well as Boosted), combining the open source American Fuzzy Lop with a solver to find bugs.

A question for all of us in the field would be "based on this research, would you invest in improving your fuzzer, or your symbolic execution engine, or in how you transfer information between them." I'm betting the slight difference between ForAllSecure and Driller/Boosted is that they probably transfer information in a more granular form than just "inputs". Driller may even be overusing their Solver - it's still unknown how much benefit you get from it over just finding constants within a binary to use as part of the fuzz corpus. In other words, all that research into symbolic execution might be replaceable by using the common Unix utility "strings". See the slide below for an example of a common issue and imagine in your head how it could be solved.
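
To show how little machinery the "strings" idea needs, here is a minimal sketch that pulls printable ASCII runs out of a binary and dedupes them into a fuzzing dictionary. It only approximates what strings(1) does, and the length limits and function names are my assumptions, not anything the teams described.

```python
import re


def extract_strings(blob, min_len=4):
    """Return printable ASCII runs of at least min_len bytes, roughly like strings(1)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group() for m in pattern.finditer(blob)]


def build_dictionary(blob, min_len=4, max_len=32):
    """Dedupe extracted strings, preserving first-seen order, for use as fuzzer tokens."""
    tokens = []
    for s in extract_strings(blob, min_len):
        if len(s) <= max_len and s not in tokens:
            tokens.append(s)
    return tokens
```

Calling build_dictionary(open(path, "rb").read()) on a target binary yields constants such as command keywords and magic values, which a mutation fuzzer can splice into inputs instead of waiting for a solver to derive them from a comparison.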

But the slight difference in results may not be important in the long run. Boosted found 69 bugs, the top scorer found 77, and the fuzzer alone found 61. The interesting thing here is that the fuzzer is still doing MOST OF THE HEAVY LIFTING.
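
For a sense of scale, here is the arithmetic on those three numbers. I am assuming the 61 is directly comparable to both totals, which is a simplification since the top scorer ran a different system.

```python
def share(part, total):
    """Percentage of `total` accounted for by `part`, rounded to one decimal place."""
    return round(100 * part / total, 1)


fuzzer_only, boosted, top_scorer = 61, 69, 77

print(share(fuzzer_only, boosted))     # 88.4
print(share(fuzzer_only, top_scorer))  # 79.2
```

So on these figures the fuzzer alone accounts for roughly 88% of what Boosted found, and everything layered on top buys the remaining eight bugs.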

In Boosted they use their own fuzzer, which seems pretty awesome. But sometimes, as in the next talk I want to look at, the fuzzer is just used to generate information for the analysis engine to look at.

What DARPA Scoring Means

Having worked with DARPA myself as part of the cyber fast track effort, I know that any hint of "offensive" research is treated like you were building a demon summoning amulet complete with blood sacrifice. And yet, the actual research everyone is most interested in is always offensive research. DARPA is the ultimate font of "parallel construction" in cyber.

Cyber Grand Challenge is the same. Making the teams "patch" the bugs is not part of the scoring because for some reason we envision a future where bugs get automatically patched by automated systems. It's there because being able to patch a bug in a way that is not performance limiting demonstrates an automated "understanding" of the vulnerability, which is important for more advanced offensive use of the bug, such as will be in theory demonstrated at DefCon when the teams compete there for the final - one I predict humans will destroy them at.   

Finding bugs in HyperVisors

Let's move on to another use of automation to find bugs. These slides cover the whole talk but I'll summarize quickly.

I wish my company had done this work because it's great. :(

The basic theory is this: If you spend six months implementing a hypervisor you can detect time of check to time of use bugs by fuzzing pseudo-drivers in Xen/Hyper-V/VMware to force memory access patterns which you can log using some highly performant system, and you'll get a bunch of really cool hypervisor escapes. Future work includes: better fuzzing of system calls, more bugs, better exploits. This is one of those few areas where KASLR makes a big difference sometimes, but that's very bug dependent.
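
The detection side of that theory can be sketched in a few lines once you have the log. This is a minimal sketch under my own assumptions: the Access record layout is hypothetical, and I am treating "the same request read the same guest-shared address more than once" as the signal for a candidate time of check to time of use bug, since the guest can change the value between the two reads. The actual logging format from the talk is not described here.

```python
from collections import defaultdict, namedtuple

# Hypothetical log record: which hypervisor request touched which guest-shared address.
Access = namedtuple("Access", "request_id address kind")  # kind is "read" or "write"


def find_double_fetches(log):
    """Return sorted (request_id, address) pairs where one request read an address twice or more."""
    reads = defaultdict(int)
    for access in log:
        if access.kind == "read":
            reads[(access.request_id, access.address)] += 1
    return sorted(key for key, count in reads.items() if count > 1)
```

Every hit is only a candidate. A human (or more automation) still has to confirm that the first read was a check and the second a use, which is where the six months go.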

People care about hypervisor escapes a LOT. People wish they would stop being a thing. But they're never going away and it's just weird to think you can have a secure hypervisor and then build systems on top of it that are truly isolated from each other. It's like people want to live in a fantasy world where at least ONE thing is solidly built, and they've chosen the hypervisor as that one thing.

In Conclusion: Fuzzers are Mightier than the Sword

I'm mighty.
Program analysis techniques to find bugs have washed over the community in waves: from Fortify-like signatures on bad APIs, to program slicing, to SMT solving, etc. But the one standout has always been: Send bad data to the program and see where it crashes. And from the human side it has always been, "Read the assembly and see what this thing does."

What if that never changes? What if forever, sending the equivalent of long strings of A's and humans reading code are the standard, and everything else is so pale an imitation as to be basically transparent? What does that say for our policy going forwards? What would it say if we refuse to acknowledge that as a possible reality?

Policy needs to take these sorts of strategic questions into account, which is why I posted this on the blog, but we also need to note that all research into these areas will stop if Wassenaar's "Intrusion Software" regulation is widely implemented even in a very light-handed way. I wanted to demonstrate the pace of innovation and our hopes for finding ground truth answers to big questions - all of it offensive work but crucial to our understanding of the software security "universe" that we live in.

Monday, May 9, 2016

A Civil Libertarian Argument against Wassenaar

I'm writing this just for one person who happened to give a quick talk at the last Commerce Dept ISTAC meeting, but it is applicable perhaps to other people, such as the German Government.

First of all, there is an element of cognitive dissonance in that some civil libertarians would agree that restrictions on cryptography are technically undoable and counterproductive and then turn around and say that restrictions on offensive information technology (aka "Intrusion software" as Wassenaar would call it), must be done to protect the "poor journalists and dissidents in Egypt!"

But what the Government would have to do to regulate Penetration Testing Software is the exact same thing they would have to do to regulate encryption! All of your programs would have to go before the NSA who would get to choose who you gave them to. From a civil libertarian perspective, how about we don't make the NSA the Rabbi in charge of choosing which programs are kosher and not? The technical argument against that has been well documented as well - no clear definition of kosher information can be written.

And of course the argument may be that you can draw a clear line between "really offensive technology" and "harmless penetration testing" technology, but this is harder technically to do than building a scalable key escrow system for all of cryptography, as the Governments would prefer to have to solve the "going dark" problem.

Not only are programs controlled under the proposed regulations, but so is any discussion of offensive information technology. You would limit all discussion of hacking to behind classified walls, changing the balance of power far in favor of the very governments you fear. And by having governments regulate this community with criminal law, you get a chilling effect, you get differential treatment for people who agree to backdoor software, you get all the things that a government can do with coercion when the law is so vague that everyone is in legal jeopardy.

In other words, despite wanting to protect dissidents, the civil rights argument that we should therefore regulate penetration testing technology and all related work runs into the exact same issues that the cryptowars have already fought over! We would be back to printing Perl code on shirts!

And the cat is already out of the bag. Export control is a hammer you use when you have some ability to limit the spread of information. But the Internet age makes that a thousand times harder for everything, and for this sector almost all work is done globally already. It would be the wrong tool for the job even if everything else was right about it.

So in any case, a better solution to protecting dissidents would involve perhaps two other things:

  1. Pressuring governments, especially governments we give billions of dollars to, to not assassinate and imprison journalists, using traditional State Department means.
  2. Teaching Dissidents Operational Security. Keep in mind none of what Gamma Group and Hacking Team were selling would have worked against a non-Jailbroken iPhone. That said, nothing is going to protect you from your local government if they want you bad enough.

I tried to keep this argument concise and from a civil liberties perspective. If you want another take, I'd recommend Thomas Dullien's more EU-centric approach. And of course, if you want to discuss this at length, you can subscribe to the DailyDave email list, which is often used for such things.

Friday, May 6, 2016

Talking about 0days and Attacks from weird Datasets

The below paper uses Symantec's WINE dataset to draw conclusions about the prevalence of 0days. It is bad in many ways, but in particular it confuses binaries with 0day (which are more related to vulnerabilities), uses a simplistic "windows of vulnerability" model, and tries to derive real data from the WINE dataset. Yet people quote from this paper in policy meetings as if it made sense!

A brief word about the WINE dataset and datasets like it: It is impossible to remove massive observer bias from them. All I want you to do is read the above paper and ask yourself "If the most used 0day on the market was in Symantec's endpoint protection, what would this paper look like?" A good rule of thumb is that if someone is talking about "Windows of vulnerability" they have oversimplified the problem beyond recognition.

What you get with people who rely on IDS data to talk about 0days is a bizarre level of cognitive dissonance when it comes down to how bad their data is for the conclusions they are trying to draw. The only valid thing you can say from that kind of data is "sometimes we get lucky and find an 0day". And the same thing is true when looking at the Verizon data to try to understand attacks. Their conclusions this year are demonstrably nonsensical, but every year has been the same basic methodology...

This is a must read: http://blog.trailofbits.com/2016/05/05/the-dbirs-forest-of-exploit-signatures/

I am sad that research is hard but please stop saying you understand attacks from data that makes no sense.

Monday, May 2, 2016

Strategic Surprise in Cyber

The Past Strategic Surprise

In 2011 Nicolas Waisman, Immunity's VP of South America and a highly rated exploit writer and bug finder himself, gave a keynote which examined a lot of the areas around vulnerabilities, exploits, and how to use them strategically.

This video is available here: https://vimeo.com/163699561 . I'm going to steal pictures from his talk for this blogpost but you can watch it to get the full experience. :)

Let me sum it up for you though with one question he asks which I think is a visceral experience very different from what you read in most marketing documents. He says at one point in his talk: "When was the last time you saw a real exploit?"

In order to understand that question, you have to know a bit about Nico, and his quality assurance process on "real exploits". A real exploit to Nico is essentially going to be a remote, unauthenticated heap overflow that works every time even over bad links, and cleans up after itself properly to continue execution of the remote process.

I want to put this into context so you don't think Nico is just some hacker elitist, which is probably also true, but there are not many people in the world who have led six-month-long multi-person projects to exploit a single vulnerability. Most exploits you know about are the result of a talented person putting in a month or less of work. Client-side exploits are a key sweet spot here because you typically have so much control over the environment as an attacker and they are easy to deploy as a commodity.

This, frankly, is where most people are in the public arena. Frustration kills a lot of exploitation efforts, or reduces them to academic exercises.

But if you rely only on publicly reported information to strategically examine what exploitation is, then you may as well believe that everything worth knowing about math can be learned in your high school calculus class.

Click this picture to see HDM and JDUCK complain about how hard real exploitation is.

Exploits are hard. They were always hard and the reason you often see vulnerability "collision" in the public arena is that people are focusing on the extremely low hanging fruit as a group. But there IS a high end in the strategic area and not seeing that high end is a massive strategic hole in your thinking!

That was the point of Nico's talk. INFILTRATE keynotes are about strategic vision, and Nico shared his view with all of us, much as Nate Fick shared his this year. A future blog post will analyze that as well. :)

The Future Strategic Surprise

And I'm going to follow that up with where you are going to be strategically surprised in the future, and a quick link to this year's INFILTRATE talks. The answer is simple: Man in the Middle.

None of the protocols in wide use were really hardened against MITM. It may not even be possible in many cases. And yet, QUANTUM is an example of what strategically deployed MITM can do. And inside your network, with future generations of INNUENDO, you're going to see the destruction of entire ecosystems of protocols. For example, people underestimated the impact of using deserialization functions in Java middleware and there is far too much of that to rip out or fix now.

In short, everyone laughed at Badlock. But if you have any real strategic vision of the future as a defender, you found no mirth in it at all. In other words, MITM is a hugely unexplored surface which is practically reachable even without nation-state positioning, and we should all care about it as much as we care about heap overflows, which is a lot. :)

Offensive strategy is long-term, goal-oriented, and forward thinking. And while that sounds like a management bullshit bingo round, forward thinking actually implies practical placement of offensive capability today, for tomorrow's landscape. If you hacked 15 years ago, and you did your job, the only vulns you use today are your own. That is strategy.