Saturday, January 31, 2009

Google SafeBrowsing Goes Haywire

[Update: Google has fessed up in a blog post. Apparently the problem was caused when a URL of '/' was added to the StopBadware block list. The incident also caused an effective denial of service against the StopBadware site, due to the volume of users clicking on links in the warning messages.]

This morning, some time after 9am EST, Google's StopBadware feature went haywire. StopBadware is a Google initiative to maintain blacklists of malicious and phishing URLs. It can be leveraged by third parties via the Google SafeBrowsing API. As can be seen in the associated screenshot, the functionality was broken: a search for 'test' flagged every search result as potentially malicious, with a link stating 'This site may harm your computer'. Clicking on any search result would take the user to a warning page rather than the actual destination.

The issue has now been resolved, but it illustrated two things for me: the power of Google and the potential of user-driven reporting sites like Twitter. We trust Google. It's been our friend for a long time, so when it starts telling us that every site is malicious, we get worried. A Twitter search for posts related to the problem shows plenty of confusion and speculation as to what the problem was.
More interestingly, it provided timely information on the start and stop time of the bug. Posts that I reviewed show that the issue began just after 9am EST and was resolved approximately an hour later. Others suggest that the problem was not fixed but rather that code was rolled back to restore a stable state. Was it a regression bug introduced during an upgrade? Seems likely.

Thankfully the crisis has been averted so the world can keep turning on its axis. Nothing to see here, move along.

- michael

Friday, January 30, 2009

UTF-8 Characters to Watch Out For

Let's say your target application makes a decision using some string you control, and you'd like it to do something more interesting than the developer meant it to do. You know the application processes UTF-8, and you know UTF-8 decoders are notorious for implementation-specific behavior. Now what?

The first thing you need is the Unicode code point that corresponds to the character you want to use. It consists of 4-6 hex digits and usually looks like U+HHHH, or maybe U+HHHHHH. Your mission is to create a UTF-8-like string of bytes that, when decoded, will result in this code point.

In theory, there is only one such string of bytes, which you would create using the algorithm in RFC 3629 (or more likely, by using a Unicode library). To borrow from the RFC, this string of bytes looks like this:

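Character number range | Significant | UTF-8 octet sequence
(hexadecimal)          | bits        | (binary)
-----------------------+-------------+-------------------------------------
0000 0000-0000 007F    | up to 7     | 0xxxxxxx
0000 0080-0000 07FF    | up to 11    | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF    | up to 16    | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF    | up to 21    | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx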

Basically, the number of leading 1s in the first byte indicates how many total bytes are used to represent this character (a first byte that starts with 0 stands alone), and the bits of the code point are filled into the positions marked x, with the least significant bits going into the rightmost positions.
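To make the table concrete, here is a minimal Python sketch of the encoding step. The function name utf8_encode is my own, and in real code you would simply call str.encode('utf-8'); the point is only to show where the bits go.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point by following the RFC 3629 table (illustration only)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range for UTF-8")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not legal in UTF-8")
    if code_point <= 0x7F:
        return bytes([code_point])                        # 0xxxxxxx
    if code_point <= 0x7FF:
        return bytes([0xC0 | (code_point >> 6),           # 110xxxxx
                      0x80 | (code_point & 0x3F)])        # 10xxxxxx
    if code_point <= 0xFFFF:
        return bytes([0xE0 | (code_point >> 12),          # 1110xxxx
                      0x80 | ((code_point >> 6) & 0x3F),  # 10xxxxxx
                      0x80 | (code_point & 0x3F)])        # 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),              # 11110xxx
                  0x80 | ((code_point >> 12) & 0x3F),     # 10xxxxxx
                  0x80 | ((code_point >> 6) & 0x3F),      # 10xxxxxx
                  0x80 | (code_point & 0x3F)])            # 10xxxxxx


# U+20AC (the euro sign) needs 16 significant bits, so it takes three bytes.
print(utf8_encode(0x20AC).hex())  # prints e282ac
```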

In practice, several other representations might also work, depending on the exact decoding algorithm your target uses:
  1. The alternative 1-byte representation.
  2. Overlong (but otherwise legal) representations.
  3. Really overlong representations.
  4. Alternative succeeding byte representations.

Alternative 1-Byte Representation

The alternative 1-byte representation looks like this:

Character number range | Significant | UTF-8 octet sequence
(hexadecimal)          | bits        | (binary)
-----------------------+-------------+-------------------------------------
0000 0000-0000 003F    | up to 6     | 10xxxxxx
This representation is illegal; in legal UTF-8, such a byte (beginning with the bits 10) can only appear as the second or following byte in a multi-byte sequence. But maybe the decoder assumes a new character begins here, and then counts the number of leading 1s (1) to decide how many total bytes this character takes. This leaves you with only 6 bits of actual data, so it will only work on code points U+0000-U+003F. This does not include the alphabet, but does include interesting tidbits like NUL, CR, LF, <, >, and several forms of quotes.
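As a rough sketch of the idea, the following Python builds that lone 10xxxxxx byte. The helper name alternative_one_byte is made up for this post; a conforming decoder (Python's own, for example) rejects the result, which is exactly the behavior you are hoping your target lacks.

```python
def alternative_one_byte(code_point: int) -> bytes:
    """Build the illegal single-byte form 10xxxxxx for U+0000 through U+003F."""
    if not 0 <= code_point <= 0x3F:
        raise ValueError("only 6 bits of payload fit in a 10xxxxxx byte")
    return bytes([0x80 | code_point])


candidate = alternative_one_byte(ord("<"))
print(candidate.hex())  # prints bc

# A strict decoder refuses the byte because it is a stray continuation byte.
try:
    candidate.decode("utf-8")
except UnicodeDecodeError as err:
    print("strict decoder rejected it:", err.reason)
```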

Overlong Representations

To get overlong representations, ignore the character number range column in the original conversion table and use all the octet sequences that have enough least significant bits to hold the character you want:

Character number range | Significant | UTF-8 octet sequence
(hexadecimal)          | bits        | (binary)
-----------------------+-------------+-------------------------------------
0000 0000-0000 007F    | up to 7     | 0xxxxxxx
                       |             | 110xxxxx 10xxxxxx
                       |             | 1110xxxx 10xxxxxx 10xxxxxx
                       |             | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
0000 0080-0000 07FF    | up to 11    | 110xxxxx 10xxxxxx
                       |             | 1110xxxx 10xxxxxx 10xxxxxx
                       |             | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
0000 0800-0000 FFFF    | up to 16    | 1110xxxx 10xxxxxx 10xxxxxx
                       |             | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

These representations would be legal except that UTF-8 specifically disallows overlong representations. But maybe the decoder is mindlessly copying the significant bits, not examining the bits for pesky details like legality.
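Here is a small Python sketch of how you might generate these forms. The function overlong_encode is hypothetical (my name, not a library call); it packs a code point into whatever length you ask for, without the range check that a legal encoder performs.

```python
def overlong_encode(code_point: int, length: int) -> bytes:
    """Pack code_point into a 2-, 3-, or 4-byte UTF-8-style sequence.

    No check is made that `length` is the shortest possible form, so small
    code points come out as overlong (illegal) sequences.
    """
    payload_bits = {2: 11, 3: 16, 4: 21}
    lead_marker = {2: 0xC0, 3: 0xE0, 4: 0xF0}
    if length not in payload_bits:
        raise ValueError("length must be 2, 3, or 4")
    if not 0 <= code_point < (1 << payload_bits[length]):
        raise ValueError("code point does not fit in that many bytes")
    # The first byte carries the marker plus the most significant bits.
    out = [lead_marker[length] | (code_point >> (6 * (length - 1)))]
    # Every following byte carries the 10 prefix plus six more bits.
    for i in range(length - 2, -1, -1):
        out.append(0x80 | ((code_point >> (6 * i)) & 0x3F))
    return bytes(out)


for n in (2, 3, 4):
    print(n, overlong_encode(ord("/"), n).hex())
```

For '/' (U+002F) the two-byte result is c0 af, followed by e0 80 af and f0 80 80 af for the longer forms. Asking for the correct length simply yields the legal encoding.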

Really Overlong Representations

Why stop at 4 bytes? Continue the pattern with more bytes:

Character number range | Significant | UTF-8 octet sequence
(hexadecimal)          | bits        | (binary)
-----------------------+-------------+-------------------------------------
0000 0000-0010 FFFF    | up to 21    | 111110nn 10nnnxxx 10xxxxxx 10xxxxxx 10xxxxxx
                       |             | 1111110n 10nnnnnn 10nnnxxx 10xxxxxx 10xxxxxx 10xxxxxx
                       |             | 11111110 10nnnnnn 10nnnnnn 10nnnxxx 10xxxxxx 10xxxxxx 10xxxxxx

These representations are illegal; legal UTF-8 is at most 4 bytes long and contains at most 21 bits of data. But maybe the decoder will mindlessly decode one byte for each leading 1 in the first byte and perhaps even write those extra ns back over preceding bits in the result string, or whatever else happens to precede this character in memory.
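A sketch of the same packing trick stretched to 5, 6, and 7 bytes might look like this in Python. The name really_overlong_encode is invented for illustration, and this version leaves all of the n bits at zero; setting them to something else is left as an exercise.

```python
def really_overlong_encode(code_point: int, length: int) -> bytes:
    """Pack a code point into an illegal 5-, 6-, or 7-byte sequence.

    The extra high-order bits (the n positions in the table) are all zero.
    """
    lead_marker = {5: 0xF8, 6: 0xFC, 7: 0xFE}  # 111110nn, 1111110n, 11111110
    if length not in lead_marker:
        raise ValueError("length must be 5, 6, or 7")
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    # With at most 21 significant bits, nothing is left over for the first byte.
    out = [lead_marker[length] | (code_point >> (6 * (length - 1)))]
    for i in range(length - 2, -1, -1):
        out.append(0x80 | ((code_point >> (6 * i)) & 0x3F))
    return bytes(out)


print(really_overlong_encode(ord("<"), 5).hex())  # prints f8808080bc
```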

Alternative Succeeding Byte Representations

To get an alternative succeeding byte representation, just replace the first 2 bits of the second & following bytes (which are supposed to be 10):

Character number range | Significant | UTF-8 octet sequence
(hexadecimal)          | bits        | (binary)
-----------------------+-------------+-------------------------------------
0000 0080-0000 07FF    | up to 11    | 110xxxxx 00xxxxxx
                       |             | 110xxxxx 01xxxxxx
                       |             | 110xxxxx 11xxxxxx
0000 0800-0000 FFFF    | up to 16    | 1110xxxx 00xxxxxx 00xxxxxx
                       |             | 1110xxxx 00xxxxxx 01xxxxxx
                       |             | 1110xxxx 00xxxxxx 11xxxxxx
                       |             | 1110xxxx 01xxxxxx 00xxxxxx
                       |             | 1110xxxx 01xxxxxx 01xxxxxx
                       |             | 1110xxxx 01xxxxxx 11xxxxxx
                       |             | 1110xxxx 11xxxxxx 00xxxxxx
                       |             | 1110xxxx 11xxxxxx 01xxxxxx
                       |             | 1110xxxx 11xxxxxx 11xxxxxx
0001 0000-0010 FFFF    | up to 21    | 11110xxx 00xxxxxx 00xxxxxx 00xxxxxx
                       |             | 11110xxx 00xxxxxx 01xxxxxx 00xxxxxx
                       |             | 11110xxx 00xxxxxx 11xxxxxx 00xxxxxx
                       |             | 11110xxx 01xxxxxx 00xxxxxx 00xxxxxx
                       |             | 11110xxx 01xxxxxx 01xxxxxx 00xxxxxx
                       |             | 11110xxx 01xxxxxx 11xxxxxx 00xxxxxx
                       |             | 11110xxx 11xxxxxx 00xxxxxx 00xxxxxx
                       |             | 11110xxx 11xxxxxx 01xxxxxx 00xxxxxx
                       |             | 11110xxx 11xxxxxx 11xxxxxx 00xxxxxx
                       |             | 11110xxx 00xxxxxx 00xxxxxx 01xxxxxx
                       |             | 11110xxx 00xxxxxx 01xxxxxx 01xxxxxx
                       |             | 11110xxx 00xxxxxx 11xxxxxx 01xxxxxx
                       |             | 11110xxx 01xxxxxx 00xxxxxx 01xxxxxx
                       |             | 11110xxx 01xxxxxx 01xxxxxx 01xxxxxx
                       |             | 11110xxx 01xxxxxx 11xxxxxx 01xxxxxx
                       |             | 11110xxx 11xxxxxx 00xxxxxx 01xxxxxx
                       |             | 11110xxx 11xxxxxx 01xxxxxx 01xxxxxx
                       |             | 11110xxx 11xxxxxx 11xxxxxx 01xxxxxx
                       |             | 11110xxx 00xxxxxx 00xxxxxx 11xxxxxx
                       |             | 11110xxx 00xxxxxx 01xxxxxx 11xxxxxx
                       |             | 11110xxx 00xxxxxx 11xxxxxx 11xxxxxx
                       |             | 11110xxx 01xxxxxx 00xxxxxx 11xxxxxx
                       |             | 11110xxx 01xxxxxx 01xxxxxx 11xxxxxx
                       |             | 11110xxx 01xxxxxx 11xxxxxx 11xxxxxx
                       |             | 11110xxx 11xxxxxx 00xxxxxx 11xxxxxx
                       |             | 11110xxx 11xxxxxx 01xxxxxx 11xxxxxx
                       |             | 11110xxx 11xxxxxx 11xxxxxx 11xxxxxx

As you can see, there are a lot of combinations (and of course you'll want to try this technique on the overlong representations as well). These representations are illegal; every succeeding byte in a multi-byte representation must start with the bits 10. But maybe the decoder knows where a character starts and just unpacks the significant bits from each succeeding byte without checking the first 2 bits.
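Rather than typing these out by hand, you could generate them. The Python sketch below (alternative_continuations is a made-up helper name) takes any multi-byte sequence and swaps the 10 prefix of each following byte for 00, 01, or 11 in every combination, matching the table above.

```python
from itertools import product


def alternative_continuations(encoded: bytes) -> list:
    """Return every variant whose following bytes start with 00, 01, or 11.

    `encoded` is a multi-byte UTF-8-style sequence (legal or overlong); the
    first byte is kept and only the top two bits of later bytes change.
    """
    if len(encoded) < 2:
        raise ValueError("need a multi-byte sequence")
    lead, tail = encoded[0], encoded[1:]
    variants = []
    for prefixes in product((0x00, 0x40, 0xC0), repeat=len(tail)):
        body = [prefix | (byte & 0x3F) for prefix, byte in zip(prefixes, tail)]
        variants.append(bytes([lead] + body))
    return variants


# U+00E9 is c3 a9 in legal UTF-8; these are its three illegal cousins.
for variant in alternative_continuations("\u00e9".encode("utf-8")):
    print(variant.hex())
```

A two-byte character yields 3 variants, a three-byte character 9, and a four-byte character 27.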

Have fun identifying applications that don't follow the character encoding rules of thumb, and don't get into trouble!


(If you're looking for more do-it-yourself entertainment along these lines, you might want to check out ligatures, surrogate pairs and astral planes, try representing code points that are out-of-range for UTF-8 using really overlong representations, or figure out how to design UTF-8 strings that decode differently depending on whether the decoder is moving backwards or forwards in the string.)

Thursday, January 29, 2009

What did you do on Data Privacy Day?

January 28th is officially international Data Privacy Day. Apparently this was established in 2008 to raise awareness of data privacy issues, provide education (particularly to teenagers) regarding data privacy concerns, etc. Did you even know about it? Don't worry, neither did I. Maybe we need an awareness campaign to educate people that there is an awareness campaign to educate people (heh). Intel has various online links and material related to the happenings of Data Privacy Day.

Anyways, while educating users not to hand out their personal details is a good thing, the bulk of the concerning data privacy breaches have been caused by large corporations mishandling user data. Whether it's AOL, Choicepoint, or Heartland, educating a user to keep their data private is irrelevant if a 'trusted' third-party data keeper is just going to expose it on their behalf. Thus I'm not sure why we are trumpeting privacy fixes at the end-user level with new 'privacy enhancing technologies' (PETs) when bigger data privacy and exposure problems exist upstream. P3P headers, anonymizers, and cookie removers are not going to prevent things like the Veterans Affairs leak from happening. Even if I'm tight-lipped about my personal details, my service provider might not be. Or the person my service provider outsources to might not be. Or the person that person outsources to might not be. You get the point.

I'm sure there are some in the crowd that are thinking "PCI will help with data privacy exposure issues in third-parties." Well, kind of. Just look at Heartland--they were PCI-compliant. Which brings us to an interesting point: compliance != security. PCI can potentially ferret out gross negligence, but catching all the low-hanging fruit doesn't prevent someone from going a little higher up the tree. Especially if they are hungry.

Until next time,
- Jeff

Wednesday, January 28, 2009

Stepping Through JavaScript Obfuscation

I've been noticing an increasing number of attacks whereby malicious IFRAMEs are injected into the pages of otherwise reputable sites. While such activity is often very obvious as you can spot an IFRAME right before the closing tag, attackers with a little more skill generally attempt to obfuscate the injected code. Depending upon the level of creativity, JavaScript obfuscation may barely be noticeable as code at all and researchers often shy away from de-obfuscating it. The reality is that JavaScript runs within your browser, so you can always figure out what it's doing; it just takes a little extra effort. Now you can generally streamline the process by employing tools such as a JavaScript debugger or by inspecting web traffic, but I'd like to take the old-fashioned approach of understanding the code to illustrate that obfuscation is generally not as challenging as it may first appear.

When reviewing our weblogs recently I came across a perfect example of the aforementioned scenario. The Virginia Jyothi Institute of Management website appears to be a perfectly reputable site...except for an interesting snippet of JavaScript at the bottom of the page.

Obfuscated JavaScript:

function C7D36720260A79BEECF3B8D6D(C78D9ED077610F5E11)
{function E69961B4A47426004A21A064DA3(){return 16;}
function DB47FCE800845F2179C(D89D6EB726D3262DEA5){function
CF7A2398A7A3B02EEF51A624DC28F2(){return 2;}var B0A173316D010072="";

Ouch! That's ugly. At first glance it looks like a jumble of random numbers with a sprinkling of English. However, upon further inspection you can clearly see JavaScript functions (e.g. String.fromCharCode()) and basic syntax, such as a for loop. In reality, it's not really ugly at all, but like a freshman dorm room, it does need a little cleanup. Let's do three things:

1.) Replace function/variable names - You'll notice that the seemingly random numbers are strings that show up more than once. That's because they're not random at all, they're just function and variable names that aren't particularly easy on the eyes. Start by replacing these painful names with simple letters.

2.) Add white space - obfuscation generally involves removing white space and cramming everything together, so add it back. Follow the JavaScript syntax but expand it to a common format that you're comfortable with.

3.) Rearrange - So long as you don't change the logic flow, move things around, once again, to an arrangement that is more comfortable.

Once we make these changes, we have much more manageable code. As they say, let's put some lipstick on that pig...

The same function cleaned up:


function A(B)
  function C()
    return 16;

function D(F)
  function G()
    return 2;
  var H="";

Wow, what a huge improvement. Now that's something that I can deal with. It still may seem a bit verbose for what it's trying to accomplish, but let's walk through it one step at a time.
  1. We start off by sending a large string of data to D().
  2. Within D(), let's look at the for loop. It's iterating over the data two characters at a time (G() returns 2).
  3. Each pair of characters is sent to A(), where it is converted from base 16, which tells us that the pairs are hexadecimal values.
  4. Once converted, the integers are then fed to String.fromCharCode(), which is essentially an automated ASCII table, converting decimal values to letters.
  5. The entire string is iterated over until each hexadecimal pair is converted to a letter.
When done, the converted string is revealed:

<iframe src="" style="visibility: hidden; position: absolute;" height="1" width="1"></iframe>
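If you would rather let a script do the walking, the same steps can be sketched in a few lines of Python. This is my own re-creation of the logic described above, not the attacker's code, and the sample string is a harmless stand-in I made up (the hex for "<iframe>") rather than the real payload.

```python
def decode_hex_pairs(data: str, chunk: int = 2, base: int = 16) -> str:
    """Mimic the de-obfuscation loop: read `chunk` characters at a time,
    parse them as a base-16 number, and map that number to a character."""
    result = ""
    for i in range(0, len(data), chunk):
        result += chr(int(data[i:i + chunk], base))
    return result


sample = "3c696672616d653e"  # hypothetical stand-in payload
print(decode_hex_pairs(sample))  # prints <iframe>
```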

Now there are many easier ways to come to the same conclusion. You could walk through the JavaScript using a debugger or, in this case, once you realized that the seemingly random string was likely a hexadecimal encoding, just run it through a hex-to-ASCII converter. However, hopefully this walkthrough has shed light on the fact that JavaScript obfuscation, while one more step in the process, isn't generally much of a hurdle to overcome.

- michael

Friday, January 23, 2009

EV-SSL's reported successes, maybe

In a previous post I included a blurb mentioning that Verisign reported 7000 EV-SSL certificates were issued in the past two years, which seems fairly low. Recently I was referred to a spot on Verisign’s website where they keep their SSL case studies. Many of their more recently posted studies indicate the positive benefit/increased conversions brought about by using EV-SSL certificates. In other words, switching from standard SSL to EV-SSL certs led to more sales, all because of customer confidence in seeing the 'green bar'.

For some reason, a mental warning alarm was going off in the back of my head. So I decided to dig deeper into the case studies to see how they actually tested things. After all, a flawed test methodology results in questionable results. I'm a bit of a stickler when it comes to having a solid, repeatable methodology for testing stuff.

In an ideal world, these case studies would have been conducted by subjecting half of their web clients to a standard SSL certificate, and the other half to an EV-SSL certificate. Easier said than done, but I suppose it's possible if you had a web server farm where half of the servers had the EV-SSL cert loaded and the other half didn't, and you used a network load balancer in front of everything to persistently keep the same client/source IP address connection mapped to the same server in the farm. You couldn't use an application level load balancer because that would require terminating the SSL to see the application-layer data, and you wouldn't know to terminate with EV-SSL vs. standard SSL. A completely wrong methodology would be to see how many sales you got using the standard SSL certificate, then see how many you got after you upgraded to EV-SSL, and attribute the change to just the EV-SSL certificate. There are too many other factors that could affect a change in sales, such as a slow-down in the economy, a wave of released stimulus checks, a newly-launched advertising campaign, etc.

So how did they test? In the few case studies I read, they simply separated traffic into EV-SSL capable browsers (Internet Explorer 7, Firefox 3) vs. non-EV-SSL capable browsers (Internet Explorer 6, Firefox 2). The idea is that the non-EV-SSL browsers were not showing the green bar, while the EV-SSL browsers were. Then they measured the conversion rate of these two groups, and of course, the EV-SSL browser group had higher conversions. So it must have been the green bar. Or was it?

The core problem with this approach is that the population representation is skewed. The methodology assumes that users of IE7 are no different than users of IE6, which is not the case. The web development community is already largely aware that a significant portion of the remaining IE6 hold-outs (i.e. those who have not upgraded to IE7) are actually corporate users that are running on systems that are not allowed to be upgraded. Think about it: a notable number of home users will leverage Windows' standard automatic update feature, and that would have updated them to IE7 long ago. Corporate environments, on the other hand, control the updates and look to have consistent version deployments across desktops. Further, they have a significant investment in web applications tailored to work with IE6 (which has been out for 7 years, meaning 7 years of web apps that were made just for it); if the choice is to update all internal web apps to work with IE7, or just keep the corporate desktops at IE6, well, the fiscal choice is obvious.

Thus let's frame this scenario a bit differently: given visitors to a non-business-centric retail shopping web site, who is more likely to buy something...people shopping from their home computer, or people shopping from their work computer? Would you normally be tempted to buy a new pair of shoes online using your corporate workstation on your lunch break? Would you perhaps search a little to figure out what you wanted (which seems like an innocent-enough use of the corporate network), and then finish the transaction (i.e. officially make the purchase) later that evening when you got home?

Therefore my biggest complaint about the EV-SSL testing methodology that these case studies use is that the non-EV-SSL browser group (particularly IE6 users) will statistically have more coming-from-their-corporate-workstation users in it, and I'm not convinced that such users are identical to other users with regard to their immediate willingness to make personal retail or pharmaceutical purchases. Personally, I would expect those corporate users to be less likely to lead to an immediate retail conversion. And that's what the EV-SSL case studies all seem to say/support...but they attribute that phenomenon to the presence or absence of the EV-SSL certificate.

So does that mean EV-SSL is responsible for the higher conversion rate? Not necessarily. Does it mean it's not responsible? Well, not necessarily. To know for sure, all other variables must be approximately equal...and in these situations, there are too many differing factors between the two groups to know in particular which factor is having the most impact. To help put things in perspective, it would be nice to see the same metrics (i.e. conversion rates for IE7 vs IE6 users) reported for sizable retail sites that are not using EV-SSL certificates. I have a hunch that, even without EV-SSL, those sites will still see slightly higher conversion rates for IE7 users compared to IE6 users.
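To see how much the population mix alone could matter, here is a back-of-the-envelope Python sketch. Every number in it is invented for illustration (I have no real figures): it assumes home visitors convert at 3% and corporate visitors at 1% regardless of browser or certificate, and that the IE7 group is 90% home users while the IE6 group is only 40% home users.

```python
def blended_rate(home_share: float, home_rate: float, corp_rate: float) -> float:
    """Overall conversion rate for a group that mixes home and corporate visitors."""
    return home_share * home_rate + (1.0 - home_share) * corp_rate


# All of these figures are hypothetical.
HOME_RATE = 0.03   # assumed conversion rate for visitors on home computers
CORP_RATE = 0.01   # assumed conversion rate for visitors on work computers

ie7_rate = blended_rate(0.90, HOME_RATE, CORP_RATE)  # mostly home users
ie6_rate = blended_rate(0.40, HOME_RATE, CORP_RATE)  # mostly corporate users

print(f"IE7 group: {ie7_rate:.1%}, IE6 group: {ie6_rate:.1%}")
```

With these made-up inputs the IE7 group converts at 2.8% and the IE6 group at 1.8%, a sizable gap that appears without the green bar having any effect at all.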

Until next time,
- Jeff

Thursday, January 22, 2009

Character Encoding Rules of %54%68%75%6D%62

Most software engineers are aware that there are some Security Issues that have something to do with character encoding (can you see the ceremonial waving of hands?). In a nutshell, the issues stem from the fact that people want software to make decisions based on strings of characters (e.g. when the password matches, let the user log on; allow anonymous FTP when the path to the requested file is in one of these directories; disallow these HTML tags in blog posts), but computers think in bits. When there is a single, well-defined bi-directional mapping (aka encoding in the character set sense) between characters and their bit representations, there's no problem. But when there are multiple character sets to choose from (which is definitely true here in the Internet age), the same string of bits could mean several different strings of characters and vice versa. The same string might even be a different size (in bytes, characters, or both) in different encodings. Therefore, bits that don't match could represent characters that do, or vice versa, and unless the programmer took precautions, the software is going to make the wrong decision some of the time.

Let's be the good guy first. What precautions should we take?

1. As Joel Spolsky succinctly puts it, "It does not make sense to have a string without knowing what encoding it uses." Know the encoding each string uses, including endianness if applicable.
2. Ensure each string is legal in its encoding. When you convert from one encoding to another, do this twice, once before conversion (to avoid confusing the conversion algorithm) and once after (to avoid confusing the string consumer).
3. Convert strings to the same canonical encoding before comparison. Canonical may be more restrictive than legal.
4. Take encoding into account when you calculate string length or string or character sizes, use regular expressions, access the nth character in a string or otherwise manipulate strings and characters.
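As a rough illustration of precautions 1 through 4, here is a small Python sketch. The function names are mine, and treating Unicode NFC normalization as the "canonical" form is an assumption; your application may need something stricter (NFKC, case folding, or a whitelist).

```python
import unicodedata


def decode_strict(raw: bytes, encoding: str) -> str:
    """Precautions 1 and 2: the caller must name the encoding, and illegal
    byte sequences raise UnicodeDecodeError instead of being guessed at."""
    return raw.decode(encoding, errors="strict")


def canonical(text: str) -> str:
    """Precaution 3: one canonical form before comparing (NFC is an assumption)."""
    return unicodedata.normalize("NFC", text)


def same_string(a: bytes, a_encoding: str, b: bytes, b_encoding: str) -> bool:
    """Compare two byte strings by the characters they represent."""
    return canonical(decode_strict(a, a_encoding)) == canonical(decode_strict(b, b_encoding))


# Different bits, same characters: U+00E9 in UTF-8 and in Latin-1.
print(same_string(b"\xc3\xa9", "utf-8", b"\xe9", "latin-1"))

# Precaution 4: length depends on what you are counting.
text = decode_strict(b"\xc3\xa9", "utf-8")
print(len(text), len(text.encode("utf-8")))
```

The first print shows True, and the second shows 1 character occupying 2 bytes.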

Sound simple? Great, because I left out this one itsy bitsy teeny weeny little detail: character escaping. Character escaping (like the percent-encoding used in URIs) is another kind of encoding which is usually used to separate the control and data channel in a mixed communication medium. Unlike the character set encodings (like UTF-8) discussed above, for which one character set per string is the limit, character escaping can nest and also cover only parts of the string. For example, we could have an HTML document encoded in UTF-8, and it could contain a URI (with the appropriate portions percent encoded) which has been base64 encoded and stashed in a hidden form field. All the same precautions apply, but to be safe (or for that matter correct), we need to rephrase precaution #1 and add a couple more.

1. Know all the encodings each string uses (or should use), including endianness if applicable, and the order in which they were (or should be) applied.
5. Change character set encoding whenever you like, but decode character escapes in the reverse order encodings were applied.
6. Extract the string to be decoded from the surrounding context before unescaping characters (lest you mix the control and data channel).

In our example, to get the original URI components back, we would need to grab the value of the form field, decode it using base64, then extract each URI component and separately percent-unescape it.
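Here is one way that sequence might look in Python, using only the standard library. The URI and the helper name recover_components are made up for the example; the thing to notice is that the URI is split into components before anything is percent-unescaped, so an escaped '/' or '&' never gets mistaken for a delimiter.

```python
import base64
from urllib.parse import urlsplit, unquote

# Hypothetical hidden form field: a percent-encoded URI wrapped in base64.
original_uri = "http://example.com/a%2Fb/c?q=x%26y%3Dz"
field_value = base64.b64encode(original_uri.encode("ascii")).decode("ascii")


def recover_components(value: str):
    """Undo the encodings in the reverse of the order they were applied."""
    uri = base64.b64decode(value).decode("ascii")  # outermost layer first
    parts = urlsplit(uri)                          # split while still escaped
    segments = [unquote(s) for s in parts.path.split("/")[1:]]
    query = []
    for pair in parts.query.split("&") if parts.query else []:
        key, _, val = pair.partition("=")
        query.append((unquote(key), unquote(val)))
    return parts.netloc, segments, query


print(recover_components(field_value))
```

The path comes back as two segments, 'a/b' and 'c', and the query as a single pair with the value 'x&y=z'. Unescaping the whole URI first would have produced three path segments and a mangled query.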

Is that all? No such luck! Unless you make a habit of talking to yourself, the strings you're manipulating are either coming from somewhere else or going somewhere else. Perhaps they are destined for a human, via some display mechanism. Our precautions to this point cover computers, which distinguish characters by comparing bits, but humans distinguish characters based on appearance. Some character sets (including all flavors of Unicode) contain more than one character with the same or very similar glyph. So, we need another precaution:

7. Display different characters using dissimilar glyphs.
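To get a feel for the problem, the Python sketch below compares two strings that look alike on most displays. The sample strings are invented, and using the first word of each character's Unicode name as its "script" is a crude heuristic of mine, not a standard API; a display layer could use something like it to flag mixed-script text.

```python
import unicodedata

plain = "example"
spoofed = "ex\u0430mple"  # U+0430 is CYRILLIC SMALL LETTER A


def scripts_used(text: str) -> set:
    """Crude heuristic: treat the first word of each character's Unicode
    name (LATIN, CYRILLIC, ...) as its script."""
    return {unicodedata.name(ch).split()[0] for ch in text}


print(plain == spoofed)               # the bits differ, so this prints False
print(sorted(scripts_used(plain)))    # ['LATIN']
print(sorted(scripts_used(spoofed)))  # ['CYRILLIC', 'LATIN']
```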

On the other hand, perhaps the other party to your communication is another computer. In this situation, rule of thumb #1 effectively means you know what character set & character escaping the other party is using. Or at least what character set they say they are using. Or what character set they are supposed to be using. Or what character set it seems like they are using, based on the bits in the string and the information you expect it to contain.

This brings us neatly to our attacker (and to the end of the precautions a good guy ought to take), so next week, let's switch hats and be evil for a change.


Heartland Payment Systems Joins A Crowded Club

This week Heartland Payment Systems confessed to what may ultimately be the largest data breach ever, with some suggesting that as many as 100 million credit cards have been exposed. This makes the 2007 TJ Maxx breach of 45 million debit and credit cards look fairly reasonable. To make matters worse for Heartland, their PR department apparently felt that they could sweep the item under the rug with some shameless tactics such as announcing the breach on inauguration day and registering the domain name to discuss the matter. The latter appears to be an effort to distance the information from the Heartland corporate name and make it appear to be a dated issue.

What frightens me about such issues is not the earth-shattering numbers that make the headlines for a single breach but the fact that for every Heartland, there are hundreds if not thousands of data breaches that go relatively unnoticed. In a past role, I was involved in a customer round table to discuss security, which attracted CISOs from a variety of large companies. During a discussion on data loss, talk turned to unintentional losses such as 'losing a tape off the back of the truck'. A CISO from a major financial institution shook his head and said "if only you knew how common that is". I have no doubt that he was being frighteningly honest with that statement, and in a world where access to a few simple details such as my SSN could ruin me financially, that is truly a sobering thought.

The Open Security Foundation does an excellent job of trying to ensure that data breaches don't go unnoticed, by maintaining the Data Loss DB. It is a well maintained, detailed, open source collection of data and statistics regarding data loss dating back to 2000. When Heartland came forward, I decided to look back at the past year to see just how severe a problem this has become.
  • The Data Loss DB contains records for 479 incidents of data loss during 2008
  • Records exist to suggest that at least 83,350,024 accounts were affected in 2008, however, estimates of the numbers of accounts affected were not available for 26% of incidents, so the actual number is higher
  • Stolen laptops accounted for the largest share of the data loss at 20% of all incidents, while hacks of some description accounted for 17% of incidents and web specific hacks were responsible for 14% of the results
  • Social Security Numbers were lost in one third of all attacks last year
Besides being thoroughly depressing, these statistics should teach us a few things. First and foremost, while the Heartlands of the world will capture the headlines, they represent only the tip of the iceberg in terms of the losses that occur every day. Moreover, data loss occurs in a multitude of ways, both through direct attacks and carelessness on the part of employees. The important thing is that companies have both preventive and detective controls in place to ensure that such incidents are stopped in the first place but also identified when they do occur. If you wait to see your company in the headlines and you're in charge of enterprise security, two things are certain - like Kato Kaelin, you'll be famous and unemployed.

- michael

[Charts courtesy of the Open Security Foundation]

Monday, January 19, 2009

SANS/CWE Top 25 Programming Errors

Recently Mitre and SANS teamed up to produce a list of the top 25 security errors that programmers should focus on. The errors were selected from the Common Weakness Enumeration (CWE) collection, which looks to enumerate all the different ways a piece of software can create a security vulnerability. For those familiar with CVE (Common Vulnerabilities and Exposures), the difference between CVE and CWE is the former tracks point instances of vulnerabilities in specific applications, while the latter tracks the different ways a vulnerability can manifest in any arbitrary application.

The list does cover the gamut of heavy-hitting vulnerability types that traditionally plague applications (SQL injection, XSS, command injection, buffer overflows, file name manipulation, sending data in the clear or with weak crypto, etc.), but nothing on it is particularly earth-shattering. I always chuckle to myself when I see entries like CWE-20 (Improper Input Validation) and CWE-116 (Improper Encoding or Escaping of Output), as they are a bit of a catch-all and, from some viewpoints, subsume all of the more specific input/output validation errors such as SQL injection, XSS, etc. A programming shop that validated all of its incoming data would likely be able to cross multiple items off the list...but it's actually a fairly tall task. I've dealt with many application audits and security code reviews in my past, and I've never encountered a programming shop that felt it was realistic to go back and audit/adjust all inputs for validation on a moderate-sized app (or larger). These large applications, many of them web-based, sourced data from too many places to feasibly and accurately account for them all. Which goes along with something I and many other security professionals have always said: it is easier to design security in than to retrofit it after the fact. Having and mandating the use of a core set of available global validation functions will (hopefully) keep programmers tagging inputs with a base level of validation as the application grows and sprawls. Once a large application is written, trying to find and evaluate all inputs is like trying to find needles in a haystack. In my experience, most organizations change tactics at that point and look to just ensure they are not vulnerable to specific errors. In other words, rather than ensure all inputs are validated, they will instead just review the areas around their SQL calls to ensure there are no SQL injection issues. This essentially employs validation at 'time of use' rather than 'time of reception.'

On the surface this seems like a good strategy--after all, there are likely few times of use for any given input that could result in a security vulnerability. But the long-term problem with this approach is that it creates a patchwork effect where validation is not done consistently. If input validation is done at point A (immediately upon receiving), then all subsequent uses of that data (i.e. points B, C, and D) are in the clear. But if you instead employ validation at point D (i.e. time of use), then you're OK for now...until a programmer decides to branch C to call point E. At that point, E will inherit dirty data, but the programmer might not suspect it because they were mentally complacent with the idea that the data was previously clean at point D. Any direct or derived uses of the data at points A, B, C will still be susceptible to vulnerability.
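For what it's worth, here is a minimal Python sketch of the two approaches side by side, using the standard sqlite3 module. The username rule, table layout, and function names are all hypothetical; the idea is simply that a shared validation function runs at the time of reception (point A), while the SQL call (point D) stays parameterized so that it is safe on its own too.

```python
import re
import sqlite3

# Hypothetical rule: usernames are 1-32 ASCII letters, digits, or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{1,32}")


def validate_username(raw: str) -> str:
    """Point A: reject bad input at the time of reception."""
    if not USERNAME_RE.fullmatch(raw):
        raise ValueError("invalid username")
    return raw


def find_user_ids(conn: sqlite3.Connection, username: str) -> list:
    """Point D: a parameterized query keeps the time of use safe as well."""
    rows = conn.execute("SELECT id FROM users WHERE name = ?", (username,))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

print(find_user_ids(conn, validate_username("alice")))
```

Because validate_username() runs once at the front door, any new point E that later consumes the value inherits clean data instead of having to remember to scrub it.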

But input validation aside, let's look at the list as a whole. It's meant for application developers, and addresses programming issues. These are definitely worth fixing, but at the same time, they can only go so far; in particular, this list does not address operational security issues. Here are some examples of security incidents that the list would not address:

Overall the CWE/SANS list effort is good, because everyone could definitely benefit from a higher level of security in application software. But a look at the recent high-profile security incidents shows that humans are still a very real and very weak link in the security chain. So it's important to keep the list in perspective; even if you tackle all 25 items on the list, you may still be exposed by weak operational practices, user oversights, or what would have been item 26 on the list. It is worth noting that SANS also has a broader Top 20 Security Risks list, which does encompass a lot of operational security issues.
Update: also check out the How to suck at Information Security list, also posted at SANS.

- Jeff

Sunday, January 18, 2009

Why is Conficker/Downadup Succeeding?

It has been a while since we've seen a fast-spreading worm affect a significant volume of victims. This past week, however, a new variant of Conficker (aka Downadup) reportedly infected millions of Windows machines. So-called 'big bang' worms have largely faded from the headlines, not so much for technical reasons but rather for both procedural and motivational reasons. On the procedural front, we've shortened patch cycles and locked down external access to/from networks. From a motivational standpoint, attackers have decided that 'big bang' worms don't meet their needs. Attackers have real financial motivations and drawing media attention to their efforts is not generally conducive to increasing revenue. Attacks have tended to be increasingly stealthy in nature. Why then has Conficker suddenly been so successful? Not surprisingly, the answer relates to weaknesses in enterprise defenses and ingenuity on the part of the attackers.

Enterprise Defenses

Patch Management - It would appear that patch cycles aren't so foolproof after all, or at least there are still adequate numbers of end users that are not patching machines in a timely fashion. Conficker uses a vulnerability (MS08-067) that was patched nearly three months ago as its primary attack vector. Now, a significant portion of unpatched machines may represent home as opposed to enterprise users, but if you're looking for willing zombies with broadband connections, there's no need to be picky. Some are using the success of Conficker to call for mandatory patching.

Network Shares - Should vulnerability exploitation not succeed, Conficker then looks for network shares with weak passwords. While enterprises have significantly locked down the network perimeter over the years, the LAN itself is typically wide open. End users are freely permitted to open network shares and password strength is not enforced. Companies need to realize that it takes only one infected machine to infect an entire network. When the majority of computers are laptops that leave the corporate fortress regularly, having a single infected machine is almost a given.

Attack Techniques

Multi-faceted - Conficker is a hard working worm. It attempts to exploit machines vulnerable to MS08-067, spread via network shares and even connected removable storage devices. While leveraging multiple attack vectors is not a new technique for malware writers, Conficker's authors were wise to choose paths covering a lot of ground. Rather than just hammering away at a list of known vulnerabilities which are likely all exposed or all patched, Conficker instead tries exploitation, brute force and piggybacking.

Dynamic Domain Names - Once infected, Conficker attempts to contact other attacker controlled machines in order to retrieve additional code. Rather than simply using a round robin of hard coded domain names or IP addresses, Conficker instead has the ability to contact thousands of potential domain names. Those controlling Conficker can use only specific names and ignore others. Taking down all of the domain names would be time consuming and in most cases a waste of time as they may never be used.
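
To illustrate why this is painful for defenders, here is a toy Python sketch of date-seeded name generation. This is not Conficker's algorithm; the hashing scheme, the label length and the '.example' suffix are assumptions made purely for illustration. The point is only that both sides can compute the same long list independently, while the attacker needs to register just one entry.

```python
import hashlib
from datetime import date


def candidate_domains(day: date, count: int, tld: str = ".example") -> list[str]:
    """Derive `count` pseudo-random names from a date (toy scheme, not Conficker's)."""
    names = []
    for i in range(count):
        digest = hashlib.sha256(f"{day.isoformat()}:{i}".encode()).hexdigest()
        # Map the first 10 hex digits onto lowercase letters.
        label = "".join(chr(ord("a") + int(c, 16) % 26) for c in digest[:10])
        names.append(label + tld)
    return names
```

A defender who has reverse engineered such a scheme can precompute each day's list for blocking or monitoring, but taking every candidate out of circulation ahead of time is a different order of effort from registering a single one.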

Is Conficker an anomaly or a sign of things to come? While I don't expect to see a resurgence of 'big bang' worms, I do expect malcode authors to learn from the successes of Conficker. In the meantime, grab the latest copy of the Microsoft Malicious Software Removal Tool to ensure that you're not infected.

- michael

Friday, January 9, 2009

IRS works to prevent identity fraud this tax season

The IRS has released six new security and privacy standards aimed at ensuring authorized IRS online e-File providers handle tax filers' data with care. This is definitely a good move, as I'd like to believe that e-File provider web sites make juicy targets for attackers; an attacker would have a smorgasbord of social security numbers, bank account numbers, financial profiles, and all other supporting information necessary to commit identity fraud.

The abbreviated summation of the new standards:

  1. Use extended validation SSL certificates (EV-SSL), with a minimum of 1024-bit RSA and 128-bit AES
  2. Weekly external vulnerability scans, following PCI DSS standards and using a US-based PCI ASV (Approved Scanning Vendor)
  3. Accessible privacy and safeguard policies available on the web site
  4. All submissions must be subject to a "challenge-response test" (i.e. CAPTCHA)
  5. The system(s) must use a registered domain name with a US-based ICANN accredited registrar, and ensure the domain name is locked against transfers and the registration information is publicly available
  6. Any discovered security incidents must be reported to the IRS within 24 hours after incident confirmation
Overall, these are nothing ground-breaking per se. But it is nice to see the IRS mandate a fair level of security requirements on some of the most sensitive data electronically transferred by consumers. Requiring weekly vulnerability scans bodes well for the US-based PCI ASVs; it reminds me of this Dilbert comic.

As an aside, I also ran across this related press release from Verisign. Basically they talk about how the IRS has mandated the use of EV-SSL certificates. More interesting is that it states "More than 7,000 Web sites already rely on VeriSign EV SSL Certificates." It's unclear whether that's 7,000 *tax-related* web sites (doubtful), or 7,000 web sites total (more likely). I'm assuming the latter, and as such, I'm a bit surprised at how few EV-SSL certificates have been issued over the last two years (EV-SSL was announced in late 2006). It seems a lot of sites are just not buying into EV-SSL...yet?

Until next time,
- Jeff

Modern Graffiti

I've said before that attackers provide great insight into the evolving uses of technology. For the past decade or so the digital equivalent of graffiti has been website defacements. The past week has seen thousands of primarily Israeli sites defaced as a result of the current Israeli/Palestinian conflict. Defacements are the low-hanging fruit of web security as they require minimal skill to succeed. When your target isn't specific (i.e. any Israeli website), given the sorry state of web security, it is trivial to find a vulnerable victim using freely available scanning tools. For this reason, defacements tend to be the domain of script kiddies and are used for the same reason that physical graffiti is used - to get a message across, be it political, religious or just mischievous. However, as communication mediums are evolving, so too are the chosen targets for electronic graffiti, as was demonstrated this week by two very public attacks, on Twitter and the popular MacRumors live blog.

Twitter saw 33 'celebrity' accounts hacked with content added that was either meant to be mischievous, such as the breaking Fox News report on Bill O’Reilly's sexual preference, or intended to generate revenue, as did several which included links to affiliate sites. 'GMZ', a member of the Digital Gangster forum, was reportedly responsible for the attacks and used a simple dictionary attack on the Twitter administrator account of a user named 'Crystal'. The attacks succeeded because Twitter did not lock out successive failed login attempts - something they have now implemented.
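
For what it's worth, a lockout of this kind is not much code. The Python sketch below is a minimal illustration, not a description of what Twitter actually deployed; the class name, the thresholds and the in-memory storage are all assumptions.

```python
import time


class LoginThrottle:
    """Lock an account after too many failed logins inside a time window."""

    def __init__(self, max_failures=5, window_seconds=900, clock=time.monotonic):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the logic can be tested
        self._failures = {}         # account -> list of failure timestamps

    def _recent(self, account):
        """Drop failures older than the window and return the remaining list."""
        cutoff = self.clock() - self.window_seconds
        recent = [t for t in self._failures.get(account, []) if t > cutoff]
        self._failures[account] = recent
        return recent

    def is_locked(self, account):
        return len(self._recent(account)) >= self.max_failures

    def record_failure(self, account):
        self._recent(account).append(self.clock())

    def record_success(self, account):
        self._failures.pop(account, None)
```

The login handler would call is_locked before checking the password and record_failure after each bad guess. With five attempts per fifteen minutes, a dictionary of even a few thousand words stops being a practical attack.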

The MacRumors hack came at a most inopportune time - during live coverage of the annual Macworld Expo keynote. At 9:24 am, an unintended post, which obviously caught the bloggers by surprise, suddenly announced that Steve Jobs had died. Shortly thereafter the entire live blog had to be shut down as the attackers eventually began flooding the blog with unwanted comments. Details of how the attack succeeded have not emerged but rumors suggest that this too resulted from guessed/stolen password credentials. Comments on the 4chan forum also suggest that members from that community were involved in the attack. That is the same forum where details of the attack on Sarah Palin's email account first emerged.

What should we learn from this? From a technology perspective, eyeballs are moving toward real-time content. Our society has long sought instant gratification and micro-blogging services such as Twitter are benefitting. What's more real-time than a quick comment from my phone letting the world know what I'm doing at every second of my exciting life that I'm sure people simply can't live without? From a security perspective it's sad to see that after decades we're still using the single factor authentication provided by passwords for sensitive accounts. To make matters worse, these passwords were obviously implemented with poor policies and perhaps even shared. These accounts deserved to be hacked and the attackers (hopefully) taught Twitter and MacRumors a needed and embarrassing public lesson. Let's hope they learn.

- michael

Wednesday, January 7, 2009

Buffer Overflows Are Injection Vulnerabilities

I admit it: I have a hard time getting excited about 0-day. Defense against the concept is fascinating (trying to detect exploits for vulnerabilities that haven't been written yet, for example, is way cool). New attacks are important to respond to ASAP, and may change the current wisdom about what's safe, so security geeks (including me) need to pay attention to what's happening. It's just that the details of individual exploits of specific vulnerabilities in specific software are both soporific and depressing: after a while, all the different attacks and vulnerabilities start looking alike. How is it that we are still making the same mistakes, over and over? And how is it that we are continually astonished when the same attack techniques work on the same mistakes, over and over?

In the interests of making everyone else as bored & jaded as I am, and perhaps even helping increase common knowledge, let me start by explaining why vulnerabilities to cross-site scripting and buffer overflows are instances of the same abstract problem: injection vulnerability.

It's pretty common to categorize cross-site scripting (XSS) and, say, SQL injection or format string exploits as injection attacks. In all of these cases, Mallory puts control codes where the developer is expecting data. A system will be vulnerable to an injection attack if and only if all 3 of the following interrelated conditions hold:

a. A data and control channel are mixed in the same communications medium.
b. Untrustworthy data is added to the data channel.
c. The mechanism intended to separate data & control channels is either insufficient or (more frequently) insufficiently applied.

Here's the run-down of these conditions for XSS in HTML:
a. An HTML document contains interleaved tags and attribute labels (the control channel), and their values (the data channel).
b. In most if not all interactive Web applications, user input (always untrustworthy, whether it is taken from this request or some persistent store) is displayed in the HTML document.
c. The mechanism for separating data & control channels in HTML is encoding according to the HTML specification; HTML is well-defined and this mechanism is sufficient.

Since embedded markup is part of the fundamental nature and appeal of HTML, not to mention a long-standing standard, nobody is about to change (a) and the usual recommendations for avoiding vulnerabilities to XSS correspond to eliminating (b) and (c):
b. Validate input (i.e. ensure it is trustworthy).
c. Encode input for the appropriate context before displaying it (i.e. use the mechanism for separating the data channel from the control channel).
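
As a small illustration of recommendation (c), here is a Python sketch that uses the standard library's html.escape to keep user input in the data channel. The render_comment function is a hypothetical helper, and the sketch only covers the element-content context; attribute values, URLs and script blocks each need their own encoding rules.

```python
import html


def render_comment(user_text: str) -> str:
    """Encode untrusted text before placing it inside an HTML element."""
    # html.escape turns &, <, > and (with quote=True) both quote
    # characters into entities, so the browser treats them as data.
    return "<p>" + html.escape(user_text, quote=True) + "</p>"
```

With this in place, input such as a script tag is displayed as text rather than interpreted as markup: the control characters are still present in the document, but only in their encoded, data-channel form.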

It's less common to think of a buffer overflow attack as an injection attack, but here's the run-down of the injection vulnerability conditions for a buffer overflow on the stack:
a. The stack contains interleaved stack frame structures (the control channel) and local variables, parameters and the like (the data channel).
b. Almost any application touches input from outside itself (untrustworthy data), and almost any application puts the input it is processing into function parameters and local variables.
c. The mechanism for separating data & control channels on the stack is size-based; the stack is well-defined and this mechanism is sufficient.

As you can see, it's a match. If any of these conditions does not hold, stack-based buffer overflows are not possible. For purposes of avoiding the vulnerability, the important thing isn't the kind of control information the attacker can insert (HTML vs. pointers and machine instructions), or the mechanism used to separate data & control channels (encoding vs. size), it's preventing the data channel from leaking into the control channel.

As one would expect, the usual recommendations for avoiding vulnerability to stack-based buffer overflows eliminate one or more of these conditions:
a. Separate data and call stacks. (Actually this is a relatively uncommon recommendation, probably because calling conventions are a pretty low-level interface to be changing.)
b. Validate input, e.g. confirm that strings are actually null-terminated. (Of course, by the time it's input to a function that would want to do the validation, you frequently don't have enough information to tell whether the null terminator occurs within the bounds of the memory that was allocated for this variable, so in practice you need to decide on and check invariants at the time the data is output to the variable.)
c. Use safe functions, i.e. those that have length arguments or otherwise include length checking (i.e. use the mechanism for separating the data channel from the control channel).
b and c. Use an interpreted language. (Many interpreted languages put object IDs (frequently pointers) on the stack instead of the original data; if the object IDs are not controllable from outside, no untrustworthy data goes on the stack. Also, types in many interpreted languages do automatic bounds checking, i.e. use the mechanism for separating the data channel from the control channel.)
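
The same point can be shown with a toy model. The Python sketch below is not a real stack: it is a byte array standing in for one frame, with an 8-byte local buffer (data channel) followed by a 4-byte return address (control channel). The sizes, names and address value are assumptions made for illustration only.

```python
BUF_SIZE = 8
RETURN_ADDR = b"\xde\xad\xbe\xef"


def new_frame() -> bytearray:
    """Toy frame: local buffer (data) followed by return address (control)."""
    return bytearray(BUF_SIZE) + bytearray(RETURN_ADDR)


def unsafe_copy(frame: bytearray, data: bytes) -> None:
    """Like strcpy: copies input with no regard for the buffer size."""
    # Clamped to the frame length only so the toy model keeps a fixed size.
    n = min(len(data), len(frame))
    frame[0:n] = data[:n]


def safe_copy(frame: bytearray, data: bytes) -> None:
    """Length-checked copy: refuses input that does not fit the buffer."""
    if len(data) > BUF_SIZE:
        raise ValueError("input exceeds buffer size")
    frame[0:len(data)] = data


def return_address(frame: bytearray) -> bytes:
    """Read back the control field of the toy frame."""
    return bytes(frame[BUF_SIZE:BUF_SIZE + 4])
```

Feeding twelve bytes to unsafe_copy replaces the return address with the last four bytes of the input, which is the data channel leaking into the control channel. safe_copy applies the size-based separation mechanism from (c) and the control field stays intact.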

What about all the other stack-based buffer overflow recommendations you hear, like canaries, mirrored stacks, ASLR and non-executable stacks? They work by detecting the attack or reducing its possible consequences, not by avoiding the vulnerability to start with. So, they are worth doing, but if you don't address the conditions that make the attack possible to start with, the underlying vulnerability is still there.

Now that the conditions that enable injection attacks are clear, I hope you will see them everywhere they exist in your system, even if you are using proprietary data formats & protocols no one but you has ever heard of. It could be very instructive to make a quick table of data formats and network protocols used in each layer of your system, with columns for whether control is in-band, whether untrustworthy input appears in the data channel (remember that untrustworthy data inserted at a higher layer of the system is still untrustworthy when it is interleaved with a control channel at a lower layer), the mechanism(s) for separating the control and data channel, and whether each mechanism, if applied correctly, is sufficient.

With that, I'm relying on you to move on to bigger and better kinds of vulnerability in 2009!

-- Brenda

Tuesday, January 6, 2009

2009 Web Security Predictions

2009 is a year when some fundamental shifts in web technology will begin to take hold and attackers will adjust accordingly. With the emergence of revolutionary changes such as cloud computing, widespread adoption of next generation web application technologies and the ‘real’ web arriving on mobile devices, we must anticipate that attackers will adjust their tactics to leverage these shifts. From a corporate perspective, anticipating attack and business trends can make the difference between being prepared for an emerging threat or being blind-sided by a new attack. We have combined the wealth of knowledge shared among Zscaler researchers with our unique access to web traffic to assemble our top 10 web security predictions for the coming year.

1.) Cloud Computing Is Ready For Primetime

2008 was the year in which 'cloud computing' truly emerged on the public stage and gained acceptance as a force to be reckoned with. In the coming year the honeymoon period is over and it's time for cloud computing to prove its worth. Now we’re obviously biased but I fully expect 2009 to be a critical year for security in the cloud. We've moved past piqued curiosity to detailed evaluations and bake-offs of the many competing solutions that are beginning to emerge. This is the year where the pretenders will quickly be relegated to the sidelines and by year-end dominant players will begin to emerge.

For this process to occur, companies are going to need to define criteria for assessing security in the cloud. It will require a different approach than traditional software assessments, as you don't get to hold and touch SaaS solutions. This in turn makes it even more important that evaluators hold SaaS vendors' feet to the fire. Just because a vendor says that a feature exists doesn't mean you're done - ensure that you understand how it works and determine if it will meet your needs.

2.) The ‘Mobile Web’ Is No Longer A Separate Platform

Check historical security predictions and you'll see that every year for the past decade was labeled as the moment when mobile malware would explode. It made sense. After all more and more people were leveraging mobile devices so attackers needed to adjust their focus eventually...yet they didn't. The number of large-scale attacks on mobile devices has been minimal. How can this be and will 2009 be any different?

Mobile malware has largely remained on the sidelines for two reasons. First, the multitude of platforms has limited the payback from putting in the work to exploit vulnerabilities in a single implementation. Secondly, mobile applications have largely lacked the feature set of their desktop counterparts, which leads to limited use.

Thanks in part to the competitive innovation injected by Apple's iPhone, the ‘mobile web’ is no longer a separate platform. What do we mean by that? There is no longer a clear distinction between web content for mobile and traditional browsers. You don't need WAP, WML, etc. to access the web on your phone. Standard web applications are now realistically accessible despite the limited screen real estate on a mobile device.

What does this mean for mobile attacks? It means that 2009 will be a significant year, but not because attacks will suddenly shift to mobile platforms. Mobile attacks will finally become a mainstream problem because mobile devices are susceptible to traditional attacks. Browsers such as Mobile Safari, Internet Explorer Mobile, Mobile Opera, etc. now have full JavaScript engines and can generally accommodate Rich Internet Applications. This combined with the fact that they now drive a meaningful volume of Internet traffic means that they are susceptible to many of the non-browser specific attacks such as Cross-Site Scripting (XSS), Cross-Site Request Forgery (CSRF), Clickjacking, etc. They also tend to share libraries with their oft-vulnerable desktop siblings, so also expect browser specific vulnerabilities to be on the rise.

3.) Is Client Side Browser Storage a Feature or a Ticking Time Bomb?

The line between web and native desktop applications is continuing to blur. Rich Internet Application technologies such as Flash and Silverlight and the emergence of development approaches such as AJAX have made web applications much more interactive and user friendly. However, despite these advancements, a critical differentiator between desktop and web applications remains the need for connectivity. Sure, Google Docs is a great alternative to Microsoft Office - until you board a plane. You can't use web applications unless you have access to them.

This too is starting to change as browsers are gaining access to client side storage solutions. Flash storage, Google Gears and Structured Client Side Storage, detailed in the HTML 5 specification, all address this issue. While this opens up new doors for web applications, as with any new technology, if it is implemented insecurely, it can increase risk and create new headaches for corporate security staff. Our early review of these technologies suggests that they are not well understood and are indeed being poorly implemented. This in turn will lead to the leakage of sensitive information and client side equivalents of XSS and SQL injection. Stay tuned for further posts on this topic.

4.) Web API Vulnerabilities Lead to Mass Attacks

Code reuse has always been encouraged - why reinvent the wheel? It's also a good security practice as open, stable code has been scrutinized by many eyeballs and is therefore more likely to be secure. In web development however, that is not always the case. While we may leverage functionality created by others, the security of that code may not have been assessed by anyone other than the original creators. That occurs because we're not necessarily dealing with open source code or even compiled binaries but rather web based APIs. In this case, we're largely counting on the providers of a given service to ensure that it's secure. Google for example makes available a plethora of APIs for everything from maps to social networks and, despite investing in securing code prior to release, they have seen their fair share of vulnerabilities. When a vulnerability is discovered in such an API, hundreds or thousands of sites can instantly be affected. Fortunately, patching can generally be completed quickly as only servers hosting the API need to be updated, but this can also break associated applications. From an attacker's perspective, a web API vulnerability can signal a target rich environment with a small window of opportunity - a good reason to keep quiet once a vulnerability is discovered.

5.) Abusing the Architecture Of The Web As Opposed To Vulnerabilities In Specific Applications

Traditionally, vulnerabilities have exploited a specific error on a specific platform. The web however, has opened the door for attacks which are cross-platform simply because they abuse intended functionality. Let's face it - the web was designed to be open - not secure. Moreover, the interconnected nature of the web means that a vulnerability on one node (i.e. a web server) can indirectly lead to a compromise on another node (i.e. a web browser). We could argue that these facts of life on the web account for the majority of security incidents today and that the trend will only continue to escalate. XSS is an entrenched example of this fact. The web was designed to freely share content between browsers and servers and JavaScript was developed and adopted to extend the functionality of web browsers. The inventors did not however plan for a scenario whereby someone with less than honest intentions would leverage the power of JavaScript and browser/server trust (and lack of input validation) to attack unsuspecting users. Such attacks are not specific to any browser or server - all are affected equally as XSS abuses the infrastructure of the web, not a specific vulnerability. We were provided with another example of this phenomenon in 2008 when Clickjacking flew into the media spotlight. This time the attack focused on transparency and DHTML with a dash of social engineering to convince victims to make requests that they hadn't intended to. Expect such attacks to become the rule as opposed to the exception.

6.) Internet Explorer Gets Competition

We've often argued that vulnerability statistics are a poor indicator of the relative security of a product. Rather than providing insight into the relative security of products, they tend to instead reveal the popularity of products. Attackers/researchers want the biggest bang for their buck. If they're going to spend time looking for vulnerabilities in a certain product, they're more likely to focus on the one with the largest user base. Internet Explorer (IE) has held the browser crown for several years now and that in our opinion is why it has tended to see a more significant number of vulnerability reports. The landscape is however beginning to change. Not only are mobile browsers starting to take off (see prediction #2) but there are some interesting new challengers in the market such as Google Chrome. While we don't expect Microsoft to fall from the pole position any time soon, we do expect the focus to shift to some of these new and intriguing entrants. Expect to see a decreased number of IE vulnerabilities in 2009 and more from mobile browsers and especially Google Chrome. While Google has a decent security track record, they haven't faced the same difficult but important learning curve climbed by Microsoft over the years. Browser development is also a different game than web application development, Google's forté, so expect some tarnish on the Chrome in '09.

7.) Data Leakage Via The Web Reaches A Tipping Point

The Internet is converging on ports 80 and 443. Why? They're always accessible for outbound connections on corporate networks. Whether you're dealing with VoIP, P2P or malicious code, network aware applications are becoming increasingly intelligent. While they may initially attempt to connect via high-level ports using proprietary protocols for efficiency, they will often try a variety of approaches and ultimately regress to tunneling content through HTTP/HTTPS traffic. Combine this with the fact that users are increasingly encouraged to share content online (Facebook, Myspace, YouTube, etc.) and it's easy to see why data leakage via web channels is fast becoming a top priority for many enterprises. 2007 and 2008 were banner years for data leakage solutions in general (Vontu and Provilla acquired) but in 2009 the focus will shift to web based DLP.

8.) The Long Overdue Death of URL Filtering

Network administrators are finally starting to give up on managing web traffic through URL filtering. URL filtering is a dated technology, which tends to focus on blocking traffic at the domain level and leaves the administrator with a binary decision - allow or don't allow access to a given resource. This approach was reasonable when the web was dominated by static, corporate content. This is no longer the case. Today, content is extremely dynamic and often user generated and this changes the rules. A page that was good today may be bad tomorrow, and while a domain as a whole may be perfectly acceptable, individual pages on it could contain objectionable or malicious content. In 2009, enterprises will seek solutions that support dynamic content filtering and page level reputation.

9.) Controlling Web Application Functionality, Not Just Content

It is no longer enough to block/allow access to sites. Administrators are demanding control over functionality, not just content. There may be perfectly legitimate reasons to permit access to a given resource but not want to permit specific functionality. Take YouTube for example. While it may be seen primarily as an entertainment site, many businesses have begun to leverage it as a marketing tool. If URL filtering is your only option for controlling access to YouTube (see prediction #8), you're stuck with allowing/disallowing access to the site as a whole. You may however wish to permit viewing videos to allow for competitive intelligence (or to avoid being the fun police) but not want to permit uploading content for fear of data leakage (see prediction #7). Administrators are increasingly demanding the ability to manage the who, what, when, where and how much of web security. They want the granularity to determine that only the Marketing Department can upload videos during work hours to YouTube, so long as they don't use more than 5% of available bandwidth. Solutions need to understand not just the destination of a web request, but also the business logic of that destination in order to be able to permit the granular level of control required to manage third party web applications.
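
As a rough illustration of what such a rule looks like once it is written down, here is a Python sketch of the YouTube example. The field names, the 9-to-5 definition of work hours and the way bandwidth share is measured are all assumptions; no particular product's policy engine is being described.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class WebRequest:
    department: str         # who
    site: str               # where
    action: str             # what: "view" or "upload"
    at: time                # when (local time of the request)
    bandwidth_share: float  # how much: fraction of available bandwidth in use


def allow(req: WebRequest) -> bool:
    """Evaluate the hypothetical YouTube policy from the text."""
    if req.site != "youtube.com":
        return True  # other sites are outside the scope of this sketch
    if req.action == "view":
        return True  # viewing is permitted for everyone
    if req.action == "upload":
        return (
            req.department == "Marketing"
            and time(9, 0) <= req.at < time(17, 0)
            and req.bandwidth_share <= 0.05
        )
    return False  # unknown actions on the site are denied
```

Note that the decision depends on the action being performed, which a filter that only sees the destination domain has no way to evaluate.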

10.) Mobile Platforms Open Up For Developers…And Attackers

Desktops and mobile devices have evolved along nearly opposite paths - until now. Desktops have always been about an open architecture - you're free to add whatever applications you like, after all, you own it. Cell phones on the other hand have traditionally been black boxes - don't like a key feature? Buy a new phone. Much of this was driven by the control-crazy mobile carriers who until recently have wielded the power. They didn't want you to be able to add features, as it would then be harder for them to differentiate their 'exclusive' handset offerings. This has changed however as device manufacturers have begun calling the shots and the carriers are now tripping over one another to argue about who is the most 'open'. While this overall is a positive thing, it does have security implications. The more freedom that you provide someone with, the more likely that freedom will be abused. Providers are taking different approaches to an 'open' model. Apple for example allows you to install whatever you want, so long as they approve of it. Are they however, assessing submitted applications for security weaknesses, or just undesirable functionality? We suspect it's the latter. Google on the other hand is taking a more 'pure' open source approach and not locking down as many portions of the O/S for developers. Will attackers abuse these new open platforms? You bet.

Friday, January 2, 2009

Breaking Down Broken SSL

Update [01/06/09]: I discussed this topic with Dan Kaplan on his SC Magazine podcast.

On December 30, 2008 at the 25th Chaos Communication Congress (CCC) in Berlin, seven researchers presented a talk entitled 'MD5 Considered Harmful Today: Creating a Rogue CA Certificate'. In essence, they made the theoretical practical by building on recent research which detailed how chosen-prefix collisions for MD5 could be used to create two X.509 certificates with identical signatures despite having different content. The CCC presentation has taken this work a step further to actually produce an intermediate CA certificate signed by a trusted root CA. This certificate can then be used to sign any website certificate. A website certificate created by the team can be seen in the image below. Note that Internet Explorer in this case considers the certificate to be legitimate.

As you can imagine, the ability to create fake SSL certificates is a phisher's dream come true. An attacker possessing such a rogue certificate could produce fake SSL certificates for phishing sites which would be indistinguishable from their legitimate counterparts. However, before we conclude that the sky is falling it's important to understand that while such attacks have now been proven to be possible, it would still be difficult for an attacker to mount a successful attack.

Ingredients For a Successful Attack

1.) Knowledge - While the CCC researchers have published a detailed paper to discuss their findings, they will thankfully "for the time being not release the full details of how [they] have been able to obtain the rogue Certification Authority certificate". While their research is based on past work that is now public, they confess that the "publicly known techniques have been improved at crucial points". In short - simply reading their paper/slides will not be enough to conduct a successful attack, additional research would be required. That said, you can bet that the race is now on for the bad guys.

2.) Computing Power - In order to calculate the MD5 hash collisions to produce the rogue CA certificate, the team employed a cluster of 200 PlayStation 3 consoles. While such technology is readily available, there is a real cost associated with acquiring the necessary computing power to replicate their work.

3.) Traffic Redirection - This is the biggie. Producing a rogue SSL certificate for a given domain is of little value if the fake site is not actually reached at that domain. Hosting the certificate at any other domain will result in a warning message from any visiting web browser indicating that the certificate does not match the domain name. The necessary traffic redirection is realistic on a LAN segment, but would be very difficult to accomplish on the Internet as a whole. An attack such as Dan Kaminsky's DNS cache poisoning attack would provide the critical missing piece of the puzzle, but fortunately this vulnerability has aged to the point where it has largely been addressed.
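To make the domain mismatch concrete, here is a minimal sketch of the kind of name check a browser performs. It is a simplification, not any browser's actual code: the function names are made up for illustration, the domains are reserved example names, and the wildcard rule (a leading '*.' covers exactly one label) is an assumption that ignores many real-world details.

```python
def name_matches(pattern: str, hostname: str) -> bool:
    """Return True if one certificate name covers the hostname.

    Simplified rule: exact match, or a leading '*.' wildcard that
    stands in for exactly one left-most label.
    """
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if pattern == hostname:
        return True
    if pattern.startswith("*."):
        # Drop the left-most label of the hostname and compare the rest.
        _, _, parent = hostname.partition(".")
        return parent != "" and parent == pattern[2:]
    return False


def certificate_covers(cert_names, hostname: str) -> bool:
    """True if any name listed in the certificate covers the hostname."""
    return any(name_matches(name, hostname) for name in cert_names)


# A certificate issued for www.example.com is fine on that host...
print(certificate_covers(["www.example.com"], "www.example.com"))
# ...but triggers a mismatch warning when served from another domain.
print(certificate_covers(["www.example.com"], "phish.example.net"))
```

This is why the attacker needs to redirect the victim's traffic: the rogue certificate only avoids the warning when the browser believes it is talking to the domain named in the certificate.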

Why This Was Possible

The CCC researchers largely credit the work of Marc Stevens, Arjen Lenstra and Benne de Weger on chosen-prefix collisions for MD5, which was released in 2007. However, their success was also made easier by less than perfect processes at a number of Certificate Authorities. Certificates do not have to be signed using MD5. In fact, MD5 was shown to be vulnerable to attack as early as 2004, and yet a number of CAs still employ MD5 for certificate signatures. The worst offender turned out to be RapidSSL (owned by VeriSign), which was responsible for 97% of the MD5 signed certificates that the researchers looked at. The team also faced other challenges, such as predicting the validity period and serial numbers of the certificates issued by the CA that was being targeted. This was made much easier by the fact that certain CAs are apparently using sequential serial numbers.
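The reason a hash collision matters is that a CA signs the hash of a certificate, not the certificate itself, so two different inputs with the same hash share one valid signature. The toy sketch below illustrates only that principle and is not the researchers' technique: the hash is deliberately crippled to the first 2 bytes of MD5 so a collision can be brute-forced in moments, an HMAC stands in for the CA's real RSA signature, and the key and the certificate-like strings are invented for the example.

```python
import hashlib
import hmac
from itertools import count

CA_KEY = b"hypothetical-ca-signing-key"  # stand-in for the CA's private key


def weak_hash(data: bytes) -> bytes:
    """Deliberately weak toy hash: only the first 2 bytes of MD5."""
    return hashlib.md5(data).digest()[:2]


def ca_sign(data: bytes) -> bytes:
    """The toy CA signs the hash of the data, never the data itself."""
    return hmac.new(CA_KEY, weak_hash(data), hashlib.sha256).digest()


def ca_verify(data: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(ca_sign(data), signature)


def find_collision(benign_prefix: bytes, rogue_prefix: bytes):
    """Append counters to two different prefixes until the weak hashes collide."""
    seen = {}
    for i in count():
        benign = benign_prefix + str(i).encode()
        seen[weak_hash(benign)] = benign
        rogue = rogue_prefix + str(i).encode()
        match = seen.get(weak_hash(rogue))
        if match is not None:
            return match, rogue


benign, rogue = find_collision(
    b"CN=www.example.com;CA=FALSE;pad=",
    b"CN=Hypothetical Rogue CA;CA=TRUE;pad=",
)
signature = ca_sign(benign)          # the CA only ever sees the benign request
print(ca_verify(rogue, signature))   # the same signature also validates the rogue data
```

Against full MD5 this brute-force search is hopeless, which is where the chosen-prefix collision research and the computing power come in. The sketch also hints at why predictable serial numbers and validity periods helped: the attacker has to know in advance exactly what data the CA will sign.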

What Needs To Be Done

So, what can you do to protect your users from falling victim to phishing attacks on sites that may have rogue certificates? Unfortunately, there isn't a good answer to that question. If someone were to successfully create a rogue certificate, it would be difficult even with manual inspection to identify it as rogue. The researchers took the approach of developing an intermediate CA certificate which could then be used to sign any website certificate. You could therefore identify/block any website certificates signed by an intermediate CA which is not specifically trusted, but that could also block some legitimate sites. It should be noted that it is not enough to block sites whose own certificates are MD5 signed; you would have to confirm that MD5 was not used for signing at any point in the chain of trust. While MD5 collisions were used to develop the rogue CA certificate, once it is developed, MD5 does not need to be used for signing subsequent website certificates.
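The point about the chain of trust can be sketched in a few lines. The chain below is entirely hypothetical: each certificate is reduced to a plain dictionary with a subject and a signature algorithm name, ordered from the website certificate up to the root, and the function name is made up. The sketch assumes the root's own self-signature can be skipped, since a root is trusted because it is in the browser's root store rather than because of its signature.

```python
from typing import Dict, List

MD5_SIGNATURES = {"md5WithRSAEncryption"}


def md5_signed_links(chain: List[Dict[str, str]]) -> List[str]:
    """Return the subjects of certificates in the chain signed using MD5.

    The chain is ordered website certificate first, root last.
    The root (last entry) is skipped; see the assumption stated above.
    """
    return [
        cert["subject"]
        for cert in chain[:-1]
        if cert["signature_algorithm"] in MD5_SIGNATURES
    ]


# Hypothetical chain: the website certificate itself is not MD5 signed,
# but the intermediate CA above it is.
chain = [
    {"subject": "www.example.com",
     "signature_algorithm": "sha1WithRSAEncryption"},
    {"subject": "Hypothetical Rogue Intermediate CA",
     "signature_algorithm": "md5WithRSAEncryption"},
    {"subject": "Hypothetical Trusted Root CA",
     "signature_algorithm": "sha1WithRSAEncryption"},
]

print(md5_signed_links(chain))      # whole chain: flags the intermediate
print(md5_signed_links(chain[:1]))  # a lone certificate with nothing above it: nothing flagged
```

Looking only at the website certificate's own signature would miss the problem here; the weak link sits one level up.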

The fix really lies with the CAs. The fact that some have continued to use MD5 instead of a more secure alternative such as SHA-2 for signing certificates, some four years after real weaknesses in the algorithm were demonstrated, is inexcusable. They also need to inject randomness into the certificate generation process, most notably in the serial number field. Hopefully this research will force CAs to quickly phase out the use of MD5 for developing certificate signatures and improve their procedures overall.
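As a rough illustration of why random serial numbers help, the sketch below compares a CA that hands out sequential serials with one that draws them at random. Both issuer classes, the starting value and the 64-bit size are assumptions made for the example, not a description of how any real CA works.

```python
import secrets


class SequentialIssuer:
    """Toy CA that numbers certificates one after another."""

    def __init__(self, start: int = 1000):
        self._next = start

    def issue(self) -> int:
        serial = self._next
        self._next += 1
        return serial


class RandomIssuer:
    """Toy CA that draws each serial number from a secure random source."""

    def __init__(self, bits: int = 64):
        self._bits = bits

    def issue(self) -> int:
        return secrets.randbits(self._bits)


def predict_next(observed_serial: int) -> int:
    """The attacker's guess: the serial after the one just seen."""
    return observed_serial + 1


sequential = SequentialIssuer()
seen = sequential.issue()
print(predict_next(seen) == sequential.issue())  # the guess is always right

random_ca = RandomIssuer()
seen = random_ca.issue()
guess_was_right = predict_next(seen) == random_ca.issue()  # about a 1 in 2**64 chance
```

Because the collision has to be computed before the CA signs anything, an attacker who cannot predict the serial number cannot prepare the colliding certificate in advance.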

Happy New Year!

- michael