kafsemo.org

“I try to think of myself as little more than a vessel for the Zeitgeist.”

Hullo!

Using HTTP caching libraries

Sunday, 6 December 2015 (#)

Efficient use of HTTP used to require a lot of custom code. However, libraries can dramatically reduce the amount of code you’re maintaining, and also allow sharing any improvements.

Efficient use of HTTP/1.1 means caching (RFC 7234), conditional requests (RFC 7232) and compressed Content-Encoding, amongst other things. For an RSS reader, I previously wrote a bulk feed downloader that took advantage of all these things in around 700 lines of Perl.
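As a taste of what that custom code has to get right, here’s a rough sketch of the RFC 7234 freshness check in Python. (The helper names are made up for this example, and it handles only a simplified subset of the Cache-Control grammar.)

```python
def freshness_lifetime(cache_control):
    # Pull max-age=N out of a Cache-Control header; no other directives
    # are handled in this deliberately simplified sketch.
    for directive in cache_control.split(','):
        directive = directive.strip()
        if directive.startswith('max-age='):
            return int(directive[len('max-age='):])
    return 0

def is_fresh(cache_control, current_age):
    # RFC 7234: a stored response may be reused without revalidation
    # while its current age is below its freshness lifetime.
    return current_age < freshness_lifetime(cache_control)
```

Multiply that by directive parsing, validation, invalidation and storage, and the line count mounts up quickly.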

A large part of that logic is included in httplib2, a great Python library that acts as an in-process caching proxy. To allow use of Python’s de facto standard Requests library, I’ve been using cachecontrol: “The httplib2 caching algorithms packaged up for use with requests.”

Instead of:

    sess = requests.session()
    response = sess.get('http://example.com/')

write:

    from cachecontrol import CacheControl
    from cachecontrol.caches import FileCache

    sess = CacheControl(requests.session(), cache=FileCache('.web_cache'))
    response = sess.get('http://example.com/')

Hey presto: all your HTTP calls are backed by a persistent cache that obeys HTTP’s rules for cache freshness. On the Java side, Apache HttpComponents does a pretty good job of the same thing.

What’s the result? Firstly, I’m down to under 400 lines (now of Python). Secondly, and more importantly, much of my code is now in a common open source library. I can benefit from others’ fixes, and contribute my own as well.

The switch from Perl threads to Python’s concurrent.futures is also welcome.

I’ve lost some functionality from my own code. I no longer have a summary of how much bandwidth was saved due to compression at the end. However, I don’t miss it, and this kind of rewrite is a great chance to throw away behaviour that I’m not actually using.

The moral of the story is: treat self-maintained code as a liability to be reduced where possible. Unless there’s good reason, prefer de facto standard libraries, and architectures that allow small, well-defined libraries to be introduced.

(Music: Motörhead, “R.A.M.O.N.E.S.”)

jBCrypt: three ways to fix overflow

Sunday, 22 March 2015 (#)

jBCrypt is a pure Java implementation of the bcrypt key derivation function, used for password hashing. The first release (0.1) was in May 2006, and subsequent releases have fixed a number of bugs. The most recent release (0.4, in January 2015) fixed an integer overflow bug in the case of trying to use the maximum number of rounds. It’s unlikely that you would currently run into that bug, but it’s still a good idea to fix it.

As someone who has forked jBCrypt in order to get fixes into Spring Security, here’s a little about the history of that issue and three possible ways to fix it.

The bug

It’s an integer overflow bug with the number of rounds. bcrypt’s tunable work factor is the base-2 log of the number of rounds, so a natural way to implement this is:

int log_rounds = 10;

if (log_rounds < 4 || log_rounds > 31)
    throw new IllegalArgumentException ("Bad number of rounds");

int rounds = 1 << log_rounds;

for (int i = 0; i < rounds; i++) {
    // Iterate
}

Here’s the problem: we’re checking that log_rounds is not more than 31, but what about the edge case?

System.out.println(1 << 30);
1073741824

System.out.println(1 << 31);
-2147483648

Oops; bug. The loop condition 0 < -2147483648 is immediately false, so the loop ends without ever running, and the result is... not secure.
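Python integers don’t overflow, but we can mimic Java’s 32-bit two’s-complement arithmetic and watch the bound go wrong (to_int32 is a helper invented for this sketch, not part of jBCrypt):

```python
def to_int32(x):
    # Reduce to 32 bits, then reinterpret as a signed Java int.
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

assert to_int32(1 << 30) == 1073741824
assert to_int32(1 << 31) == -2147483648  # the loop bound goes negative
```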

The fixes

One fork is on Google Code (jbcrypt). It’s notable for getting org.mindrot:jbcrypt:0.3m published in the Central Repository.

There, this bug was filed as Issue 1: Integer overflow when log_rounds = 31 back in 2011:

I expected BCrypt.hashpw(p, BCrypt.gensalt(31)) to take a really long time but instead it returns immediately. It's because int overflows on 2^31 and the key setup loop returns immediately.

The attached patch switches the type of rounds to long:

System.out.println(1L << 30);
1073741824

System.out.println(1L << 31);
2147483648

Excellent; it stays positive, and the loop runs a couple of billion times (rather than not at all).

However, despite being reported, this patch was never applied to that fork, never made it to Central and never made it back upstream.

Spring Security

Just after that bug report, Spring Security took another fork of jBCrypt (SEC-1472 - Add support for bcrypt password encoding), from upstream but with their own fix for the bug:

rounds = 1 << log_rounds;

for (int i = 0; i != rounds; i++) {
    // Iterate
}

Here, the use of != rounds instead of < rounds sidesteps the overflow. The loop no longer stops immediately: the counter climbs to Integer.MAX_VALUE, wraps round to -2147483648 and only then equals rounds, giving the intended 2^31 iterations.
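Counting to 2^31 takes a while, so here’s the same arithmetic sketched in Python at 8 bits instead: the shift still overflows to a negative bound, but a != comparison walks the counter all the way round to it (wrap is a stand-in for Java’s fixed-width arithmetic):

```python
BITS = 8

def wrap(x):
    # Reinterpret x as a signed BITS-wide two's-complement integer.
    x &= (1 << BITS) - 1
    return x - (1 << BITS) if x >= (1 << (BITS - 1)) else x

log_rounds = BITS - 1
rounds = wrap(1 << log_rounds)
assert rounds == -128  # overflowed, just as 1 << 31 does at 32 bits

count = 0
i = 0
while i != rounds:  # the Spring Security fix: != rather than <
    count += 1
    i = wrap(i + 1)
assert count == 1 << log_rounds  # all 128 intended iterations run
```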

Later, in 2012, I wanted to use bcrypt with Spring Security, so I tried to solve the proliferation of forks with the obvious solution: another fork. Here, I combined the history from the Google Code project with the suggested fix, merged in some reformatting from Spring Security and submitted it back to Spring Security (SEC-1990 : Code cleanup on bcrypt implementation). That had the effect of switching the fix Spring Security was using.

Next, in 2013, the issue was reported to the original project’s issue tracker (Bug 2097 - if gensalt’s log_rounds parameter is set to 31 it does 0 (ZERO) rounds!). A good clear explanation and a suggestion to use a long for the rounds.

CVE-2015-0886

CVE-2015-0886 was created in January 2015, with the same issue again. Promptly afterwards, jBCrypt 0.4 was released with a combination fix:

if (log_rounds < 4 || log_rounds > 30)
    throw new IllegalArgumentException ("Bad number of rounds");
...
for (i = 0; i != rounds; i++) {

rounds stays as an integer, the loop logic is fixed to cope with overflow (as with Spring Security) and the maximum number of rounds is now set to 30, to also fail ahead of the loop. This is the third way to fix this bug: reject values which would cause overflow.
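In Python, that third fix on its own might look like this (checked_rounds is a name made up for the sketch):

```python
def checked_rounds(log_rounds):
    # Reject work factors whose shift would overflow a 32-bit signed int.
    if log_rounds < 4 or log_rounds > 30:
        raise ValueError('Bad number of rounds')
    return 1 << log_rounds
```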

Combining two fixes is belt-and-braces, but it works (at least, until you need 2^31 rounds, predicted by one Information Security Stack Exchange answer to be around 2037).

Constant-time string comparison

The original Spring Security import also brought with it a constant-time string comparison (A Lesson In Timing Attacks). That was also added in upstream 0.4, taking the implementation from the Spring Security contribution.

In conclusion

Depending on whether you take jBCrypt from the Central Repository, from the original project or from one of the forks, you’ll have had various bugs fixed at varying times over the last four years. For a password hash, that’s not ideal.

Spring Security’s version has, generally, been more secure than the original upstream project. That is also not ideal.

Using the version from Central, which is a fairly likely thing for a Java developer to do, currently leaves you vulnerable (albeit to a bug you’re fairly unlikely to encounter), until Issue 11: Update for 0.4 is resolved.

The ideal would be a single version, actively developed, tracked in Git, available in Maven and consistently used across projects. Without that, keep an eye out for bugs, and make sure you know who’s responsible for the security-related code that you’re running.

(Music: Sleater-Kinney, “Fangless”)

Talking HTTP/2

Thursday, 8 January 2015 (#)

While talking HTTP/1.1 is possible with nothing but an invocation of telnet (see Talking HTTP/0.9, 1.0, 1.1), HTTP/2 is a binary protocol. Draft 16 has been put forward for Last Call.

Let’s talk it!

HTTP/2 is very different from the established, textual HTTP/1.1, and may not even be supported by the server we’re talking to. If we started with an HTTP/1.1 connection, switching protocols is exactly what the Upgrade header has been waiting around for since it was introduced in 1997. As per HTTP/2 Version Identification, we want to upgrade to h2c-<draft>, or h2-<draft> over TLS. Use of these, rather than the expected (and planned) ‘HTTP/2’, is controversial but part of the standard.

There’s one extra mandatory header (HTTP2-Settings) but, from the description in draft 14, an empty header is a valid way to use defaults.

The nghttp2 project has made a server available that understands draft 14 and an Upgrade from HTTP/1.1, on port 80 of nghttp2.org.

Request

GET / HTTP/1.1
Host: nghttp2.org
Connection: Upgrade, HTTP2-Settings
Upgrade: h2c-14
HTTP2-Settings:

Response

HTTP/1.1 101 Switching Protocols
Connection: Upgrade
Upgrade: h2c-14

<binary>
<plaintext response>

Excellent! We’ve just received an HTTP/2 (draft 14) response! The headers are unreadable, because they’re binary, but the plaintext response is clearly visible.

Let’s try the same thing against google.com:80:

Response

HTTP/1.1 400 Bad Request
Content-Type: text/html; charset=UTF-8
Content-Length: 1419
Date: Fri, 02 Jan 2015 09:46:59 GMT
Server: GFE/2.0
Alternate-Protocol: 80:quic,p=0.02

<!DOCTYPE html>
<html lang=en>

Hmm. 400’s not great. Let’s try twitter.com:80:

Response

HTTP/1.1 200 OK

They’re not rejecting a valid request, like Google, but they’re not upgrading to HTTP/2 either.

NPN, ALPN

Although the Upgrade: header can be used to upgrade an HTTP/1.1 connection to HTTP/2, it’s not universally supported: some browsers and servers have decided not to support HTTP/2 over non-TLS connections at all.

For HTTP/2 over https://, two upgrade mechanisms are implemented at the TLS layer, NPN and ALPN. NPN is already deprecated, ahead of widespread support for ALPN, and the current HTTP/2 draft specifies ALPN (as RFC 7301).

As the list of HTTP/2 implementations shows, there’s a mix of support for Upgrade, NPN, ALPN and direct HTTP/2 connections.
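For a sense of how ALPN looks from code, Python’s standard ssl module can offer a protocol list during the handshake. This sketch only builds the context; the commented-out lines show how a live connection would complete the negotiation:

```python
import socket
import ssl

ctx = ssl.create_default_context()
# Offer draft-15 HTTP/2 with an HTTP/1.1 fallback.
ctx.set_alpn_protocols(['h2-15', 'http/1.1'])

# with socket.create_connection(('twitter.com', 443)) as sock:
#     with ctx.wrap_socket(sock, server_hostname='twitter.com') as tls:
#         print(tls.selected_alpn_protocol())
```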

For SSL we can’t use telnet anymore; OpenSSL’s s_client is one alternative:

Request

openssl s_client -connect twitter.com:443 -nextprotoneg ''

Response

CONNECTED(00000003)
Protocols advertised by server: h2-15, spdy/3.1, http/1.1

So Twitter supports draft 15 of HTTP/2. We can open a connection with:

openssl s_client -connect twitter.com:443 -nextprotoneg 'h2-15'

But we’re making a direct HTTP/2 request now, rather than regular HTTP/1.1 and requesting an upgrade, so we need to fashion a real binary request.

An HTTP/2 request

First up, a magic fingerprint just to confirm that, even after all the negotiation already, I’m definitely talking HTTP/2:

#!/usr/bin/python3

import struct

prelim = bytes.fromhex('505249202a20485454502f322e300d0a0d0a534d0d0a0d0a')

The rest is a series of frames, payloads with types, lengths and a few flags set. First up, a SETTINGS frame. As before, an empty one is fine:

settings = struct.pack('>IBBI', 0, 0x04, 0, 0)[1:]

(The first length field is 24-bit; here I pack it as 32-bit and then skip the high byte.)

I should also ACK the SETTINGS that the server is going to send:

settingsAck = struct.pack('>IBBI', 0, 0x04, 0x01, 0)[1:]

Here, the 0x01 flag indicates that this is an ACK.
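The same nine-byte frame header pattern can be pulled out as a helper (hypothetical, not part of the final script) to show the layout once: 24-bit length, 8-bit type, 8-bit flags, and a 32-bit stream identifier whose high bit is reserved:

```python
import struct

def frame(ftype, flags, stream_id, payload=b''):
    # Pack the length as 32 bits, then drop the high byte to get 24 bits.
    return struct.pack('>IBBI', len(payload), ftype, flags, stream_id)[1:] + payload

# The two SETTINGS frames from above, reproduced byte-for-byte:
assert frame(0x04, 0, 0) == bytes.fromhex('000000040000000000')
assert frame(0x04, 0x01, 0) == bytes.fromhex('000000040100000000')
```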

Now, the GET. Since this is a request with no body I simply need to provide the headers, including pseudo-headers for values that would previously have appeared as the first line of an HTTP/1.1 request:

host='twitter.com'

headerDict = {
    ':authority': host,
    ':method': 'GET',
    ':scheme': 'https',
    ':path': '/'
}

The flags here indicate that this is the only header frame and also the entirety of the request:

# type=0x01 HEADERS
# flags = END_STREAM (0x01) | END_HEADERS (0x04)
flags = 0x01 | 0x04
frame = struct.pack('>IBBI', len(headerPayload), 0x01, flags, 1)[1:] + headerPayload

The headers are encoded using HPACK, a parallel specification to HTTP/2. It describes an elaborate system of default header values and stateful compression that makes it extremely efficient to send very common headers, and repeated headers, along with Huffman encoding for the values.

Luckily, we can ignore all that and send literal headers with no indexing:

def lengthed_string(s):
  b = s.encode('us-ascii')
  return struct.pack('B', len(b)) + b

def gen_headers(m):
  h = b''
  for k in m:
    v = m[k]
    h = h + struct.pack('B', 0) + lengthed_string(k) + lengthed_string(v)
  return h

headerPayload = gen_headers(headerDict)
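Going the other way, here’s a decoder for the same lazy subset (literal header field without indexing, new name, no Huffman coding); parse_headers is a name invented for this sketch:

```python
def parse_headers(b):
    headers = {}
    while b:
        assert b[0] == 0x00  # literal without indexing, new name
        nlen = b[1]          # 7-bit length; high (Huffman) bit clear
        name = b[2:2 + nlen].decode('us-ascii')
        b = b[2 + nlen:]
        vlen = b[0]
        headers[name] = b[1:1 + vlen].decode('us-ascii')
        b = b[1 + vlen:]
    return headers

assert parse_headers(b'\x00\x07:method\x03GET') == {':method': 'GET'}
```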

Now, put those together into our first HTTP/2 request:

from sys import stdout

stdout.buffer.write(prelim)
stdout.buffer.write(settings)
stdout.buffer.write(settingsAck)
stdout.buffer.write(frame)

Invoke that, and send it over NPN’d SSL, keeping openssl’s stdin open to keep it from exiting:

{ ./send-request.py; cat; } | openssl s_client -connect twitter.com:443 -nextprotoneg 'h2-15' | less

Amidst the SSL debugging output and plaintext payload we see what we were after: a fully HTTP/2 response to our HTTP/2 request.

An HTTP/2 response

But sending requests is only half of the web. We want to make sense of what’s being sent back.

It’s not quite a reference implementation, but here’s enough Python to decode frame boundaries. We’ll also go a bit further and show the connection settings that the server wants to use (see Defined SETTINGS parameters for meanings).

#!/usr/bin/python3

import struct
import sys

def decode_SETTINGS(b):
  print(' Settings:')
  while b:
    (i, v) = struct.unpack('>HI', b[:6])
    print('  %d = %d' % (i, v))
    b = b[6:]

b = sys.stdin.buffer.read()

while b:
  (b1, b2, b3) = struct.unpack('BBB', b[:3])
  l = (b1 << 16) | (b2 << 8) | b3
  print('Length: %d' % l)
  (t, f, s) = struct.unpack('>BBI', b[3:9])
  print('Type: %d, flags: %d, stream ID: %d' % (t, f, s))
  payload = b[9:9+l]
  print(payload)

  if t == 0x04:
    decode_SETTINGS(payload)

  b = b[9 + l:]

Response

Length: 6
Type: 4, flags: 0, stream ID: 0
b'\x00\x04\x00\x01\x00\x00'
 Settings:
  4 = 65536
Length: 0
Type: 4, flags: 1, stream ID: 0
b''
 Settings:
Length: 1002
Type: 1, flags: 4, stream ID: 1
b'\x88@\x86\xb9\xdc\xb6 \xc7\xab\x87\xc7\xbf~\xb6\x02\xb8\x7fX\xad\xa8\xeb\x10d
...
c\xc9\x82\x02\xc91~\x89=\x87\xa4\xb0\x07@\x8c\xf2\xb7\x94!j\xec:JD\x98\xf5\x7f\x8a\x0f\xda\x94\x9eB\xc1\x1d\x07\'_'
Length: 2998
Type: 0, flags: 0, stream ID: 1
b'<!DOCTYPE html>\n<!--[if IE 8]><html class="lt-ie10 ie8" lang="en data-scribe-...
atePropagation=function(){};if(i){f.push(a);r("captured",a)}else r("ig'
Length: 7240
Type: 0, flags: 0, stream ID: 1
b'nored",a\n);return!1}function n($){p();for(var a=0,b;b=f[a];a++){var d=$(b.tar
...
ift/en/init.9041729dc08dc4f68fda011758b48149cb878712.js" async></script>\n\n'
Length: 0
Type: 0, flags: 1, stream ID: 1

There are a few things to notice here. SETTINGS_INITIAL_WINDOW_SIZE is being set to 65536, just above the default of 65535. Then, the headers (Type: 1). Twitter aren’t using the same lazy hack I did, so you’d need a proper HPACK decoder to make sense of them. Then, a number of DATA frames, ending with one with END_STREAM set.

How about nghttp2.org? They also support h2c-16 over NPN:

openssl s_client -connect nghttp2.org:443 -nextprotoneg h2c-16

Response

Length: 12
Type: 4, flags: 0, stream ID: 0
b'\x00\x03\x00\x00\x00d\x00\x04\x00\x00\xff\xff'
 Settings:
  3 = 100
  4 = 65535

They’re also setting SETTINGS_MAX_CONCURRENT_STREAMS to 100; it’s otherwise unlimited, and this is the recommended minimum.

Microsoft’s implementation is already using ALPN, requiring a build of OpenSSL from source:

~/source/openssl/apps/openssl s_client -connect h2duo.cloudapp.net:443 -alpn 'h2-14'

Response

Length: 18
Type: 4, flags: 0, stream ID: 0
b'\x00\x03\x00\x00\x00d\x00\x04\x00\x00\xff\xff\x00\x07\x00\x00\x00\x02'
 Settings:
  3 = 100
  4 = 65535
  7 = 2

Settings type 7 isn’t defined by the spec, but we’ll try to make sure we only use two of them.

In summary

You can’t talk HTTP/2 by typing and, despite a relatively simple spec, it’s not a weekend hack anymore. Just as you wouldn’t write your own SSL implementation, rolling your own HPACK and HTTP/2 implementations is not really feasible.

In a sense that’s good — widely-used libraries tend to be higher quality. On the downside, anything that increases the barrier to entry can easily reduce diversity.

Far more than HTTP/1.1, HTTP/2 feels specialised. If you’re a large company operating modern web applications for customers on up-to-date browsers, where lower latency is worth the complexity and engineering effort, it’s a win. If you’re after a generic, extensible model with all optimisations left to the appropriate layers in the stack, maybe less so. It’s one of the most notable layering violations since ZFS.

As an engineering effort, it’s ingenious and opinionated. Many people vocally and articulately object to it, on both technical and political grounds. Debate and merits aside, I expect it to improve the browsing experience for the majority of users out there: that’s a good thing, even if it’s not another twenty-year protocol.

(Music: Queens of the Stone Age, “Smooth Sailing”)
Joseph Walton <joe@kafsemo.org>