kafsemo.org

“Looking back, I had no idea that thing was being televised.”

Hullo!

Talking HTTP/2

Thursday, 8 January 2015

While talking HTTP/1.1 is possible with nothing but an invocation of telnet (see Talking HTTP/0.9, 1.0, 1.1), HTTP/2 is a binary protocol. Draft 16 has been put forward for Last Call.

Let’s talk it!

HTTP/2 is very different from the established, textual HTTP/1.1, and may not even be supported by the server we’re talking to. If we started with an HTTP/1.1 connection, switching protocols is exactly what the Upgrade header has been waiting around for since it was introduced in 1997. As per HTTP/2 Version Identification, we want to upgrade to h2c-<draft>, or h2-<draft> over TLS. Use of these, rather than the expected (and planned) ‘HTTP/2’ is controversial but part of the standard.

There’s one extra mandatory header (HTTP2-Settings) but, from the description in draft 14, an empty header is a valid way to use defaults.

The nghttp2 project has made a server available that understands draft 14 and an Upgrade from HTTP/1.1, on port 80 of nghttp2.org.

Request

GET / HTTP/1.1
Host: nghttp2.org
Connection: Upgrade, HTTP2-Settings
Upgrade: h2c-14
HTTP2-Settings:

Response

HTTP/1.1 101 Switching Protocols
Connection: Upgrade
Upgrade: h2c-14

<binary>
<plaintext response>

Excellent! We’ve just received an HTTP/2 (draft 14) response! The headers are unreadable, because they’re binary, but the plaintext response is clearly visible.
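If you’d rather script that exchange than type it, here’s a minimal sketch that builds the same Upgrade request as bytes and checks whether the reply was a 101. The function names are my own inventions for illustration, and the default token assumes a server that still speaks draft 14.

```python
def build_upgrade_request(host, token="h2c-14", path="/"):
    """Return the bytes of an HTTP/1.1 request asking to switch to `token`."""
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host,
        "Connection: Upgrade, HTTP2-Settings",
        "Upgrade: %s" % token,
        "HTTP2-Settings:",  # empty value: accept the defaults
        "",                 # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def was_upgraded(response_head):
    """True if the first line of the response carries a 101 status."""
    status_line = response_head.split(b"\r\n", 1)[0]
    parts = status_line.split(b" ", 2)
    return len(parts) >= 2 and parts[1] == b"101"
```

Anything after the blank line that follows a 101 is HTTP/2 framing, so it shouldn’t be decoded as text.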

Let’s try the same thing against google.com:80:

Response

HTTP/1.1 400 Bad Request
Content-Type: text/html; charset=UTF-8
Content-Length: 1419
Date: Fri, 02 Jan 2015 09:46:59 GMT
Server: GFE/2.0
Alternate-Protocol: 80:quic,p=0.02

<!DOCTYPE html>
<html lang=en>

Hmm. 400’s not great. Let’s try twitter.com:80:

Response

HTTP/1.1 200 OK

They’re not rejecting a valid request, like Google, but they’re not upgrading to HTTP/2 either.

NPN, ALPN

Although the Upgrade: header can be used to upgrade an HTTP/1.1 connection to HTTP/2, that’s not universally supported: some browsers and servers have decided not to support HTTP/2 over non-TLS connections.

For HTTP/2 over https://, two upgrade mechanisms are implemented at the TLS layer: NPN and ALPN. NPN is already deprecated, ahead of widespread support for ALPN, and the current HTTP/2 draft specifies ALPN (as RFC 7301).

As the list of HTTP/2 implementations shows, there’s a mix of support for Upgrade, NPN, ALPN and direct HTTP/2 connections.

For SSL we can’t use telnet anymore; OpenSSL’s s_client is one alternative:

Request

openssl s_client -connect twitter.com:443 -nextprotoneg ''

Response

CONNECTED(00000003)
Protocols advertised by server: h2-15, spdy/3.1, http/1.1

So Twitter supports draft 15 of HTTP/2. We can open a connection with:

openssl s_client -connect twitter.com:443 -nextprotoneg 'h2-15'

But we’re making a direct HTTP/2 request now, rather than regular HTTP/1.1 and requesting an upgrade, so we need to fashion a real binary request.
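For comparison, the ALPN side of that negotiation can be sketched with Python’s ssl module instead of s_client. This assumes Python 3.5 or later, linked against an OpenSSL with ALPN support. The helper names are hypothetical, and which tokens a given server accepts (a draft identifier like h2-15, or the plain h2 planned for the final version) is something to discover rather than assume.

```python
import socket
import ssl


def make_alpn_context(protocols):
    """Build a certificate-verifying TLS context that offers `protocols` via ALPN."""
    context = ssl.create_default_context()
    context.set_alpn_protocols(list(protocols))
    return context


def negotiated_protocol(host, protocols, port=443, timeout=10):
    """Connect to host:port and return the protocol the server picked, or None."""
    context = make_alpn_context(protocols)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()
```

Something like negotiated_protocol('twitter.com', ['h2', 'http/1.1']) would then report the server’s choice. Unlike NPN, the client offers the list and the server selects from it.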

An HTTP/2 request

First up, a magic fingerprint just to confirm that, even after all the negotiation already, I’m definitely talking HTTP/2:

#!/usr/bin/python3

import struct

prelim = bytes.fromhex('505249202a20485454502f322e300d0a0d0a534d0d0a0d0a')
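That hex string is less mysterious than it looks. A quick sketch to decode it (the variable name is just for illustration):

```python
preface = bytes.fromhex('505249202a20485454502f322e300d0a0d0a534d0d0a0d0a')

# 24 bytes that read like an HTTP/1.x request line for a made-up PRI method.
print(preface)
```

It prints b'PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n'. An HTTP/1.1 server that receives it by mistake sees something request-shaped but unrecognisable.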

The rest is a series of frames, payloads with types, lengths and a few flags set. First up, a SETTINGS frame. As before, an empty one is fine:

settings = struct.pack('>IBBI', 0, 0x04, 0, 0)[1:]

(The first length field is 24-bit; here I pack it as 32-bit and then skip the high byte.)
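If the pack-and-slice trick feels too clever, the same nine bytes can be built more explicitly. This is only a sketch; frame_header is a hypothetical helper, not something the rest of the script relies on.

```python
import struct


def frame_header(length, frame_type, flags, stream_id):
    """Pack the 9-byte HTTP/2 frame header."""
    if not 0 <= length < 2 ** 24:
        raise ValueError("length must fit in 24 bits")
    # 24-bit length, then 8-bit type, 8-bit flags and a 32-bit stream ID.
    # The top bit of the stream ID is reserved, so it is masked off here.
    return length.to_bytes(3, "big") + struct.pack(
        ">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)


# The empty SETTINGS frame: type 0x04, no flags, stream 0.
empty_settings = frame_header(0, 0x04, 0, 0)
```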

I should also ACK the SETTINGS that the server is going to send:

settingsAck = struct.pack('>IBBI', 0, 0x04, 0x01, 0)[1:]

Here, the 0x01 flag indicates that this is an ACK.

Now, the GET. Since this is a request with no body I simply need to provide the headers, including pseudo-headers for values that would previously have appeared as the first line of an HTTP/1.1 request:

host='twitter.com'

headerDict = {
    ':authority': host,
    ':method': 'GET',
    ':scheme': 'https',
    ':path': '/'
}

The flags here indicate that this is the only header frame and also the entirety of the request:

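# type=0x01 HEADERS
# flags = END_STREAM (0x01) | END_HEADERS (0x04)
# headerPayload is the encoded header block built by gen_headers()
flags = 0x01 | 0x04
frame = struct.pack('>IBBI', len(headerPayload), 0x01, flags, 1)[1:] + headerPayload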

The headers are encoded using HPACK, a parallel specification to HTTP/2. It describes an elaborate system of default header values and stateful compression that make it extremely efficient to send very common headers, and repeated headers, along with Huffman encoding for the values.

Luckily, we can ignore all that and send literal headers with no indexing:

def lengthed_string(s):
  b = s.encode('us-ascii')
  return struct.pack('B', len(b)) + b

def gen_headers(m):
  h = b''
  for k in m:
    v = m[k]
    h = h + struct.pack('B', 0) + lengthed_string(k) + lengthed_string(v)
  return h

headerPayload = gen_headers(headerDict)
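To see what that encoding looks like on the wire, here’s a sketch of a matching decoder that only understands this lazy form. Both function names are made up for illustration. The one-byte length only works for strings under 127 bytes, because HPACK switches to a multi-byte integer at that point, so the sketch refuses anything longer.

```python
import struct


def encode_literal(name, value):
    """Literal header field without indexing, new name, no Huffman coding."""
    def lengthed(s):
        b = s.encode("us-ascii")
        if len(b) > 126:
            raise ValueError("longer strings need HPACK's multi-byte integers")
        return struct.pack("B", len(b)) + b
    return b"\x00" + lengthed(name) + lengthed(value)


def decode_literals(block):
    """Decode a header block made only of fields in the form produced above."""
    fields = []
    pos = 0
    while pos < len(block):
        if block[pos] != 0x00:
            raise ValueError("not a plain literal; a real HPACK decoder is needed")
        pos += 1
        strings = []
        for _ in range(2):  # name, then value
            length = block[pos]
            if length & 0x80:
                raise ValueError("Huffman-coded string")
            if length == 0x7F:
                raise ValueError("multi-byte length")
            pos += 1
            strings.append(block[pos:pos + length].decode("us-ascii"))
            pos += length
        fields.append(tuple(strings))
    return fields
```

So ':method: GET' becomes a zero byte, then 7 and ':method', then 3 and 'GET': thirteen bytes, where a real HPACK encoder would spend one byte on an index into the static table.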

Now, put those together into our first HTTP/2 request:

from sys import stdout

stdout.buffer.write(prelim)
stdout.buffer.write(settings)
stdout.buffer.write(settingsAck)
stdout.buffer.write(frame)

Invoke that, and send it over NPN’d SSL, keeping openssl’s stdin open to keep it from exiting:

{ ./send-request.py; cat; } | openssl s_client -connect twitter.com:443 -nextprotoneg 'h2-15' | less

Amid the SSL debugging output and plaintext payload we see what we were after: a fully HTTP/2 response to our HTTP/2 request.

An HTTP/2 response

But sending requests is only half of the web. We want to make sense of what’s being sent back.

It’s not quite a reference implementation, but here’s enough Python to decode frame boundaries. We’ll also go a bit further and show the connection settings that the server wants to use (see Defined SETTINGS parameters for meanings).

#!/usr/bin/python3

import struct
import sys

def decode_SETTINGS(b):
  print(' Settings:')
  while b:
    (i, v) = struct.unpack('>HI', b[:6])
    print('  %d = %d' % (i, v))
    b = b[6:]

b = sys.stdin.buffer.read()

while b:
  (b1, b2, b3) = struct.unpack('BBB', b[:3])
  l = (b1 << 16) | (b2 << 8) | b3
  print('Length: %d' % l)
  (t, f, s) = struct.unpack('>BBI', b[3:9])
  print('Type: %d, flags: %d, stream ID: %d' % (t, f, s))
  payload = b[9:9+l]
  print(payload)

  if t == 0x04:
    decode_SETTINGS(payload)

  b = b[9 + l:]

Response

Length: 6
Type: 4, flags: 0, stream ID: 0
b'\x00\x04\x00\x01\x00\x00'
 Settings:
  4 = 65536
Length: 0
Type: 4, flags: 1, stream ID: 0
b''
 Settings:
Length: 1002
Type: 1, flags: 4, stream ID: 1
b'\x88@\x86\xb9\xdc\xb6 \xc7\xab\x87\xc7\xbf~\xb6\x02\xb8\x7fX\xad\xa8\xeb\x10d
...
c\xc9\x82\x02\xc91~\x89=\x87\xa4\xb0\x07@\x8c\xf2\xb7\x94!j\xec:JD\x98\xf5\x7f\x8a\x0f\xda\x94\x9eB\xc1\x1d\x07\'_'
Length: 2998
Type: 0, flags: 0, stream ID: 1
b'<!DOCTYPE html>\n<!--[if IE 8]><html class="lt-ie10 ie8" lang="en data-scribe-...
atePropagation=function(){};if(i){f.push(a);r("captured",a)}else r("ig'
Length: 7240
Type: 0, flags: 0, stream ID: 1
b'nored",a\n);return!1}function n($){p();for(var a=0,b;b=f[a];a++){var d=$(b.tar
...
ift/en/init.9041729dc08dc4f68fda011758b48149cb878712.js" async></script>\n\n'
Length: 0
Type: 0, flags: 1, stream ID: 1

There are a few things to notice here. SETTINGS_INITIAL_WINDOW_SIZE is being set to 65536, one more than the default of 65535. Then, the headers (Type: 1, with the END_HEADERS flag set). Twitter aren’t using the same lazy hack I did, so you’d need a proper HPACK decoder to make sense of them. Then, a number of DATA frames ending with one with END_STREAM set.
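Reading bare type and flag numbers gets old quickly. Here’s a sketch of a frame iterator that names them. The type codes are the ones from the spec’s frame definitions. The flag table is deliberately partial and covers only the flags that show up in the output above, and iter_frames is a name I’ve made up.

```python
import struct

FRAME_TYPES = {
    0: "DATA", 1: "HEADERS", 2: "PRIORITY", 3: "RST_STREAM", 4: "SETTINGS",
    5: "PUSH_PROMISE", 6: "PING", 7: "GOAWAY", 8: "WINDOW_UPDATE",
    9: "CONTINUATION",
}

# Partial: flag bits mean different things on different frame types.
FLAG_NAMES = {
    ("DATA", 0x01): "END_STREAM",
    ("HEADERS", 0x01): "END_STREAM",
    ("HEADERS", 0x04): "END_HEADERS",
    ("SETTINGS", 0x01): "ACK",
}


def iter_frames(buf):
    """Yield (type name, flag names, stream ID, payload) for each whole frame."""
    pos = 0
    while pos + 9 <= len(buf):
        length = int.from_bytes(buf[pos:pos + 3], "big")
        frame_type, flags, stream_id = struct.unpack(">BBI", buf[pos + 3:pos + 9])
        stream_id &= 0x7FFFFFFF  # the top bit is reserved
        name = FRAME_TYPES.get(frame_type, "UNKNOWN(%d)" % frame_type)
        flag_names = [n for (t, bit), n in sorted(FLAG_NAMES.items())
                      if t == name and flags & bit]
        yield name, flag_names, stream_id, buf[pos + 9:pos + 9 + length]
        pos += 9 + length
```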

How about nghttp2.org? They also support h2c-16 over NPN:

openssl s_client -connect nghttp2.org:443 -nextprotoneg h2c-16

Response

Length: 12
Type: 4, flags: 0, stream ID: 0
b'\x00\x03\x00\x00\x00d\x00\x04\x00\x00\xff\xff'
 Settings:
  3 = 100
  4 = 65535

They’re also setting SETTINGS_MAX_CONCURRENT_STREAMS to 100; it’s otherwise unlimited, and this is the recommended minimum.

Microsoft’s implementation is already using ALPN, requiring a build of OpenSSL from source:

~/source/openssl/apps/openssl s_client -connect h2duo.cloudapp.net:443 -alpn 'h2-14'

Response

Length: 18
Type: 4, flags: 0, stream ID: 0
b'\x00\x03\x00\x00\x00d\x00\x04\x00\x00\xff\xff\x00\x07\x00\x00\x00\x02'
 Settings:
  3 = 100
  4 = 65535
  7 = 2

Settings type 7 isn’t defined by the spec, but we’ll try to make sure we only use two of them.
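A sketch of the same SETTINGS decoding with names attached, using the six identifiers from the spec’s list of defined parameters. Anything else, like that 7, is reported as unknown rather than guessed at. decode_settings is a hypothetical helper and separate from the script above.

```python
import struct

SETTINGS_NAMES = {
    1: "SETTINGS_HEADER_TABLE_SIZE",
    2: "SETTINGS_ENABLE_PUSH",
    3: "SETTINGS_MAX_CONCURRENT_STREAMS",
    4: "SETTINGS_INITIAL_WINDOW_SIZE",
    5: "SETTINGS_MAX_FRAME_SIZE",
    6: "SETTINGS_MAX_HEADER_LIST_SIZE",
}


def decode_settings(payload):
    """Return (name, value) pairs from a SETTINGS frame payload."""
    if len(payload) % 6:
        raise ValueError("SETTINGS payload must be a multiple of 6 bytes")
    # Each entry is a 16-bit identifier followed by a 32-bit value.
    return [(SETTINGS_NAMES.get(ident, "unknown (%d)" % ident), value)
            for ident, value in struct.iter_unpack(">HI", payload)]


payload = b'\x00\x03\x00\x00\x00d\x00\x04\x00\x00\xff\xff\x00\x07\x00\x00\x00\x02'
for name, value in decode_settings(payload):
    print('%s = %d' % (name, value))
```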

In summary

You can’t talk HTTP/2 by typing and, despite a relatively simple spec, it’s not a weekend hack anymore. Just as you wouldn’t write your own SSL implementation, rolling your own HPACK and HTTP/2 implementations is not really feasible.

In a sense that’s good — widely-used libraries tend to be higher quality. On the downside, anything that increases the barrier to entry can easily reduce diversity.

Far more than HTTP/1.1, HTTP/2 feels specialised. If you’re a large company operating modern web applications for customers on up-to-date browsers, with latency being worth complexity and engineer effort, it’s a win. If you’re after a generic, extensible model with all optimisations left to the appropriate layers in the stack, maybe less so. It’s one of the most notable layering violations since ZFS.

As an engineering effort, it’s ingenious and opinionated. Many people vocally and articulately object to it, on both technical and political grounds. Debate and merits aside, I expect it to improve the browsing experience for the majority of users out there: that’s a good thing, even if it’s not another twenty-year protocol.

(Music: Queens of the Stone Age, “Smooth Sailing”)

Talking HTTP/0.9, 1.0, 1.1

Saturday, 3 January 2015

Most protocols have libraries and tools available to abstract away the underlying communications. However, if you’re a full-stack developer, you’ll have used telnet to talk a protocol directly, and it’s very likely to have been HTTP. It’s a textual request/response protocol. telnet to port 80:

$ telnet www.apache.org 80

Wait for a connection:

Trying 54.172.167.43...
Connected to www.apache.org.
Escape character is '^]'.

Then type the request and wait for a response and for the server to close the socket:

Connection closed by foreign host.

Any further sophistication is opt-in and can be ignored for now.

From HTTP 0.9, back in 1991, it’s easy to make a request:

Request

GET /

Response

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://any23.apache.org">here</a>.</p>
</body></html>

RFC 1945, in 1996, introduced a mandatory version number and extensible headers in the request and the response, so there’s an extra newline at the end:

Request

GET / HTTP/1.0

Response

HTTP/1.1 301 Moved Permanently
Date: Fri, 02 Jan 2015 05:40:17 GMT
Server: Apache/2.4.7 (Ubuntu)
Location: http://any23.apache.org
Content-Length: 231
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://any23.apache.org">here</a>.</p>
</body></html>

HTTP 0.9’s obsolescence was formally recognised in RFC 7230 (2014), which explicitly dropped the requirement to support HTTP 0.9:

The expectation to support HTTP/0.9 requests has been removed. (Appendix A)

(apache.org was the first site I tried which still accepted and responded with 0.9.)

HTTP/1.1 was introduced in RFC 2068 (1997), refined in RFC 2616 (1999) and then majorly refined in RFC 7230-7235 (2014). A new header became mandatory:

Request

GET / HTTP/1.1
Host: www.apache.org

Response

HTTP/1.1 200 OK
Date: Fri, 02 Jan 2015 05:49:16 GMT
Server: Apache/2.4.7 (Ubuntu)
Last-Modified: Fri, 02 Jan 2015 05:10:41 GMT
ETag: "a120-50ba45cb03985"
Accept-Ranges: bytes
Content-Length: 41248
Vary: Accept-Encoding
Cache-Control: max-age=3600
Expires: Fri, 02 Jan 2015 06:49:16 GMT
Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="en">
  <head>
    <title>Welcome to The Apache Software Foundation!</title>

Twenty-three years of evolution has introduced a couple more things to remember when we telnet in, with one of them being optional.
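Side by side, the three requests differ by very little. A sketch that builds each one as bytes, with CRLF line endings as the specs call for (build_request is just an illustrative name):

```python
def build_request(version, path="/", host=None):
    """Return a minimal GET request for HTTP 0.9, 1.0 or 1.1."""
    if version == "0.9":
        # Just the method and path: no version, no headers.
        return ("GET %s\r\n" % path).encode("ascii")
    if version == "1.0":
        # Version number added; the blank line ends the (empty) headers.
        return ("GET %s HTTP/1.0\r\n\r\n" % path).encode("ascii")
    if version == "1.1":
        if host is None:
            raise ValueError("HTTP/1.1 requires a Host header")
        return ("GET %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (path, host)).encode("ascii")
    raise ValueError("unsupported version: %r" % (version,))
```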

HTTP/2

After sixteen drafts (and four major versions over three years at Google in its SPDY incarnation), HTTP/2 has moved to Last Call (W3C Recommendation Track Process - Last Call Announcement).

That’s dangerously close to being the current version of arguably the most important protocol on the Internet. Even if I’m going to be using it through libraries and browsers, I should at least know how to craft a basic request and parse a response.

HTTP/2 is no longer textual: it’s a binary protocol. After looking at the spec (draft 16) it took me way longer than those examples to get something that would talk HTTP/2, so that’s a separate post.

(Music: I Monster, “Lust for a Vampyr”)

Please check TLS hostnames

Thursday, 1 January 2015

I need a quick script to check a mailbox. My go-to language is Python, and its batteries-included philosophy means I go straight to imaplib:

#!/usr/bin/python3

import imaplib

conn = imaplib.IMAP4_SSL('imap.gmail.com')

# Now we're ready to use conn.login to send username and password

I’m security-conscious, so I’ve requested a TLS connection.

(With Ubuntu, that could still be an SSLv3 connection. The POODLE attacks on SSLv3 were only demonstrated as a problem in a browser when an attacker can force repeated connections including plaintext they control.)

Before I send my username and password over this connection, I know I have a secure connection, I know that I requested a connection to imap.gmail.com, and I can also see that I haven’t tampered with any crypto defaults I don’t understand. Ready?

I can see that Python had a bug where that hostname wasn’t checked (No SSL match_hostname() in imaplib). That means my connection could be MitM’d by anyone with a valid SSL certificate; and they’re free now. But that was fixed in Python 3.4, so I’m good?

Secure by default

By default, imaplib doesn’t check the hostname. Before I send my username and password, let’s make sure I’m actually talking to imap.gmail.com:

#!/usr/bin/python3

import imaplib, socket, ssl

context = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
context.verify_mode = ssl.CERT_REQUIRED
context.check_hostname = True
context.load_default_certs()

conn = imaplib.IMAP4_SSL('imap.gmail.com', ssl_context=context)

# Good to go!
conn.login('AzureDiamond', 'hunter2')

It’s that context.check_hostname = True that makes the critical check. Before using the connection, confirm that the SSL certificate is for the hostname that I requested.
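A shorter route to the same place, as a sketch: ssl.create_default_context() (Python 3.4 and later) returns a context that already requires a certificate and checks the hostname, and leaves the protocol version to the library. The two helper names here are my own.

```python
import imaplib
import ssl


def verified_context():
    """Return a TLS context that verifies the certificate chain and hostname."""
    context = ssl.create_default_context()
    # Both are already set by create_default_context(); asserting them just
    # documents the two properties the connection depends on.
    assert context.verify_mode == ssl.CERT_REQUIRED
    assert context.check_hostname
    return context


def connect(host="imap.gmail.com"):
    """Open an IMAP-over-TLS connection that fails on a hostname mismatch."""
    return imaplib.IMAP4_SSL(host, ssl_context=verified_context())
```

A certificate issued for some other name should then make connect() raise before any credentials are sent.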

Taking a look through Der Spiegel’s latest release of Snowden documents it’s notable that “some forms of encryption still cause problems for the NSA”. If your intention is to preserve your users’ privacy across the public Internet, remember that security tends to fail catastrophically and that exploitation attempts are constant and better-implemented than you might expect (“an algorithm that searches GitHub 24 hours per day for API keys”).

If your intention is to provide a library or service, make it secure by default. Your users may not thank you when things break, but it’s the responsible choice.

(Music: Cracker, “El Cerrito”)
Joseph Walton <joe@kafsemo.org>