Feeds migrating to https
Friday, 12 September 2025 (#)
Way back in 2013, post-Snowden, moving general web traffic to TLS became an increasingly good idea, then a norm.
A common practice (but not a requirement) is to give insecure requests a 301 Moved Permanently from http over to https. This let user agents update their stored URL and make subsequent requests over a secure channel. If that initial request is compromised then all bets are still off, but TOFU (trust on first use) is often a good trade-off.
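That update rule can be sketched as a small pure function (my own illustration, not the actual polling script — the hop list stands in for whatever the HTTP client observed):

```python
PERMANENT = {301, 308}  # Moved Permanently, Permanent Redirect

def persisted_url(original: str, hops: list[tuple[int, str]]) -> str:
    """Given the (status, location) redirect hops seen while fetching a
    feed, return the URL to store for next time. Only an unbroken chain
    of permanent redirects updates the stored URL; any temporary hop
    (302/307) means we keep the original and poll it again."""
    url = original
    for status, location in hops:
        if status not in PERMANENT:
            return original
        url = location
    return url
```

So a lone 301 to https is persisted, while a 302 anywhere in the chain leaves the stored URL untouched.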
The simple polling script I use for subscribed feeds uses exactly that logic: any chain of permanent redirections is persisted. Looking back, across all feeds, what did that migration from http to https look like?
Initial adoption was slow. It was 2009 before I had an https subscription, when http://labs.mozilla.com/feed/ (now a 404) redirected to https://mozillalabs.com/feed/ (now an SSL_ERROR_BAD_CERT_DOMAIN). Mid-2016, https reached 5% of my subscriptions. That’s right after Let’s Encrypt officially launched.
In 2019 the ratio hit 50-50, and it has kept climbing to around 80% today. Ironically, older feeds dropping off the web entirely, rather than switching, also helped that ratio. Still, 20% plaintext is not great, so I took a look at the remaining holdouts.
A few were using a temporary redirection, with 302 Found:
the client ought to continue to use the target URI for future requests.
and indeed I did, fetching the original insecure resource each time.
Others were sending an Upgrade header:
Upgrade: h2
Connection: Upgrade
I could make a secure h2 connection; but the library I’m using needs changes for that.
Some other feeds were dead enough that it was time to unsubscribe.
With those migrated across manually, I’m at 96% https. Holdouts? Almost exclusively sites that aren’t actively being maintained. It’s always fun to see a feed spring back to life, even if it’s currently:
<updated>2022-03-17T05:00:46Z</updated>
or even:
<pubDate>Sun, 09 Mar 2014 00:00:00 PST</pubDate>
so I’ll poll until they 410.
Trade-Markov Chains
Monday, 8 September 2025 (#)
Apple®’s branded house model shares much structure across its trademarks. Terms repeat, and patterns recur. It’s structured enough to be tokenised by splitting on transitions from lowercase to uppercase, or on white space:
iPad Air® ⇒
["i", "Pad", " ", "Air", "®"]
MacBook Air® ⇒
["Mac", "Book", " ", "Air", "®"]
There are a few exceptions like ‘X’, which is its own token in e.g. Xserve® and Xcode®.
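That tokenisation can be sketched with a regular expression (a simplification of my own: the ‘X’ exception isn’t handled, and the punctuation class only covers the marks mentioned here):

```python
import re

def tokenise(mark: str) -> list[str]:
    """Split a trademark on lowercase-to-uppercase transitions, and
    treat whitespace and punctuation (including the ® sign) as their
    own tokens. Xserve®/Xcode® would need a special case on top."""
    parts = re.split(r"(?<=[a-z])(?=[A-Z])|(\s+|[®+.\-])", mark)
    return [p for p in parts if p]  # drop Nones and empty strings
```

Run on the examples above, `tokenise("iPad Air®")` gives `["i", "Pad", " ", "Air", "®"]`.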
Breaking out the tokens lets us see that case is mostly consistent. Exceptions include ‘Mac’, ‘TV’, ‘Vision’ and ‘Watch’ (uppercase for hardware, and lowercase when followed by ‘OS’), and ‘Touch’ (uppercase except in ‘iPod touch®’). Also good to see ‘+’ rapidly overtaking ‘.’ and ‘-’ as the most common punctuation.
The natural thing to do is place these on a graph, where each distinct token is a node linked to all tokens that appear before or after:
[graph: each distinct token, linked to the tokens that precede or follow it]
Notably, some tokens see broad use (‘Apple’ and ‘i’ in around fifty trademarks each, and ‘Mac’, ‘Pro’, ‘Air’ in around twenty), whereas some are used in small clusters (three ‘Engine’s and ‘Drop’s; three ‘Writer’s, all printers).
Weight the transitions according to current trademark usage to generate a Markov chain. Start on a token that begins at least one trademark, and finish on one that ends one. The resulting chains will, usefully, predict unused trademarks, including:
- Multi-ML
- SignTime Machine
- CloudDrive
- Time to My Mac Management Basics
- iWorkBench
- Time to Cash
- Center Stage Manager
- There's an app for Impact
- AirTunes Extras
- PowerBook AirPods Max
- iTunes Music
- Apple Pro
- Final Cut ProDOS
- HomePod classic
- iTunes Live Listen
- Final Cut ProMotion
- DVD Studio Display
- Today at Apple Immersive Video
- 3D Touch Bar
- The iPhoto Booth
- Xcode Cloud Drive
- AirPower for that
- PowerBook Pro
- Multi-Touch ID
- iPad AirPort Time
- .Mac.com
- EarPods Max
- iWebScript
- Smart Cover Flow
- Siri Remote Desktop
Following the existing structure means a few sound more “when” than “if.”
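The generation step can be sketched as a weighted walk over a transition table (the tokens and weights below are a toy excerpt of my own, not the real counts):

```python
import random

END = None  # marks the end of a trademark

# Token -> possible next tokens; weights are encoded by repetition.
# Illustrative only: real weights come from the full trademark list.
TRANSITIONS = {
    "i": ["Pad", "Phone"],
    "Mac": ["Book", END],
    "Book": [" ", END],
    "Pad": [" ", END],
    "Phone": [END],
    " ": ["Air", "Pro", "Pro"],  # 'Pro' twice as likely as 'Air'
    "Air": [END],
    "Pro": [END],
}
STARTS = ["i", "Mac"]  # tokens that begin at least one trademark

def generate(rng: random.Random) -> str:
    """Walk the chain from a start token until END, then join."""
    token = rng.choice(STARTS)
    out = [token]
    while (token := rng.choice(TRANSITIONS[token])) is not END:
        out.append(token)
    return "".join(out)
```

With the full table, walks like these produce the list above — some real (`MacBook Air`), some merely plausible.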
Name and labels
Monday, 19 September 2022 (#)
Firefox 56 included a new character encoding implementation (“written in Rust”!) that follows the WHATWG Encoding Standard. The spec includes names and labels for the character encodings. It’s exhaustive — “User agents must not support any other encodings or labels.” Here’s an interesting bit:
Name | Labels |
---|---|
… | … |
windows-1252 | "iso-8859-1", "us-ascii", "windows-1252" |
… | … |
All three subtly different character encodings are now merged together into one, named for the one with the largest repertoire. This made me kind of nostalgic: getting these encodings mixed up was a classic interoperability problem that required a certain amount of knowledge or tooling to identify and resolve. However, folding them together essentially solves that problem.
text/xml gotcha

Still, as long as your document’s published as text/xml, there’s another gotcha to be aware of: omitting charset doesn’t mean “autodetect,” it means US-ASCII. But! In RFC 6657:
Each subtype of the "text" media type that uses the "charset" parameter can define its own default value for the "charset" parameter, including the absence of any default.
And then RFC 7303 says:
If an XML MIME entity is received where the charset parameter is omitted, no information is being provided about the character encoding by the MIME Content-Type header. XML-aware consumers MUST follow the requirements in section 4.3.3 of [XML] that directly address this case.
It’s easy to value knowing how to work around historic mistakes so much that you forget to push for fixing them. It’s good to see a couple of places where that hasn’t happened, and nice to be able to move on from the edge cases.