Feeds migrating to https
Friday, 12 September 2025 (#)
Way back in 2013, post-Snowden, moving general web traffic to TLS became an increasingly good idea, then a norm.
A common practice (but not a requirement) is to give insecure requests a 301 Moved Permanently from http over to https. This let user agents update their stored URL and make subsequent requests over a secure channel. If that initial request is compromised then all bets are still off, but TOFU (trust on first use) is often a good trade-off.
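That update rule can be sketched as a small pure function (my own illustration, not the actual polling script — the hop list stands in for whatever the HTTP client observed):

```python
PERMANENT = {301, 308}  # Moved Permanently, Permanent Redirect

def persisted_url(original: str, hops: list[tuple[int, str]]) -> str:
    """Given the (status, location) redirect hops seen while fetching a
    feed, return the URL to store for next time. Only an unbroken chain
    of permanent redirects updates the stored URL; any temporary hop
    (302/307) means we keep the original and poll it again."""
    url = original
    for status, location in hops:
        if status not in PERMANENT:
            return original
        url = location
    return url
```

So a lone 301 to https is persisted, while a 302 anywhere in the chain leaves the stored URL untouched.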
The simple polling script I use for subscribed feeds uses exactly that logic: any chain of permanent redirections is persisted. Looking back, across all feeds, what did that migration from http to https look like?
Initial adoption was slow. It was 2009 before I had an https subscription, when http://labs.mozilla.com/feed/ (now a 404) redirected to https://mozillalabs.com/feed/ (now an SSL_ERROR_BAD_CERT_DOMAIN). Mid-2016, https reached 5% of my subscriptions. That’s right after Let’s Encrypt officially launched.
In 2019 the ratio hit 50-50, and it has kept climbing to around 80% today. Ironically, older feeds dropping off the web entirely, rather than switching, also helped that ratio. Still, 20% plaintext is not great, so I took a look at the remaining holdouts.
A few were using a temporary redirection, with 302 Found:
the client ought to continue to use the target URI for future requests.
and indeed I did, fetching the original insecure resource each time.
Others were sending an Upgrade header:
Upgrade: h2
Connection: Upgrade
I could make a secure h2 connection; but the library I’m using needs changes for that.
Some other feeds were dead enough that it was time to unsubscribe.
With those migrated across manually, I’m at 96% https. Holdouts? Almost exclusively sites that aren’t actively being maintained. It’s always fun to see a feed spring back to life, even if it’s currently:
<updated>2022-03-17T05:00:46Z</updated>
or even:
<pubDate>Sun, 09 Mar 2014 00:00:00 PST</pubDate>
so I’ll poll until they 410.
Trade-Markov Chains
Monday, 8 September 2025 (#)
Apple®’s branded house model shares much structure across its trademarks. Terms repeat, and patterns recur. It’s structured enough to be tokenised by splitting on transitions from lowercase to uppercase, or on white space:
iPad Air® ⇒
["i", "Pad", " ", "Air", "®"]
MacBook Air® ⇒
["Mac", "Book", " ", "Air", "®"]
There are a few exceptions like ‘X’, which is its own token in e.g. Xserve® and Xcode®.
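That tokenisation can be sketched with a regular expression (a simplification of my own: the ‘X’ exception isn’t handled, and the punctuation class only covers the marks mentioned here):

```python
import re

def tokenise(mark: str) -> list[str]:
    """Split a trademark on lowercase-to-uppercase transitions, and
    treat whitespace and punctuation (including the ® sign) as their
    own tokens. Xserve®/Xcode® would need a special case on top."""
    parts = re.split(r"(?<=[a-z])(?=[A-Z])|(\s+|[®+.\-])", mark)
    return [p for p in parts if p]  # drop Nones and empty strings
```

Run on the examples above, `tokenise("iPad Air®")` gives `["i", "Pad", " ", "Air", "®"]`.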
Breaking out the tokens lets us see that case is mostly consistent. Exceptions include ‘Mac’, ‘TV’, ‘Vision’ and ‘Watch’ (uppercase for hardware, and lowercase when followed by ‘OS’), and ‘Touch’ (uppercase except in ‘iPod touch®’). Also good to see ‘+’ rapidly overtaking ‘.’ and ‘-’ as the most common punctuation.
The natural thing to do is place these on a graph, where each distinct token is a node linked to all tokens that appear before or after:
[graph: each distinct token, linked to the tokens that precede or follow it]
Notably, some tokens see broad use (‘Apple’ and ‘i’ in around fifty trademarks each, and ‘Mac’, ‘Pro’, ‘Air’ in around twenty), whereas some are used in small clusters (three ‘Engine’s and ‘Drop’s; three ‘Writer’s, all printers).
Weight the transitions according to current trademark usage to generate a Markov chain. Start on a token that begins at least one trademark, and finish on one that ends one. The resulting chains will, usefully, predict unused trademarks, including:
- Multi-ML
- SignTime Machine
- CloudDrive
- Time to My Mac Management Basics
- iWorkBench
- Time to Cash
- Center Stage Manager
- There's an app for Impact
- AirTunes Extras
- PowerBook AirPods Max
- iTunes Music
- Apple Pro
- Final Cut ProDOS
- HomePod classic
- iTunes Live Listen
- Final Cut ProMotion
- DVD Studio Display
- Today at Apple Immersive Video
- 3D Touch Bar
- The iPhoto Booth
- Xcode Cloud Drive
- AirPower for that
- PowerBook Pro
- Multi-Touch ID
- iPad AirPort Time
- .Mac.com
- EarPods Max
- iWebScript
- Smart Cover Flow
- Siri Remote Desktop
Following the existing structure means a few sound more “when” than “if.”
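The generation step can be sketched as a weighted walk over a transition table (the tokens and weights below are a toy excerpt of my own, not the real counts):

```python
import random

END = None  # marks the end of a trademark

# Token -> possible next tokens; weights are encoded by repetition.
# Illustrative only: real weights come from the full trademark list.
TRANSITIONS = {
    "i": ["Pad", "Phone"],
    "Mac": ["Book", END],
    "Book": [" ", END],
    "Pad": [" ", END],
    "Phone": [END],
    " ": ["Air", "Pro", "Pro"],  # 'Pro' twice as likely as 'Air'
    "Air": [END],
    "Pro": [END],
}
STARTS = ["i", "Mac"]  # tokens that begin at least one trademark

def generate(rng: random.Random) -> str:
    """Walk the chain from a start token until END, then join."""
    token = rng.choice(STARTS)
    out = [token]
    while (token := rng.choice(TRANSITIONS[token])) is not END:
        out.append(token)
    return "".join(out)
```

With the full table, walks like these produce the list above — some real (`MacBook Air`), some merely plausible.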
Name and labels
Monday, 19 September 2022 (#)
Firefox 56 included a new character encoding implementation (“written in Rust”!) that follows the WHATWG Encoding Standard. The spec includes names and labels for the character encodings. It’s exhaustive — “User agents must not support any other encodings or labels.” Here’s an interesting bit:
Name | Labels |
---|---|
… | … |
windows-1252 | "iso-8859-1", "us-ascii", "windows-1252" |
… | … |
All three subtly different character encodings are now merged together into one, named for the one with the largest repertoire. This made me kind of nostalgic: getting these encodings mixed up was a classic interoperability problem that required a certain amount of knowledge or tooling to identify and resolve. However, folding them together essentially solves that problem.
text/xml gotcha

Still, as long as your document’s published as text/xml, there’s another gotcha to be aware of: omitting charset doesn’t mean “autodetect,” it means US-ASCII. But! In RFC 6657:
Each subtype of the "text" media type that uses the "charset" parameter can define its own default value for the "charset" parameter, including the absence of any default.
And then RFC 7303 says:
If an XML MIME entity is received where the charset parameter is omitted, no information is being provided about the character encoding by the MIME Content-Type header. XML-aware consumers MUST follow the requirements in section 4.3.3 of [XML] that directly address this case.
It’s easy to value knowing how to work around historic mistakes so much that you forget to push for fixing them. It’s good to see a couple of places where that hasn’t happened, and nice to be able to move on from the edge cases.