Reclaim the SREets

·1 min

Came across this site the other day. Though the Changelog seems to suggest it was last updated in May 2016, which makes it a bit more likely to be slop, I think the overall message is not misplaced.

Career counselling

·1 min

A short while ago, an ex-colleague from an old team asked for some career advice. I did what I could (though I didn’t think it was very useful). He was kind enough to send me a card and a (very large) collection of chocolates in return. It was a great kindness that reminded me of a great team that was always under pressure.

As the tech industry is collectively doing a lot to erase every aspect of being human from working in it, and especially any question of the vulnerability that naturally attaches to it, it does some good to remind ourselves that there is virtue in being human.

A nice thank-you card

Incident Residue

·2 mins

I’ve been thinking for a while about how incident response is going to change, and how it has already changed since the pre-ML days. Todd Underwood did a great chapter in Reliable Machine Learning which tried to illustrate how IR changes in the modern world. In brief, it becomes harder both to investigate what’s going on and to follow the standard troubleshooting approach of building a mental model of what’s happened, because you no longer have a causally strong relationship between actions and outcomes. It’s also going to involve a lot more coordination between different groups, as ML will typically pull in data from across the business to an unprecedented extent.

But I came across this today - thanks to Eric Dobbs in RISF - which talks about one likely feature of the future that hasn’t gotten much attention outside leading-edge circles: as AI SRE systems hoover up the easier tasks, the harder tasks will be the only ones that are left. This is the “left behind” issue.

Most folks who look at this have pointed out that as the easier issues go away, it’s harder to train on what remains, and (modulo learning styles) I think that’s true. What I think is less explored is how IR changes when you actually can’t construct a model of how the system works by asking a sufficiently aware human. We will, in short, become dependent on the same tools that created the additional complexity to penetrate and resolve that complexity in real time, every time there’s an incident.

We should bear that in mind when we think about how to staff, and what to pay for, in the domain of incident response. The stuff that’s left behind - the incident residue - is the stickiest of all.

Komodor doing an AI SRE summit

·2 mins

The AI SRE space is, as of the time of writing, absolutely insane. At some point in 2025, I counted the number of players and the amount of money rushing into the space - it was 20+ and over a billion dollars, if you included all the funding numbers I’d found plus the numbers from incumbents in e.g. Cloud talking about how much they were going to invest in the space. It may well turn out to be one of those situations where it’s easy to make a prima facie argument that the problem space is big, that almost everyone “suffers from it”, and that it’s easy to make progress (given the current state of agentic development, etc.), but where it’s quite hard to deliver something that actually makes a difference and, more importantly, that is not like everyone else’s three foundational models in a trenchcoat.

Earlier in my career there were very similar conversations about mobile phone providers (really operators), who quickly came to be seen as essentially commoditised - everyone would pick from a similar set of network gear provided by a small set of manufacturers, the handsets were mostly commoditised, etc. Ultimately they did what a lot of businesses in similar positions do, which is to attempt to differentiate themselves on price, branding/marketing, or customer service. There may well be a similar effect playing out in this market too.

In unrelated events, I see that Komodor are organising an AI SRE summit and that looks like an interesting speaker list, though I wonder precisely how vendor neutral that’s going to be.

Bot traffic on the web

·1 min

From college mate Ian’s time at a Cloudflare session, we learn that bot traffic is 50% of overall web traffic, and AI agent traffic is circa 7%.

It seems likely both of those numbers will go up.