<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://bexelbie.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://bexelbie.com/" rel="alternate" type="text/html" /><updated>2026-04-02T15:07:29+02:00</updated><id>https://bexelbie.com/feed.xml</id><title type="html">this could’ve been an email</title><subtitle>Notes on Linux, side projects, and figuring things out in the Czech Republic.</subtitle><author><name>Brian &quot;bex&quot; Exelbierd</name></author><entry><title type="html">A Few More Thoughts on Sashiko and the Kernel</title><link href="https://bexelbie.com/2026/04/02/more-thoughts-sashiko-kernel.html" rel="alternate" type="text/html" title="A Few More Thoughts on Sashiko and the Kernel" /><published>2026-04-02T13:50:00+02:00</published><updated>2026-04-02T13:50:00+02:00</updated><id>https://bexelbie.com/2026/04/02/more-thoughts-sashiko-kernel</id><content type="html" xml:base="https://bexelbie.com/2026/04/02/more-thoughts-sashiko-kernel.html"><![CDATA[<p>Disclaimer: I work at Microsoft on upstream Linux in Azure. These are my personal notes and opinions.</p>

<p>I kept thinking about the <a href="https://lwn.net/Articles/1064830/">LWN article $</a> and the <a href="/2026/04/01/whats-in-a-sashiko-review.html">basic analysis I did yesterday</a>, and I kept coming back to one of the central themes of the mailing list conversation: false positives. Sashiko’s false positive rate is debated but, from what I gather, is pretty good by LLM standards. Still, there was a complaint about the number of false positives, focused on the burden they put on contributors and maintainers.</p>

<p>I wanted to understand if the false positive rate, and by extension the burden, was higher from an LLM than from human reviewers. To run that experiment, I needed to define what a false positive actually is. That turns out to be the interesting part.</p>

<h2 id="the-definition-problem">The Definition Problem</h2>

<p>My initial naïve definition of a false positive was any substantial comment that doesn’t yield a code change. If you said something and the code wasn’t changed, then even if it generated future work, it wasn’t applicable to this change now. The obvious hole is a comment that raises a future code change coming in a different patch set. But it felt like this number could be directionally accurate for understanding whether we get more false positives from an LLM than from human reviewers.</p>

<p>The deeper problem is that “comment that doesn’t change code” isn’t really what false positive means in review. The act of questioning code can lead to greater confidence in the patch being proposed. It can reveal unrelated changes that are required or surface features that should also be considered. Not a negative outcome, but potentially not relevant to the actual patch set under discussion. So I tried reframing from false positives to burden: any comment that doesn’t result in a code change and was actually read by the contributor or maintainer is burdensome. It doesn’t matter whether a human or LLM reviewer raised the comment. If it didn’t result in a change, it was work or thought they didn’t need to do. For example, a back-and-forth conversation to prove the correctness of something that was already correct.</p>

<p>But that definition fails too, and the reason it fails is the real insight.</p>

<p>If two humans are engaged in a review process and there’s a back-and-forth conversation that does not result in a code change, most likely neither human would describe this as unnecessary burden. They would probably describe it as work they had to do or effort they expended, but both humans have likely come out of that conversation changed. Greater understanding of different parts of the system. Better ability to express oneself so the questions aren’t raised next time. Increased confidence in the correctness of a solution. There is a change assumed to have happened to one or both of the people.</p>

<p>A review conversation that doesn’t change code but changes the people having it isn’t a false positive. It only looks like one when the reviewer is a machine that won’t be changed.</p>

<p>For what it’s worth, I did look at existing studies of human review false positive rates. In my brief and non-exhaustive look, I’ve come to believe they aren’t useful here, not only because the question is moot when both parties come out changed, but because many are flawed or non-comparable. Some are in domains where reviewers are generalists talking to a specialist, unlikely in the kernel. Others misclassify trivial exchanges like “LGTM” or “thanks” as false positives. And none have been conducted over the kernel.</p>

<h2 id="when-the-reviewer-is-a-machine">When the Reviewer Is a Machine</h2>

<p>When a finding or probing question is raised by an LLM agent, the assumption that both parties come out changed breaks down.</p>

<p>Probing questions may not even be welcome from an LLM agent. One could never really be sure whether this was a “humans normally say this kind of thing in this context” situation versus an “I see something that maybe is wrong” situation.</p>

<p>But the more important part is this: if a human has to read a false positive, they have to put in their side of the work to validate, verify, explore, or test the question, and ultimately determine that it’s not an issue. They are unlikely to be changed in the absence of an exchange. And we know for a fact that the machine is not going to be changed.</p>

<p>In theory, we could wire up a training loop for Sashiko to take these back-and-forth exchanges and learn from them to reduce the incidence of false positives. I suspect it would have very little impact overall. First, the analysis showed that there’s almost no situation where the same bug is being surfaced over and over again. The machine is unlikely to run into the same finding and then have learned that finding isn’t valid. Second, the machine is not arguing from a position of true reasoning, therefore it is never clear if it backed down because it decided to be an agreeable sycophant or because the additional commentary made the correctness argument airtight.</p>

<h2 id="the-social-problem">The Social Problem</h2>

<p>At its true core, I think the conversation around false positives, based on what I read in the article, is likely a social problem, like most truly intractable problems in computer science.</p>

<p>If an LLM agent reviews my contribution and the maintainer insists that I address the review, I am not only forced to do what turns out, in the case of a false positive, to be unnecessary work, but forced to performatively defend myself against a machine. Or worse, argue with the machine performatively. That combination is a line too far for most of our psyches: doing unnecessary work that generates no value, doing it performatively while knowing it generates no value, and then doing still more work to show that I did the work that generated no value.</p>

<h2 id="a-possible-path">A Possible Path</h2>

<p>Setting aside the separate question of whether LLM ability will continue improving and therefore the number of false positives will go down, the core question of how to deal with false positives needs to be addressed at a social level.</p>

<p>In a space like the kernel, I would argue it may be appropriate to allow those whose code has been reviewed to react to LLM-generated findings with something along the lines of “smells like bullshit” and not have to go through the performative exercise of proving it’s bullshit, because we trust their instinct.</p>

<p>That said, it is probably worth creating some kind of long-term profile or scoreboard, both of those being the wrong words, for a contributor, so that they can over time understand if their intuition has blind spots. If an LLM is consistently raising a certain kind of feedback that they are dismissing, but we later discover a bug and have to fix it, or if human reviewers come back and their synthesis of their own experience plus what the LLM provided leads them to believe there’s a real, demonstrable problem, that’s a learning opportunity for the contributor.</p>

<p>The challenge is that there are no systems I’m aware of in modern use where these kinds of profiles are ever not used abusively against those profiled. Which is yet another social problem.</p>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[False positives aren't the real problem with LLM code review. The burden is social, not statistical.]]></summary></entry><entry><title type="html">What’s Actually in a Sashiko Review?</title><link href="https://bexelbie.com/2026/04/01/whats-in-a-sashiko-review.html" rel="alternate" type="text/html" title="What’s Actually in a Sashiko Review?" /><published>2026-04-01T23:20:00+02:00</published><updated>2026-04-01T23:20:00+02:00</updated><id>https://bexelbie.com/2026/04/01/whats-in-a-sashiko-review</id><content type="html" xml:base="https://bexelbie.com/2026/04/01/whats-in-a-sashiko-review.html"><![CDATA[<p>Disclaimer: I work at Microsoft on upstream Linux in Azure. These are my personal notes and opinions. And, yes, I’m aware of the date. The data is real - and in 40 minutes it won’t be April 1 anymore, at least where I live.</p>

<p>Daroc Alden’s <a href="https://lwn.net/Articles/1064830/">LWN article on Sashiko $</a> captures a real tension in the Linux kernel community. Andrew Morton wants to make Sashiko - an LLM-based patch reviewer - a mandatory part of the memory management workflow. Lorenzo Stoakes and others say it’s too noisy and adds burden to already-overworked maintainers. Morton points to a ~60% hit rate on actual bugs. Stoakes points out that’s per-review, not per-comment, so the individual false positive rate is worse.</p>

<p>Reading the thread, I kept wondering about two specific mechanisms that could be driving maintainer frustration beyond the false positive question.</p>

<h2 id="two-hypotheses">Two Hypotheses</h2>

<p><strong>Hypothesis 1: Reviewers are getting told about bugs they didn’t create.</strong> Sashiko’s review protocol explicitly instructs the LLM to read surrounding code, not just the diff. That’s good review practice - but it means the tool might flag pre-existing bugs in code the patch author merely touched, putting those problems in their inbox.</p>

<p><strong>Hypothesis 2: The same pre-existing bugs surface repeatedly.</strong> If a known issue in a subsystem doesn’t get fixed between review runs, every patch touching nearby code could trigger the same finding. That would create a steady drip of duplicate noise across the mailing list.</p>

<p>I pulled data from Sashiko’s public API and tested both.</p>

<h2 id="method">Method</h2>

<p>I fetched all 406 patchsets from the linux-mm mailing list and a 500-patchset sample from LKML as of April 1, 2026. Of the 252 linux-mm reviews with findings, 204 had full review text available for analysis.</p>

<p>I had an LLM write Python scripts to classify the 466 extracted findings into three categories using deterministic regex pattern matching - roughly 50 weighted patterns that look for specific language in the review text. The classification code runs the same way every time on the same input. An LLM wrote it, but the scanning itself involves no inferencing.</p>

<p>The three categories:</p>

<ol>
  <li><strong>Patch-specific</strong> - about the actual changed lines. Patterns match phrases like “this patch adds,” “the new code,” “missing check.”</li>
  <li><strong>Interaction</strong> - about how new code interacts with existing code. Patterns match references to callers, callees, lock state, concurrent access.</li>
  <li><strong>Pre-existing</strong> - about bugs in surrounding code not introduced by the patch. Patterns match “not introduced by this patch,” “pre-existing,” “noticed while reviewing.”</li>
</ol>

<p>When a finding matched multiple categories, the most specific won: pre-existing &gt; interaction &gt; patch-specific. About 7% of findings didn’t match any pattern and were excluded from further analysis.</p>
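
<p>For the curious, here is a minimal sketch of what that classification logic looks like. The patterns below are illustrative stand-ins, not the roughly 50 weighted patterns the actual scripts in the repository use, but the shape - match phrases, then let the most specific category win - is the same idea.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

# Illustrative patterns only; the real scripts use ~50 weighted patterns.
PATTERNS = {
    "pre-existing": [r"not introduced by this patch", r"pre-existing", r"noticed while reviewing"],
    "interaction": [r"\bcaller", r"\bcallee", r"lock state", r"concurrent access"],
    "patch-specific": [r"this patch adds", r"the new code", r"missing check"],
}

# Most specific category wins: pre-existing, then interaction, then patch-specific.
PRIORITY = ["pre-existing", "interaction", "patch-specific"]

def classify(finding_text):
    matches = {
        category
        for category, patterns in PATTERNS.items()
        if any(re.search(p, finding_text, re.IGNORECASE) for p in patterns)
    }
    for category in PRIORITY:
        if category in matches:
            return category
    return "unclassified"  # roughly 7% of findings ended up here and were excluded
</code></pre></div></div>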

<p>For duplication, the scripts computed pairwise text similarity across reviews within the same subsystem. Again - deterministic comparison, LLM-authored code.</p>
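
<p>The pairwise comparison is similarly unexciting. A rough illustration using Python’s standard <code class="language-plaintext highlighter-rouge">difflib</code> (the actual threshold and text normalization are in the repo, so treat the number here as a placeholder):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from difflib import SequenceMatcher
from itertools import combinations

THRESHOLD = 0.8  # placeholder; the real value lives in the repository

def duplicate_pairs(reviews):
    """reviews: list of (patchset_id, review_text) tuples for one subsystem."""
    pairs = []
    for (id_a, text_a), (id_b, text_b) in combinations(reviews, 2):
        similarity = SequenceMatcher(None, text_a, text_b).ratio()
        if similarity &gt;= THRESHOLD:
            pairs.append((id_a, id_b, similarity))
    return pairs
</code></pre></div></div>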

<p>The full methodology, including the code used, a cached copy of the reviews, and the classification patterns and caveats, is in the <a href="https://github.com/bexelbie/sashiko-analysis/blob/main/analysis-review-scope.md">analysis document</a> in <a href="https://github.com/bexelbie/sashiko-analysis">github.com/bexelbie/sashiko-analysis</a>.</p>

<h2 id="what-the-data-shows">What the Data Shows</h2>

<p><strong>Hypothesis 2 is dead.</strong> Cross-review duplication was essentially zero. Across 16 LKML subsystems with 5+ reviewed patches each, only one pair of findings exceeded the similarity threshold - and that was the same author submitting similar patches, not the same bug recurring. Whatever is driving maintainer frustration, it’s not the same findings appearing over and over. While it is possible this would surface in a larger sample size, I personally find it unlikely.</p>

<p><strong>Hypothesis 1 is partially supported, but the story is in the distribution.</strong> About 9% of findings explicitly discuss pre-existing issues. Averaged across all reviews, that’s roughly 12 words per review - barely noticeable.</p>

<p>But the average is misleading. The distribution is bimodal: 81% of reviews contain zero pre-existing findings. The other 19% contain pre-existing findings that constitute 28% of the review on average, adding roughly 19 lines to what the patch author reads. A few reviews are 75-82% pre-existing content.</p>

<p>Here’s the breakdown of what an average review with findings contains:</p>

<table>
  <thead>
    <tr>
      <th>Category</th>
      <th>% of findings</th>
      <th>Avg words</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>About the submitted patch</td>
      <td>72%</td>
      <td>74</td>
    </tr>
    <tr>
      <td>Patch × existing code interactions</td>
      <td>12%</td>
      <td>103</td>
    </tr>
    <tr>
      <td>Pre-existing issues</td>
      <td>9%</td>
      <td>62</td>
    </tr>
    <tr>
      <td>Unclassified</td>
      <td>8%</td>
      <td>47</td>
    </tr>
  </tbody>
</table>

<p>The interaction findings (category 2) are worth calling out. They’re the longest - 103 words on average, 39% more than patch-specific findings - because explaining how new code breaks against existing behavior requires describing that behavior. These are arguably the hardest findings for a human reviewer to produce and exactly where a tool with codebase-wide context adds value.</p>

<h2 id="who-owns-this-bug-now">Who Owns This Bug Now?</h2>

<p>The sharpest question the data raises isn’t statistical. It’s social.</p>

<p>When you submit a patch to linux-mm and get a Sashiko review, there’s roughly a 1-in-5 chance that a meaningful chunk of that review describes a bug you didn’t write - a race, a leak, a use-after-free in the code you’re modifying. Some of these are trivial (typos in nearby comments). Some are substantive.</p>

<p>Either way, the review has put it in your inbox. You are now the person who has been told about it.</p>

<p>Morton’s position - “don’t add bugs” as Rule #1 - makes sense if the tool’s output is mostly about your patch. And it is: ~85% of findings concern either the submitted change or its direct interactions with existing code. But 1 in 5 reviewees is also getting handed someone else’s problem, with an implicit expectation to respond.</p>

<p>Stoakes’s concern about maintainer burden lands differently when you see the bimodal distribution. The average review is manageable. The tail is not.</p>

<h2 id="what-this-doesnt-answer">What This Doesn’t Answer</h2>

<p>This analysis classifies <em>scope</em> - whether a finding is about the submitted patch, its interactions, or pre-existing code. It does not measure <em>correctness</em>. The core Morton/Stoakes disagreement is about false positive rates within on-topic findings - how often Sashiko flags something in your patch that turns out to be wrong. That question requires domain expertise to evaluate each finding individually, and this data doesn’t go there.</p>

<p>The classification also has limits. The regex patterns achieve ~93% coverage but aren’t semantic - borderline cases between categories get decided by pattern specificity, not understanding. The proportions are directionally sound but not precise.</p>

<p>The full data, methodology, and API references are in the repository, <a href="https://github.com/bexelbie/sashiko-analysis">github.com/bexelbie/sashiko-analysis</a> if anyone wants to reproduce or extend this.</p>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[I pulled Sashiko's public review data and tested two hypotheses about what's driving kernel maintainer frustration.]]></summary></entry><entry><title type="html">Reflecting on “Warranty Void If Regenerated”</title><link href="https://bexelbie.com/2026/03/23/reflecting-on-warranty-void-if-regenerated.html" rel="alternate" type="text/html" title="Reflecting on “Warranty Void If Regenerated”" /><published>2026-03-23T11:50:00+01:00</published><updated>2026-03-23T11:50:00+01:00</updated><id>https://bexelbie.com/2026/03/23/reflecting-on-warranty-void-if-regenerated</id><content type="html" xml:base="https://bexelbie.com/2026/03/23/reflecting-on-warranty-void-if-regenerated.html"><![CDATA[<p>I’ve seen <a href="https://nearzero.software/p/warranty-void-if-regenerated">“Warranty Void if Regenerated”</a> going around, particularly among the subset of my friends who believe “LLMs are slop generators”. They typically characterize it as overly optimistic - hopeful, if not downright fantasy.</p>

<p>The “slop generator” position is, in my opinion, demonstrably false, as countless successful code generation outcomes contradict such a sweeping generalization. The dogged pursuit of this position obscures the real concerns with LLMs as built and used today.  I believe there are legitimate company ethics, environmental, and license/copyright concerns worthy of consideration in this space.  I also believe that we are still in a highly emotional place where those concerns tend to be both understated and overstated depending on who is talking.</p>

<p>The story consists of three vignettes told from the perspective of Tom, a post-transition specification repair person who works with farmers. In this universe, all code is generated from specs and average humans are making custom software constantly. Domain experts are needed to refine, debug, and in some cases wholesale write the specifications.</p>

<p>There is also a great discussion of the human impact of this post-transition existence. I encourage you to read it, but I’m not addressing that below - not because it isn’t important, but because I want to preserve focus on the “slop generator” drumbeat that feels so misguided.</p>

<p>All in all, I think the piece is well written and that <a href="https://substack.com/@scottwerner">Scott Werner</a> did a great job. This isn’t a critique of the writing or the story itself.  I also don’t know what Scott’s perspective is on LLMs, though their public pages and site lead me to believe they are not anti-generative AI.</p>

<p>I’d been harboring a delusion in the back of my mind about trying to write a story about a “machine whisperer”. Scott’s piece reminded me that I am likely still not a creative writer, and I’m glad for their work here.</p>

<p>My thesis here is simple: this story reads like a set of specification and contract failures. It does not read like evidence that code generation inherently produces “slop” or that opaque code from code generation is inherently a failed concept. To be clear, this is not a critique of Scott’s view, but of the “slop generator” viewpoint.</p>

<h2 id="margaret">Margaret</h2>

<p>Margaret has generated software that pulls in various data sets from both their farm and external sources to predict the best time to harvest. Their latest crop was harvested before it should have been, and Tom realizes that the specification failed to include a requirement that it raise an error if a data source’s structure or methodology changed. Instead, the system absorbed the data from an updated methodology and didn’t change how it used that data.</p>

<p>This is shown to be a specification problem. The spec as written didn’t suggest that changes were possible or that they should be monitored for, so the generated system didn’t do that.</p>

<p>While this happens with, I suspect, regularity in hand-coded systems, my point isn’t that this is normal. When it happens in a hand-coded system, it is wrong too. And, importantly, it is also a specification error.</p>

<p>There may never have been a specification in the first place and the developer was just expected to figure this out. Depending on their experience and other conditions, they either did … or they didn’t. A clearer spec or set of standards (a/k/a a system prompt) would have fixed this in both cases.</p>

<h3 id="pit-crew">Pit Crew</h3>

<p>Scott introduces pit crews in this anecdote. These are people who monitor ongoing quality and concerns.</p>

<p>Today we often approximate this with monitoring systems that we hope are checking the right things, perhaps even with real end-to-end live tests running on a regular basis. We don’t generally dedicate human teams to it.</p>

<p>Whether we ever hit post-transition or not, this begs for a conversation: is QE/QA solely a pre-ship function, or should we be leveraging that knowledge to monitor delivered software in ways that go deeper than what we typically monitor today?  What does the SRE practice in this space look like?</p>

<p>Framed that way, the pit crew in the story is less a bandage for sloppy generated code and more the missing extension of our specifications and contracts into how we watch systems evolve over time.</p>

<h2 id="ethan">Ethan</h2>

<p>Ethan has generated a multitude of tools and they are all communicating with each other. Ethan is a microservice machine.</p>

<p>Ethan, much like Margaret, has a data feed problem. This time one of his own tools made a change in the methodology and calculated a value per-hundredweight instead of per-head. While not stated in the story, this unit for output was chosen at generation because it wasn’t in the specification and the specification also didn’t have a way (or likely even a requirement) to flag changes. The downstream tool didn’t get a read failure but began using this new data value as though it was still per-head. This resulted in poor market price prediction.</p>

<p>The story is similar to Margaret’s except it is more like when Team A breaks Team B in your own company.</p>

<p>For me it raises the interesting point that while we tend to believe otherwise, in many cases our APIs and data formats are our only true contracts. They operate only at the level where they exist. The internals of our dependencies, or the work of other teams,  are opaque, and you could say that they may “regenerate” their code every day of the week and you just have to hope it still works for your consumption and use. You have to rely on them not breaking the contract and ensure the contract provides the guarantees you need.</p>

<h3 id="choreographer">Choreographer</h3>

<p>A choreographer is a post-transition architect. It is, in my opinion, the thing we should all be if we are going to use LLMs to generate code.</p>

<p>Here a choreographer goes through Ethan’s systems and defines their interface contracts and layers. They also notice that some tools are unnecessary, while others have formed a sub-network that has no effect.  The output of this person’s work is a cleaned up system that functions as a whole and not a set of discrete parts.</p>

<p>This is something we already have to do in large systems, and it’s something that people generating code still have to do. I suspect that some concepts like <a href="https://github.com/steveyegge/gastown">Gastown</a> try to push parts of this work into a different layer of tooling. And it may even work.</p>

<p>LLM generation and reasoning capacity is getting higher, but none of this eliminates the need for this role or for specification correctness.  Specification correctness is something we’ve basically never had. Even waterfall failed here.</p>

<p>In this sense, the story reads less like an indictment of generation and more like a warning about what happens when we refuse to name, own, and maintain those contracts across a growing system.</p>

<h2 id="carol">Carol</h2>

<p>Carol’s farm illustrates the ugly mess of things we give automation and then complain about.</p>

<p>In this specific case there is a new irrigation system that is using all of the sensors it has to maintain a 60% moisture level across the farm. This results in under- and over-irrigation in some places because the moisture level in those places is influenced by external factors. The system is doing exactly what it was asked to do. The problem is that the target it was given is a bad fit for the actual farm, not that the generated system is inherently bad.</p>

<p>Note: I am not a farmer, so I am taking this example at face value.</p>

<p>The short version is that drainage is funny in some places, other places are getting more wind, and still others need slightly differing levels based on the actual crop in that spot. None of this data has been provided to the system, and the story makes it clear that most of it is not in any system.</p>

<p>The farmer just understands their land and can look at it and tell you what is going to happen based on 30 years of real history and 30 years of experience. This is also not new. This is the art and practice of both coding and system administration, and we have failed to codify it usefully to date. We shouldn’t hold our new system accountable for that, but we also shouldn’t pretend that “just write a better spec” is an easy button when so much of the domain is still tacitly known and not shared beyond tribal means.</p>

<p>This is perhaps the one vignette that gives me pause. Even if we can find code generation (it doesn’t have to be LLMs) that writes to a specification, we may still be unsuccessful when our measurements, abstractions, and language can’t yet capture the thing we actually care about.</p>

<p>Right now we make surgical tweaks to the code to encode these lessons as we learn them. Specifying them in human language is often difficult, and maybe that is the core problem. The boundary here isn’t really “hand-written vs generated code”, it is between where, as technologists, we have experience stating precisely enough and where we don’t have a history of doing that well.</p>

<p>But we work in a precise space. In the case of Carol’s farm, Carol and Tom are able to describe the core problems pretty quickly, and I suspect, given time, could come up with data feeds, additional sensors, or equations that describe the issues sufficiently to fix the irrigation system.</p>

<p>It would be hyper-customized to Carol’s farm, but in many ways that is what she wants and needs - and it’s something we fail to deliver, in general, today. Even here, though, calling the outcome “slop” feels like a category error: the system is faithfully pursuing the narrow, naive target we gave it, not spewing random garbage.</p>

<h2 id="the-real-conversation">The Real Conversation</h2>

<p>I wrote this piece in part because the anti-LLM rhetoric of “they are slop generators” gets under my skin. There are a lot of valid reasons to be anti-LLM today. This is not one.</p>

<p>Reading the story reinforced that for me: what fails in these vignettes are specs, contracts, and incentives, not some inherent “slop” property of generated code. The story isn’t an indictment of generated code, it’s a parable about the timeless need for human wisdom, clear communication, and rigorous oversight, no matter how the code comes to be.</p>

<p>I’d like to see our LLM conversations stick closer to the concrete and demonstrably true. Let’s focus on what these systems do, where they fail, and how our specs and contracts are part of that story, instead of getting pulled into slogans like “slop generator” that, by being false, derail the conversation.  This creates space for us to have the real conversations that matter around ethics, the environment, and training data usage.</p>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[Reflecting on “Warranty Void If Regenerated” and why calling LLMs “slop generators” misses the real issues.]]></summary></entry><entry><title type="html">Counting Synology Photos uploads with synofoto-media-count</title><link href="https://bexelbie.com/2026/03/04/synofoto-media-count.html" rel="alternate" type="text/html" title="Counting Synology Photos uploads with synofoto-media-count" /><published>2026-03-04T22:10:00+01:00</published><updated>2026-03-04T22:10:00+01:00</updated><id>https://bexelbie.com/2026/03/04/synofoto-media-count</id><content type="html" xml:base="https://bexelbie.com/2026/03/04/synofoto-media-count.html"><![CDATA[<p>I’m currently testing Synology Photos, including the iPhone uploader. I wanted to know how far the upload had actually gotten.</p>

<p>The problem is that none of the obvious UIs answer that.</p>

<ul>
  <li>The Synology Photos web UI doesn’t show a total count.</li>
  <li>The phone UI shows my whole camera roll (uploaded or not), and also doesn’t give a useful count.</li>
</ul>

<p>So I wrote a small tool: <a href="https://github.com/bexelbie/synofoto-media-count">synofoto-media-count</a>.</p>

<h2 id="the-mismatch">The mismatch</h2>

<p>If you’re backing up an iPhone library, you can end up with three numbers that don’t agree:</p>

<ul>
  <li>The number of photos on your phone</li>
  <li>The number of files on disk</li>
  <li>The number of “things” Synology Photos has indexed (which the UI doesn’t show)</li>
</ul>

<p>That last one is the number I cared about. I’m fine with the file system being messy - I just want to know whether the app has ingested what I think it has.</p>

<h2 id="why-counting-files-doesnt-answer-this">Why counting files doesn’t answer this</h2>

<p>The file system is easy to count, but it’s not what I’m trying to measure. With Live Photos, the file count is expected to be “weird” because a single photo experience can be multiple files.</p>

<p>What I actually want is a number that matches the app’s idea of “items,” because that’s what I’m mentally comparing to the photo count on my phone. That’s the gap this script closes.</p>

<h2 id="what-the-script-does">What the script does</h2>

<p>The repo contains a read-only bash script (<code class="language-plaintext highlighter-rouge">count-media.sh</code>) that runs <code class="language-plaintext highlighter-rouge">SELECT</code> queries against Synology Photos’ PostgreSQL database (by default, the <code class="language-plaintext highlighter-rouge">synofoto</code> database). It has options for multiple users and folders, JSON output for automation, and an optional <code class="language-plaintext highlighter-rouge">publish-to-ha.py</code> helper that publishes counts into Home Assistant via MQTT auto-discovery. It collapses Live Photo pairs into a single “item” so the results are closer to what you see on the phone.</p>
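
<p>I won’t reproduce the SQL here, but the Live Photo collapsing rule is worth making concrete. As a conceptual sketch in Python - not the actual bash/SQL, and with hypothetical <code class="language-plaintext highlighter-rouge">kind</code> and <code class="language-plaintext highlighter-rouge">live_group</code> fields standing in for whatever the database really stores - the counting amounts to:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from collections import defaultdict

def collapse_live_photos(items):
    """items: iterable of dicts with hypothetical 'kind' and 'live_group' keys."""
    groups = defaultdict(list)
    counts = {"photos": 0, "videos": 0, "other": 0,
              "live_photos": 0, "incomplete_live": 0}

    for item in items:
        if item.get("live_group"):
            groups[item["live_group"]].append(item)
        elif item["kind"] == "photo":
            counts["photos"] += 1
        elif item["kind"] == "video":
            counts["videos"] += 1
        else:
            counts["other"] += 1

    # A complete Live Photo pair (still image + motion clip) counts as one item.
    for members in groups.values():
        if len(members) &gt;= 2:
            counts["live_photos"] += 1
        else:
            counts["incomplete_live"] += 1  # one half of the pair is missing

    return counts
</code></pre></div></div>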

<h2 id="requirements">Requirements</h2>

<p>To run it, you need:</p>

<ul>
  <li>Synology DSM 7.x with Synology Photos installed</li>
  <li>SSH access to the NAS</li>
  <li><code class="language-plaintext highlighter-rouge">sudo</code> privileges (or direct <code class="language-plaintext highlighter-rouge">postgres</code> user access)</li>
  <li>Python 3 if you want to use <code class="language-plaintext highlighter-rouge">publish-to-ha.py</code></li>
</ul>

<h2 id="safety">Safety</h2>

<p>This script is read-only. It runs <code class="language-plaintext highlighter-rouge">SELECT</code> queries only and never modifies your data.</p>

<h2 id="usage-quick">Usage (quick)</h2>

<p>In the common case, you copy the script to your NAS, make it executable, and run it with <code class="language-plaintext highlighter-rouge">sudo</code>. It will try to do something sensible for iPhone uploads (like auto-selecting <code class="language-plaintext highlighter-rouge">/MobileBackup</code> if it exists) and will scope to the current user by default.</p>

<p>If the defaults don’t match your setup, there are flags for selecting a folder interactively, scoping to a different user, and emitting <code class="language-plaintext highlighter-rouge">--json</code> output for automation.</p>

<h2 id="home-assistant-integration">Home Assistant integration</h2>

<p>If you want the counts to show up somewhere other than your terminal, <code class="language-plaintext highlighter-rouge">publish-to-ha.py</code> can publish per-user counts into Home Assistant via MQTT auto-discovery. The result is a handful of sensors per user (non-live photos, Live Photos, videos, other, and a total) that you can graph or use in automations.</p>
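
<p>If you have never looked at Home Assistant’s MQTT auto-discovery before, the trick is just two publishes per sensor: a retained config message on a <code class="language-plaintext highlighter-rouge">homeassistant/sensor/.../config</code> topic and then the state. A minimal sketch using <code class="language-plaintext highlighter-rouge">paho-mqtt</code> - the topic layout and naming here are my illustration, not necessarily what <code class="language-plaintext highlighter-rouge">publish-to-ha.py</code> does verbatim:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json
import paho.mqtt.publish as publish

def publish_count(broker, user, category, count):
    """Publish one per-user count as a Home Assistant MQTT discovery sensor."""
    object_id = f"synofoto_{user}_{category}"
    config_topic = f"homeassistant/sensor/{object_id}/config"
    state_topic = f"synofoto/{user}/{category}"

    # Retained discovery config so Home Assistant (re)creates the sensor on restart.
    publish.single(config_topic, json.dumps({
        "name": f"Synology Photos {user} {category}",
        "state_topic": state_topic,
        "unique_id": object_id,
    }), retain=True, hostname=broker)

    # The actual count goes to the state topic.
    publish.single(state_topic, str(count), retain=True, hostname=broker)
</code></pre></div></div>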

<h2 id="notes">Notes</h2>

<ul>
  <li>Counts include nested subfolders by default. If you want a single folder only, there’s an “exact folder” option.</li>
  <li><code class="language-plaintext highlighter-rouge">--verbose</code> shows additional technical detail (raw unit counts, type breakdowns).</li>
  <li><code class="language-plaintext highlighter-rouge">--inspect</code> helps when something looks weird - like “incomplete Live Photo groups” where one half is missing.</li>
  <li>For iPhone MobileBackup libraries, the defaults for photo/video types should work, but they’re overridable if your installation differs.</li>
</ul>

<p>It prints a breakdown like:</p>

<ul>
  <li>non-live photos</li>
  <li>Live Photos (collapsed groups, plus their underlying files)</li>
  <li>standalone videos</li>
  <li>“other” items</li>
  <li>incomplete Live Photo groups (one half is missing)</li>
</ul>

<p>That breakdown is enough to sanity check whether “uploads are incomplete” vs “uploads are likely complete.” This provides a validation point to go with “why isn’t this thing uploading now?”</p>

<p>Now it’s time to set my phone to sleep focus and leave the uploader running overnight … for a long time.</p>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[Read-only queries against Synology Photos’ DB to gauge upload progress.]]></summary></entry><entry><title type="html">Jekyll Reads: the tooling behind my reading list</title><link href="https://bexelbie.com/2026/03/03/jekyll-reads.html" rel="alternate" type="text/html" title="Jekyll Reads: the tooling behind my reading list" /><published>2026-03-03T08:50:00+01:00</published><updated>2026-03-03T08:50:00+01:00</updated><id>https://bexelbie.com/2026/03/03/jekyll-reads</id><content type="html" xml:base="https://bexelbie.com/2026/03/03/jekyll-reads.html"><![CDATA[<h2 id="why-i-needed-more-than-a-social-reading-site">Why I needed more than a social reading site</h2>

<p>In <a href="/ramblings/2025/02/11/rediscovering-reading.html">Rediscovering Reading (Without the Social Media Part)</a> I wrote about stepping away from scrolling and building a slower, more deliberate reading habit. Part of that shift was making my reading log public without tying it to a dedicated social network.</p>

<p>The mechanics behind that were simple but fussy: keep a YAML file up to date, copy and paste links from Open Library, remember to grab cover images, and wire everything into Jekyll templates for the reading page and sidebar. None of it was hard, but it was just annoying enough that I knew future‑me would start skipping updates.</p>

<p>I built Jekyll Reads to make that workflow tolerable.</p>

<h2 id="what-jekyll-reads-actually-does">What Jekyll Reads actually does</h2>

<p>Jekyll Reads is a small collection of pieces designed around a single idea: keep all the book data in one <code class="language-plaintext highlighter-rouge">_data/reading.yml</code> file and let everything else be presentation.</p>

<p>The core pieces are:</p>

<ul>
  <li>A shared Node.js library that talks to Open Library, picks a reasonable match, and produces a standard YAML snippet for a book</li>
  <li>A command‑line tool that lets you search for a book and print the YAML to stdout, with options for indentation and auto‑selecting results</li>
  <li>A Vim integration that shells out to the CLI and drops the YAML directly into your buffer at the right indentation level</li>
  <li>A Visual Studio Code extension that does the same thing from inside the editor, with a proper search UI and update checks for the extension itself</li>
</ul>

<p>All of this is intentionally boring: no external Node dependencies, just the built‑in modules and a bit of glue. The point is to make it slightly easier to keep the reading list current than to let it drift.</p>

<h2 id="how-it-shows-up-on-this-site">How it shows up on this site</h2>

<p>On this site, the source of truth is <code class="language-plaintext highlighter-rouge">_data/reading.yml</code>. Entries that are still in progress, finished, or abandoned are all represented there with the same structure. The YAML includes things like start and finish dates, a link to more information (usually Open Library), an optional cover image, and a free‑form comment.</p>
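
<p>Purely as an illustration - the field names below are my guess at the shape described above, and the repository README documents the actual schema - an entry might look something like this:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>- title: "A Book I Am Reading"
  author: "Some Author"
  status: reading            # or finished, abandoned
  started: 2026-02-20
  finished:                  # empty while still in progress
  link: https://openlibrary.org/works/OL12345W
  cover: /img/covers/a-book-i-am-reading.jpg
  comment: "Short free-form note about why I picked it up."
</code></pre></div></div>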

<p>That data feeds two places:</p>

<ul>
  <li>The dedicated <a href="/reading/">reading page</a>, which separates currently‑reading, finished, and abandoned books and shows covers, dates, and comments</li>
  <li>A small sidebar block on the home page that surfaces what I am currently reading, so the log is visible without needing a whole post for every book</li>
</ul>

<p>Jekyll Reads does not try to be a general bookshelf app. It just reflects what I am already doing: writing short notes in YAML and publishing them along with the rest of the site.</p>

<h2 id="design-constraints-and-tradeoffs">Design constraints and trade‑offs</h2>

<p>I made a few deliberate choices that might look odd if you are used to larger toolchains:</p>

<ul>
  <li><strong>No external Node dependencies.</strong> The library and CLI only use built‑in modules like <code class="language-plaintext highlighter-rouge">https</code> and <code class="language-plaintext highlighter-rouge">readline</code>. That keeps installation simple and makes it easy to run in constrained environments.</li>
  <li><strong>Open Library as the primary data source.</strong> It provides book metadata, cover images, and stable URLs without requiring another account or scraping.</li>
  <li><strong>Plain YAML as the storage format.</strong> A static <code class="language-plaintext highlighter-rouge">_data</code> file is easy to version, review, and back up. It also plays nicely with Jekyll’s existing data pipeline.</li>
  <li><strong>Multiple small tools instead of one big one.</strong> The CLI, Vim integration, and VS Code extension all sit on top of the same library, so they stay in sync without each re‑implementing the logic.</li>
</ul>

<p>If any of that stops being true in the future, I can replace or extend the pieces without touching the core data file.</p>

<h2 id="if-you-want-to-use-it">If you want to use it</h2>

<p>The repository README walks through how to set up your own <code class="language-plaintext highlighter-rouge">_data/reading.yml</code>, wire up a reading page and sidebar, and use the CLI or editor integrations. It is written so that you can follow it even if you are not using the same Jekyll theme I am.</p>

<p>The code is MIT‑licensed and shipped under Electric Pliers LLC. If you want a lightweight way to publish a reading log without standing up a whole social network, you might find it useful.</p>

<p>You can find the repository and full documentation here: <a href="https://github.com/ElectricPliers/jekyll-reads">https://github.com/ElectricPliers/jekyll-reads</a></p>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[A tiny, dependency-free toolkit for keeping a Jekyll reading log in sync: one YAML data file, a CLI, and editor integrations that handle the boring parts.]]></summary></entry><entry><title type="html">Phone a Friend: Multi-Model Subagents for VS Code Copilot Chat</title><link href="https://bexelbie.com/2026/02/23/phone-a-friend.html" rel="alternate" type="text/html" title="Phone a Friend: Multi-Model Subagents for VS Code Copilot Chat" /><published>2026-02-23T11:10:00+01:00</published><updated>2026-02-23T11:10:00+01:00</updated><id>https://bexelbie.com/2026/02/23/phone-a-friend</id><content type="html" xml:base="https://bexelbie.com/2026/02/23/phone-a-friend.html"><![CDATA[<p>I wanted a way to stay inside Visual Studio Code, use Copilot Chat as the “orchestrator,” and still mix and match models for different parts of the work. Plan a change with one of the slower, more capable models, but let a smaller, faster model handle mechanical refactors. Edit a blog post with one model, but hand Jekyll plumbing or JSON/YAML munging to another. The friction was that the built-in Copilot Chat extension only lets subagents run on the same model as the parent conversation, while the Copilot CLI happily lets you pick any available model per run. Phone a Friend bolts that flexibility onto Copilot Chat, so I can keep the full VS Code experience - including gutter diffs - while dispatching subtasks to whatever model is best for the job.</p>

<h2 id="the-problem">The Problem</h2>

<p>When you use GitHub Copilot Chat in VS Code, every subagent it spawns runs on the same model as the parent conversation. If you’re on Claude Opus 4.6, all subagents are Claude Opus 4.6. Sometimes you want a different model for a subtask - a faster one for simple work, or a different vendor for a second opinion.</p>

<p>GitHub Copilot CLI supports <code class="language-plaintext highlighter-rouge">--model</code> to pick any available model, but using it directly doesn’t help - changes made by the CLI don’t produce VS Code’s gutter indicators (the green/red diff decorations in the editor margin). You get the work done but lose the visual feedback that makes code review comfortable.</p>

<p><a href="https://github.com/bexelbie/phone-a-friend">Phone a Friend</a> is an MCP server that solves both problems. It dispatches work to Copilot CLI with the model of choice, captures a unified diff of the changes, and returns it to the calling agent - which applies it through VS Code’s edit tools. Gutter indicators show up as the changes were made natively.</p>

<h2 id="how-it-works">How It Works</h2>

<ol>
  <li>Copilot Chat calls the <code class="language-plaintext highlighter-rouge">phone_a_friend</code> MCP tool with a prompt, model name, and working directory</li>
  <li>The MCP server creates an isolated git worktree from <code class="language-plaintext highlighter-rouge">HEAD</code></li>
  <li>It launches Copilot CLI in non-interactive mode in that worktree with the requested model</li>
  <li>The subagent does its work and writes its response to a “message-in-a-bottle” file</li>
  <li>The MCP server reads the response, captures a <code class="language-plaintext highlighter-rouge">git diff</code>, and cleans up the worktree</li>
  <li>The MCP server then returns the response text and unified diff to the calling agent</li>
  <li>The calling agent applies the diff using VS Code’s edit tools - gutter indicators appear</li>
</ol>

<p>The “message in a bottle” pattern is worth explaining. Copilot CLI’s stdout mixes the agent’s response with progress output and is unreliable to parse. Rather than fighting noisy output, the tool instructs the subagent to write its final response to a file. The server reads the file. Clean separation.</p>
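
<p>To make the git side of this concrete, steps 2, 5, and 6 look roughly like the sketch below. This is a conceptual Python outline, not the actual TypeScript implementation, and the Copilot CLI launch is left as a placeholder because its exact invocation is documented in the project README.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import shutil
import subprocess
import tempfile
from pathlib import Path

def run_in_worktree(repo, run_subagent):
    """Create an isolated worktree from HEAD, let the subagent work, return its output.

    run_subagent(worktree, bottle) stands in for launching Copilot CLI
    non-interactively and instructing it to write its answer to the bottle file.
    """
    tmp = Path(tempfile.mkdtemp(prefix="phone-a-friend-"))
    worktree = tmp / "worktree"
    bottle = tmp / "response.md"  # kept outside the worktree so it never shows up in the diff
    try:
        subprocess.run(["git", "-C", repo, "worktree", "add", str(worktree), "HEAD"], check=True)
        run_subagent(worktree, bottle)

        # "Message in a bottle": read the response file instead of parsing noisy stdout.
        response = bottle.read_text() if bottle.exists() else ""

        # Stage everything so new files are included, then diff against HEAD.
        subprocess.run(["git", "-C", str(worktree), "add", "-A"], check=True)
        diff = subprocess.run(
            ["git", "-C", str(worktree), "diff", "--cached"],
            check=True, capture_output=True, text=True,
        ).stdout
        return response, diff
    finally:
        subprocess.run(["git", "-C", repo, "worktree", "remove", "--force", str(worktree)])
        shutil.rmtree(tmp, ignore_errors=True)
</code></pre></div></div>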

<h2 id="safety">Safety</h2>

<p>Worktree isolation means your working tree is never modified directly. Push protection blocks <code class="language-plaintext highlighter-rouge">git push</code> at the tool level. Worktrees are cleaned up after every invocation, even on errors.</p>

<h2 id="setup">Setup</h2>

<p>You install Phone a Friend like any other MCP server in VS Code: add the <code class="language-plaintext highlighter-rouge">@bexelbie/phone-a-friend</code> npm package through the <code class="language-plaintext highlighter-rouge">MCP: Add Server...</code> command, or point VS Code at it via your MCP configuration. The GitHub README details the exact JSON and prerequisites (Node.js, Copilot CLI, Git).</p>

<h2 id="usage">Usage</h2>

<p>Once configured, you stay in Copilot Chat and describe the outcome you want; the calling agent decides when to route a subtask through Phone a Friend. The tool surface includes discovery hints, so natural phrasing like “get a second opinion from another model” is usually enough to trigger it. Any model that Copilot CLI exposes is available.</p>

<h2 id="known-limitations">Known Limitations</h2>

<p>A few trade-offs worth knowing:</p>

<ul>
  <li><strong>Context cost.</strong> The unified diff lands in the calling agent’s context window. Large diffs eat context. I’ve got an issue open exploring ideas for improving this.</li>
  <li><strong>Message-in-a-bottle compliance.</strong> Most models follow the instruction to write their final response into the message-in-a-bottle file, but some may occasionally ignore it. When that happens, the calling agent still gets the diff of any file changes but not the response text.</li>
</ul>

<h2 id="availability">Availability</h2>

<p>The project is <a href="https://github.com/bexelbie/phone-a-friend">on GitHub</a> under MIT license, and published on npm as <a href="https://www.npmjs.com/package/@bexelbie/phone-a-friend"><code class="language-plaintext highlighter-rouge">@bexelbie/phone-a-friend</code></a>. Written in TypeScript.</p>

<h2 id="what-changed-for-me">What Changed For Me</h2>

<p>Since integrating this into my Copilot setup, the biggest shift is that I no longer have to choose between “the model I want to think with” and “the model I want to do the work,” and I’ve eliminated a bunch of copy/paste from manually emulating this. I keep the main conversation with a larger, more capable model for planning and review, and routinely:</p>

<ul>
  <li>send quick, mechanical refactors to a smaller, faster model</li>
  <li>hand Jekyll front matter, Liquid, and config tweaks to a model that’s better at markup and templating</li>
  <li>ask a different vendor’s model for a second opinion on changes or ideas, especially where that model may be better at the task</li>
</ul>

<p>Because everything still lands back in the same VS Code buffer with normal gutter diffs, it feels like one coherent tool instead of a handful of loosely-connected ones.</p>

<p>The project also had an unexpected dynamic in the development process. Building an MCP server that mimics a capability already available to the model created a strange feedback loop. I could collaborate on the implementation with Opus, and then turn around and interview it as a subject matter expert on how it uses that very same capability. It was a weird feeling to use the model as both a partner in writing the code and a primary source for understanding the user requirements.</p>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[Dispatch subtasks to a different AI model - with editor gutter indicators intact.]]></summary></entry><entry><title type="html">Replacing my compact calendar spreadsheet with an ICS-powered web app</title><link href="https://bexelbie.com/2026/02/18/online-compact-calendar.html" rel="alternate" type="text/html" title="Replacing my compact calendar spreadsheet with an ICS-powered web app" /><published>2026-02-18T14:30:00+01:00</published><updated>2026-02-18T14:30:00+01:00</updated><id>https://bexelbie.com/2026/02/18/online-compact-calendar</id><content type="html" xml:base="https://bexelbie.com/2026/02/18/online-compact-calendar.html"><![CDATA[<p>I’ve used some form of <a href="https://davidseah.com/node/compact-calendar/">DSri Seah’s Compact Calendar</a> for over seven years. The calendar is a lovingly designed single-page view of the entire year, organized into Monday-through-Sunday weeks with no breaks between months.</p>

<p>The point of the format is simple: my normal calendar is great at telling me what I’m doing on Tuesday. What it’s terrible at is answering planning questions that are above the day level, such as:</p>

<ul>
  <li>If we take a vacation the last two weeks of July, will it overlap business travel?</li>
  <li>Can we connect these two public holidays and get 14 days away for only 8 days of PTO?</li>
  <li>Do we have any genuinely empty weeks left this year?</li>
</ul>

<p>For a long time, my compact calendar was a spreadsheet. That worked until it didn’t.</p>

<h2 id="the-problem-i-actually-needed-to-solve">The problem I actually needed to solve</h2>

<p>The spreadsheet version served me well for years, but life got more complicated.</p>

<p>My kid is getting older, which means more activities to track: summer camps, school breaks, etc. My partner and I no longer work for the same company, so we don’t share the same corporate holidays, and as our roles have changed, so has the amount of travel we do. And, honestly, my spreadsheet has bespoke formulas that only I understand … on Thursdays when there is a full moon.</p>

<p>My partner knows how to use a calendar app. She really doesn’t want to learn a special spreadsheet for planning, and I don’t blame her.</p>

<p>The real friction screaming out that there had to be a better way was the double-entry work. If my kid has summer camp in July, I’d put it on the family calendar - and then manually mark those weeks on my compact calendar spreadsheet. Two sources of truth means one of them is eventually wrong.</p>

<p>So the job wasn’t “build a better calendar.” It was: keep the year-at-a-glance view, but make the calendar app the source of truth.</p>

<h2 id="the-shape-of-the-solution">The shape of the solution</h2>

<p>I decided to build a web version of the compact calendar that could read directly from standard ICS calendar feeds.</p>

<p>Put the summer camp on the shared calendar once. The compact calendar picks it up automatically.</p>

<p>And if this was going to be something my partner and I actually used together, it needed two things:</p>

<ul>
  <li>A simple setup flow (not “copy this spreadsheet and don’t touch column Q”)</li>
  <li>A way to always be available, beyond “go find this Google Docs link”</li>
</ul>

<h2 id="what-the-tool-does">What the tool does</h2>

<p>The calendar renders a full year on a single page. Each row is one week, Monday through Sunday.</p>

<p>Parallel to the block of weeks running down the page is a column for displaying committed events and a second for displaying possible events.</p>

<ul>
  <li>Committed: events that are definitely happening - travel that’s booked, school terms, confirmed work trips.</li>
  <li>Possible: things under consideration - a conference I submitted a talk to but haven’t heard back from yet, vacation options we’re weighing.</li>
</ul>

<p>The tool uses color to signal status at a glance:</p>

<ul>
  <li>Blue background: first day of the month (anchors the continuous weeks)</li>
  <li>Red text: public holidays (per selected country)</li>
  <li>Green background: committed events</li>
  <li>Yellow background: possible events</li>
  <li>Green background with a yellow border: overlaps/conflicts that need attention</li>
</ul>

<p>Here’s what the full-year view looks like with demo data loaded:</p>

<p><img src="/img/2026/CC-FullCalendar.png" alt="A full-year compact calendar view with one row per week (Monday through Sunday), with committed events shown in green, possible events in yellow, public holidays in red, and overlaps highlighted with a yellow border." /></p>

<h2 id="inputs-url-file-or-demo">Inputs: URL, file, or demo</h2>

<p>While there is demo data available in the system, the key comes from loading your own data. You can choose two different kinds of sources:</p>

<ul>
  <li>A URL - a <code class="language-plaintext highlighter-rouge">webcal://</code> or <code class="language-plaintext highlighter-rouge">https://</code> link to a published calendar (iCloud, Google Calendar, etc.)</li>
  <li>A file - a <code class="language-plaintext highlighter-rouge">.ics</code> file uploaded from your computer</li>
</ul>

<p>We’re an Apple household so our calendars live in iCloud, but the tool doesn’t care about your calendar provider. Anything that produces a standard ICS feed works.</p>

<p>My practical workflow is two shared calendars in Apple Calendar:</p>

<ul>
  <li>one for committed travel and events.  For me, this is actually my shared calendar that our family maintains.</li>
  <li>one for possibilities we’re considering</li>
</ul>

<p>Both are published as <code class="language-plaintext highlighter-rouge">webcal</code> URLs, and the compact calendar fetches them and renders the year view. Using my shared calendar works because the app ignores events that aren’t multi-day, all-day blocks - so dentist appointments don’t drown out the year view.  You can optionally include single day all-day events if that helps you.</p>

<p>The setup controls are intentionally simple:</p>

<p><img src="/img/2026/CC-Controls.png" alt="Configuration controls showing a country dropdown (for public holidays) and two inputs for selecting the committed and possible calendar sources." /></p>

<h2 id="the-tech-and-the-annoying-part">The tech (and the annoying part)</h2>

<p>This is a vanilla JavaScript app built with <a href="https://vite.dev/">Vite</a>, hosted on <a href="https://azure.microsoft.com/en-us/products/app-service/static">Azure Static Web Apps</a>. No framework - just DOM manipulation, a CSS file, and under 500 lines of main application code.</p>

<p>The interesting technical problem was CORS.</p>

<p>Calendar providers like iCloud don’t set CORS headers on their published feeds, which means a browser can’t fetch them directly. The solution is a small Azure Function that acts as a proxy:</p>

<ul>
  <li>the browser sends the calendar URL to the server</li>
  <li>the server fetches the calendar data</li>
  <li>the server returns it to the browser</li>
</ul>

<p>The proxy doesn’t store or log anything. It’s a pass-through.</p>
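
<p>Conceptually, the whole proxy fits in a few lines. The real thing is an Azure Function; the sketch below uses only the Python standard library and an assumed JSON payload shape, just to show how thin the pass-through is:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class CalendarProxy(BaseHTTPRequestHandler):
    """Stateless pass-through: fetch an ICS feed the browser cannot fetch itself."""

    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        # The calendar URL arrives in the POST body, not in a query string.
        url = json.loads(body)["url"].replace("webcal://", "https://", 1)

        with urllib.request.urlopen(url) as upstream:
            ics = upstream.read()

        self.send_response(200)
        self.send_header("Content-Type", "text/calendar")
        self.send_header("Access-Control-Allow-Origin", "*")  # the header iCloud does not set
        self.end_headers()
        self.wfile.write(ics)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), CalendarProxy).serve_forever()
</code></pre></div></div>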

<p>I built the app with an AI coding agent. I provided direction and made decisions, but I didn’t hand-write every line. For this kind of tool, I’m comfortable with that. It’s a static site that renders calendar data client-side, and the risk profile is low. Additionally, nothing in this code represents a new problem or a novelty. This is bog-standard code, and the agent handled the boilerplate well for this project.</p>

<p>Importantly, even though I could have written this code myself, I wouldn’t have. I probably would have gotten myself caught in a bit of analysis paralysis over frameworks. But more importantly, writing a lot of this code is just boring code to write. The AI agent has allowed me to solve my own problem, and that’s the part that matters to me. I didn’t have to suddenly become more disciplined about spreadsheets or get my family dragged onto a tool that really only speaks to me. Instead, I was able to change the shape of the problem and make it more solvable within the context of the humans involved.</p>

<h2 id="privacy-and-the-honest-trade-off">Privacy and the honest trade-off</h2>

<p>All your data stays in your browser. The app stores the URLs you’re loading, your selected country, and cached holiday data in local storage. This is purely functional and not for tracking.</p>

<p>Calendar URLs necessarily have to go through the server-side proxy because browsers won’t fetch them directly. The proxy is a stateless pass-through — I don’t persist calendar data in the function or in your browser. Calendar URLs are sent via POST request body rather than query parameters, which means they aren’t captured in Azure’s platform-level request logs. Error logging includes only the target hostname (e.g., “iCloud fetch failed”), never the full URL or authentication tokens. If your calendar URL contains authentication tokens (iCloud URLs do), understand that the proxy briefly sees them in transit.</p>

<h2 id="try-it-out">Try it out</h2>

<p>The calendar is live at <a href="https://cc.bexelbie.com">cc.bexelbie.com</a>. You can load the built-in demo data to explore without connecting your own calendars - select “Demo” from either input dropdown.</p>

<p>The source is on GitHub at <a href="https://github.com/bexelbie/online-compact-calendar">bexelbie/online-compact-calendar</a>. If you have ideas or find bugs, <a href="https://github.com/bexelbie/online-compact-calendar/issues">open an issue</a>.</p>

<p>On first visit, there’s a banner that points you at settings:</p>

<p><img src="/img/2026/CC-first-run-banner.png" alt="A first-run welcome banner that tells the user to use the gear icon to configure the app." /></p>

<h2 id="whats-next">What’s next</h2>

<p>I’m going to live with it for a while before adding features. The spreadsheet served me for seven years with almost no changes.</p>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[I rebuilt my year-at-a-glance compact calendar as a small web app that reads ICS feeds and highlights conflicts.]]></summary></entry><entry><title type="html">Building a tiny ephemeral draft sharing system on Hedgedoc</title><link href="https://bexelbie.com/2026/02/12/yak-shaving.html" rel="alternate" type="text/html" title="Building a tiny ephemeral draft sharing system on Hedgedoc" /><published>2026-02-12T13:00:00+01:00</published><updated>2026-02-12T13:00:00+01:00</updated><id>https://bexelbie.com/2026/02/12/yak-shaving</id><content type="html" xml:base="https://bexelbie.com/2026/02/12/yak-shaving.html"><![CDATA[<blockquote>
  <p>This yak is now shaved!</p>

  <p><cite>me</cite></p>
</blockquote>

<p>I’ve been working on two submissions I want to put into the CFP for <a href="https://installfest.cz">installfest.cz</a> and had them at a “man it’d be nice to have someone else read and comment on this” level of done.  Normally when this happens I have to psych myself up for it, both because receiving feedback can be hard and because I have to do a format conversion.  I tend to write in markdown in “all the places” and sharing a document for edits has typically meant pasting it into something like Google Docs or Office 365, where even if it still looks like markdown … it isn’t.</p>

<p>And that’s when the yak walked into the room. Instead of just pasting my drafts into Google Docs and getting on with the reviews, I decided I needed to delay getting feedback and build the markdown collaborative editing system of my dreams. Classic yak shaving - solving a problem you don’t actually need to solve in order to eventually do the thing you originally set out to do. If you’re unfamiliar with the term, see <a href="https://www.youtube.com/watch?v=0E5ae4MD5qo">What is Yak Shaving</a>, a video by Matthew Miller.</p>

<p>When I am done, I then have to take the text back to wherever it was originally going, often in good clean markdown (this blog post is in markdown!).  This rigmarole is tiring.  I also dislike that my go-to tools for this had turned sharing into an exercise in ensuring guests could access a document or collecting someone’s login IDs for yet another system.</p>

<p>I knew there had to be a better way.  Then it hit me.  When markdown started to take off, a slew of collaborative markdown editing sites appeared, often modeled on the older Etherpad.  Well, several are still around.  I looked at hosted options first, as I tend to prefer using a service when I can so I don’t create more sysadmin work for myself.</p>

<p>Three things steered me away from the hosted options:</p>

<ol>
  <li>I don’t like being on a free tier when I don’t understand how it is supported.  While I don’t know that anyone in this space is nefarious, the world is trending in a specific direction.  I don’t mind paying, but this was also not going to generate enough value to warrant serious payments.</li>
  <li>The project that first came to mind for markdown collaboration went open core back in 2019.  Open source business models are hard, and doing open core well is even harder.  As you’ll see below I had specific needs and I had a feeling I might run into the open core wall.</li>
  <li>One of the CFPs would actually benefit from implementing this as my example … bonus!</li>
</ol>

<p>After examining a bunch of options, I settled on building something out of <a href="https://hedgedoc.org">Hedgedoc</a>. This was not an easy choice, and the likelihood of entering analysis paralysis was super high.  So I decided to try to force this to fit on the free-tier GCP instance I have been running for years.  It is the tiny e2-micro burstable instance, a literal thimble of compute.</p>

<p>This ruled out a lot of options.  Privacy-first options need more compute just to do the encryption work.  A bunch of options want a server database (Postgres and friends), and a single-person instance should be fine on SQLite, in my opinion.  All roads now led to Hedgedoc.  It was the only option that could run on SQLite, tolerate my tiny VM, still give me collaborative markdown, and seemed to have every feature I required, if I could make it work.</p>

<p>It wasn’t all sunshine and happiness though.  Hedgedoc is in the middle of writing version 2.0, which means 1.0 is frozen for anything except critical fixes and all effort is focused on the future.  That means the documentation is a bit rough in places, and I was going to have to live with it.</p>

<p>My core requirements were:</p>
<ol>
  <li>Only I am allowed to create new notes</li>
  <li>Anyone with the “unguessable” URL can edit and should not need an account to do so</li>
  <li>This should require next to zero system administration work and be easy to start and stop</li>
  <li>When I need more features, I should be able to extend this with a plugin for tools like <a href="https://obsidian.md">Obsidian</a> or Visual Studio Code.</li>
</ol>

<p>And while it took longer than I’d hoped, it works.  Here’s how:</p>

<ol>
  <li>Write yourself a configuration file for Hedgedoc</li>
</ol>

<p>config.json:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
  "production": {
    "sourceURL": "https://github.com/bexelbie/hedgedoc",
    "domain": "&lt;url&gt;",
    "host": "localhost",
    "protocolUseSSL": true,
    "loglevel": "info",
    "db": {
      "dialect": "sqlite",
      "storage": "/data/db/hedgedoc.sqlite"
    },
    "email": true,
    "allowEmailRegister": false,
    "allowAnonymous": false,
    "allowAnonymousEdits": true,
    "requireFreeURLAuthentication": true,
    "disableNoteCreation": false,
    "allowFreeURL": false,
    "enableStatsApi": false,
    "defaultPermission": "limited",
    "imageUploadType": "filesystem",
    "hsts": {
      "enable": true,
      "maxAgeSeconds": 31536000,
      "includeSubdomains": true,
      "preload": true
    }
  }
}
</code></pre></div></div>

<p>This sets a custom source URL for the fork I have made (more below), enables SSL, disables new account registration, and allows edits via unguessable URLs without requiring logins.</p>

<ol start="2">
  <li>Decide how you want to launch the container (I am using a quadlet; a sketch follows below) and provide some environment variables:</li>
</ol>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CMD_SESSION_SECRET="&lt;secret&gt;"
CMD_CONFIG_FILE=/hedgedoc/config.json
NODE_ENV=production
</code></pre></div></div>

<p>These just put the app in production mode, point it at the config file, and provide the only secret required.</p>
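
<p>For reference, a minimal user quadlet along these lines might look like the following. This is a sketch rather than my exact unit: the image reference and host paths are illustrative, and the mounts simply match the <code class="language-plaintext highlighter-rouge">/hedgedoc/config.json</code> and <code class="language-plaintext highlighter-rouge">/data</code> paths used above.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># hedgedoc.container (a user quadlet, e.g. under ~/.config/containers/systemd/)
[Unit]
Description=Hedgedoc

[Container]
# illustrative image reference; pin whatever 1.x tag you actually run
Image=quay.io/hedgedoc/hedgedoc:latest
EnvironmentFile=%h/hedgedoc/hedgedoc.env
Volume=%h/hedgedoc/config.json:/hedgedoc/config.json:ro,Z
Volume=%h/hedgedoc/data:/data:Z
# networking/ingress omitted - I front mine with a Cloudflare tunnel

[Service]
Restart=always

[Install]
WantedBy=default.target
</code></pre></div></div>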

<ol start="3">
  <li>You’re basically done.  I happen to have put mine behind a Cloudflare tunnel and updated the main page of the site, but those are pretty straightforward.</li>
</ol>

<h2 id="more-yak-shaving">More Yak Shaving</h2>

<p>Naturally I planned to launch it, create my user ID via the CLI, and share my CFP submissions with the folks I wanted reviews from.</p>

<p><em>Narrator: Naturally, that’s not what happened.</em></p>

<p>I decided to push YAGNI<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> out of the way and NEED IT!  Specifically I forked the v1 code into <a href="https://github.com/bexelbie/hedgedoc/">a repository</a> to add some features.  The upstream is unlikely to want any of these so I will have to carry these patches.  What I did:</p>

<ol>
  <li>Hedgedoc will do color highlighting and gutter indicators so you can see which author added what text.  Unfortunately it didn’t seem to be working.  I was getting weak indicators (underlines instead of highlighting) and often nothing.  So I fixed that.</li>
  <li>The colors for authorship are chosen randomly.  I am a bit past my prime in the seeing department and it was hard to see the colors against the dark editor background, so I restricted the color choices to ones that contrast well.  It isn’t perfect, but it is better.</li>
  <li>My particular setup involves a lot of guest editors.  Normally I share with just a few folks, but sometimes with many.  They’ll all be anonymous.  Hedgedoc doesn’t track authorship colors for guests, so I patched in a system to generate color markings for anonymous editors.</li>
  <li>A feature I always loved in Etherpad was that you could temporarily hide the authorship colors when you just wanted to “read the document.”  So I added a button for that.  While I was doing that I discovered that there is a separate toggle to switch the editor into light mode, but I couldn’t see it because the status bar was black and set to 0.2 opacity! I fixed that too.  Also, the status bar now switches when the editor switches.</li>
  <li>Comments, it turns out, are needed.  So I coded in rudimentary support for CriticMarkup comments.</li>
</ol>

<p>I have other ideas, but instead I am going to stop and let YAGNI win for a while.  Besides, hopefully 2.0 will ship soon and render all of this unneeded.</p>

<p>So there you go: if you want to offer to help me write something, I’ll send you a link and you can go to town on our shared work.  If you want to see more about this, well, let’s see if Installfest.cz thinks you should or not :D — and whether this yak decides to grow its hair back.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>YAGNI: You Ain’t Gonna Need It - a philosophy that reminds us that features we dream up aren’t needed until an actual use comes along (or a paying customer).  This also applies to engineering for future ideas when those ideas aren’t committed to yet. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[How I'm using Hedgedoc on a tiny VM to share markdown drafts for feedback without heavyweight doc tools.]]></summary></entry><entry><title type="html">op-secret-manager: A SUID Tool for Secret Distribution</title><link href="https://bexelbie.com/2026/02/06/op-secret-manager.html" rel="alternate" type="text/html" title="op-secret-manager: A SUID Tool for Secret Distribution" /><published>2026-02-06T12:40:00+01:00</published><updated>2026-02-06T12:40:00+01:00</updated><id>https://bexelbie.com/2026/02/06/op-secret-manager</id><content type="html" xml:base="https://bexelbie.com/2026/02/06/op-secret-manager.html"><![CDATA[<p>Getting secrets from 1Password to applications running on Linux keeps forcing a choice I don’t want to make. Manual retrieval works until you get more than a couple of things … then you need something more. There are lots of options, but they all felt awkward or heavy, so I wrote <a href="https://github.com/bexelbie/op-secret-manager"><code class="language-plaintext highlighter-rouge">op-secret-manager</code></a> to fill the gap: a single-binary tool that fetches secrets from 1Password and writes them to per-user directories. No daemon, no persistent state, no ceremony.</p>

<h2 id="the-problem-secret-zero-on-multi-user-systems">The Problem: Secret Zero on Multi-User Systems</h2>

<p>The “secret zero” problem is fundamental: you need a first credential to unlock everything else. On a multi-user Linux system, this creates friction. Different users (application accounts like <code class="language-plaintext highlighter-rouge">postgres</code>, <code class="language-plaintext highlighter-rouge">redis</code>, or human operators) need different secrets. You want centralized management (1Password) but local distribution, without exposing credentials across user boundaries. You also don’t want to solve the “secret zero” problem multiple times or have a bunch of first credentials saved in random places all over the disk.</p>

<p>Existing approaches each carry costs:</p>

<ul>
  <li><strong>Manual copying</strong>: Unscalable and leaves secret material in shell history or temporary files.</li>
  <li><strong>1Password CLI directly</strong>: Requires each user to authenticate or have API key access, which recreates the distribution problem and litters the disk with API keys.</li>
  <li><strong>Persistent agents</strong> (Connect, Vault): Add services to monitor, restart policies to configure, and failure modes to handle.</li>
  <li><strong>Cloud provider integrations</strong>: Generally unavailable on bare metal or hybrid environments where half your infrastructure isn’t in AWS/Azure/GCP.</li>
</ul>

<p>What I wanted: the <code class="language-plaintext highlighter-rouge">postgres</code> user runs a command, secrets appear in <code class="language-plaintext highlighter-rouge">/run/user/1001/secrets/</code>, done.</p>

<h2 id="how-it-works">How It Works</h2>

<p>The tool uses a mapfile to define which secrets go where:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>postgres   op://vault/db/password         db_password
postgres   op://vault/db/connection       connection_string
redis      op://vault/redis/auth          redis_password
</code></pre></div></div>

<p>Each line maps a username, a 1Password secret reference, and an output path. Relative paths expand to <code class="language-plaintext highlighter-rouge">/run/user/&lt;uid&gt;/secrets/</code>. Absolute paths work if the user has write permission.</p>
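
<p>Concretely, if <code class="language-plaintext highlighter-rouge">postgres</code> is uid 1001, the first two lines of the example mapfile produce these files (illustrative, following the expansion rule above):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/run/user/1001/secrets/db_password
/run/user/1001/secrets/connection_string
</code></pre></div></div>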

<p>The “secret zero” challenge is now centralized into a single API key file that every mapped user can use indirectly. But the API key itself needs protection from unprivileged reads, and ideally from those users themselves. This is where SUID comes in … carefully.</p>

<h2 id="privilege-separation-design">Privilege Separation Design</h2>

<p>The security model uses SUID elevation to a service account (not root), reads protected configuration, then immediately drops privileges before touching the network or filesystem.</p>

<p>This has not been independently security audited. Treat it as you would any custom SUID program: read the source, understand the threat model, and test it in your environment before deploying broadly.</p>

<p>The flow:</p>

<ol>
  <li>Binary is SUID+SGID to <code class="language-plaintext highlighter-rouge">op:op</code> (an unprivileged service account)</li>
  <li>Process starts with elevated privileges, reads:
    <ul>
      <li>API key from <code class="language-plaintext highlighter-rouge">/etc/op-secret-manager/api</code> (mode 600, owned by <code class="language-plaintext highlighter-rouge">op</code>)</li>
      <li>Mapfile from <code class="language-plaintext highlighter-rouge">/etc/op-secret-manager/mapfile</code> (typically mode 640, owned by <code class="language-plaintext highlighter-rouge">op:op</code> or <code class="language-plaintext highlighter-rouge">root:op</code>)</li>
    </ul>
  </li>
  <li>Drops all privileges to the real calling user</li>
  <li>Validates that the calling user appears in the mapfile</li>
  <li>Fetches secrets from 1Password</li>
  <li>Writes secrets as the real user to <code class="language-plaintext highlighter-rouge">/run/user/&lt;uid&gt;/secrets/</code></li>
</ol>

<p>Because the network calls and writes happen <em>after</em> the privilege drop, the filesystem automatically enforces isolation. User <code class="language-plaintext highlighter-rouge">postgres</code> cannot write to <code class="language-plaintext highlighter-rouge">redis</code>’s directory. The secrets land with the correct ownership without additional chown operations.</p>
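
<p>To make the ordering concrete, here is a minimal Go sketch of the read-then-drop pattern. It is illustrative only, not the actual <code class="language-plaintext highlighter-rouge">op-secret-manager</code> source; the file path comes from the flow above and error handling is reduced to the bare minimum.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Illustrative sketch of the privilege-drop ordering, not the real tool.
package main

import (
    "fmt"
    "os"
    "syscall"
)

func main() {
    // While the SUID/SGID bits are in effect, the effective uid/gid are the
    // op service account, so the protected API key file is readable.
    apiKey, err := os.ReadFile("/etc/op-secret-manager/api")
    if err != nil {
        fmt.Fprintln(os.Stderr, "read api key:", err)
        os.Exit(1)
    }

    // Permanently drop to the real calling user: setting the real, effective,
    // and saved IDs means the elevation cannot be regained later.
    // Drop the group first, then the user.
    rgid, ruid := syscall.Getgid(), syscall.Getuid()
    if err := syscall.Setresgid(rgid, rgid, rgid); err != nil {
        fmt.Fprintln(os.Stderr, "setresgid:", err)
        os.Exit(1)
    }
    if err := syscall.Setresuid(ruid, ruid, ruid); err != nil {
        fmt.Fprintln(os.Stderr, "setresuid:", err)
        os.Exit(1)
    }

    // Everything past this point (1Password calls, writes under
    // /run/user/&lt;uid&gt;/secrets/) runs as the real calling user.
    _ = apiKey
}
</code></pre></div></div>

<p>The property that matters is the ordering: the only work done with elevated IDs is reading configuration, and the drop covers the saved set-user-ID as well, so code running later in the fetch and write path cannot climb back to the service account.</p>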

<h3 id="why-suid-to-a-service-account">Why SUID to a Service Account?</h3>

<p>Elevating to root would be excessive. Elevating to a dedicated, unprivileged service account constrains the blast radius. If someone compromises the binary, they get the privileges of <code class="language-plaintext highlighter-rouge">op</code> (which can read one API key) rather than full system access.</p>

<p>Alternatives considered:</p>

<ul>
  <li><strong>Linux capabilities</strong> (<code class="language-plaintext highlighter-rouge">CAP_DAC_READ_SEARCH</code>): Still needs root to set the file capability, and the capability lets the binary bypass read checks on every file on the system rather than just one API key, which increases risk.</li>
  <li><strong>Group-readable API key</strong>: Forces all users into a shared group, allowing direct API key reads. This moves the problem rather than solving it.</li>
  <li><strong>No privilege separation</strong>: Each user needs a copy of the API key, defeating centralized management.</li>
</ul>

<p>The mapfile provides access control: it defines which users can request which secrets. The filesystem enforces it: even if you bypass the mapfile check, you can’t write to another user’s runtime directory. While you would theoretically be able to harvest a secret, you won’t be able to modify what the other user uses. This is key because a secret may not actually be “secret.” I have found it useful to centralize some configuration management, like API endpoint addresses, with this tool.</p>

<h3 id="root-execution">Root Execution</h3>

<p>Allowing root to use the tool required special handling. The risk is mapfile poisoning: an attacker modifies the mapfile to make root write secrets to dangerous locations.</p>

<p>The mitigation: root execution is only permitted if the mapfile is owned by <code class="language-plaintext highlighter-rouge">root:op</code> with no group or world write bits. If you can create a root-owned, properly-permissioned file, you already have root access and don’t need this tool for privilege escalation.  The SGID bit on the binary lets the service account, <code class="language-plaintext highlighter-rouge">op</code>, read the mapfile even though it is owned by root.</p>
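
<p>Put together, an install that satisfies these checks looks roughly like this (an abridged, illustrative listing; the owners and modes are the ones described above):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-rwsr-sr-x  op    op   /usr/local/bin/op-secret-manager     (SUID+SGID to the op account)
-rw-------  op    op   /etc/op-secret-manager/api           (mode 600)
-rw-r-----  root  op   /etc/op-secret-manager/mapfile       (mode 640, root-owned to permit root execution)
</code></pre></div></div>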

<h2 id="practical-integration-podman-quadlets">Practical Integration: Podman Quadlets</h2>

<p>My primary use case is systemd-managed containers. Podman Quadlets make this concise. This example is of a rootless <em>user</em> Quadlet (managed via <code class="language-plaintext highlighter-rouge">systemctl --user</code>), not a system service.</p>

<div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[Unit]</span><span class="w">
</span><span class="py">Description</span><span class="p">=</span><span class="s">Application Container</span>
<span class="py">After</span><span class="p">=</span><span class="s">network-online.target</span>
<span class="w">
</span><span class="nn">[Container]</span><span class="w">
</span><span class="py">Image</span><span class="p">=</span><span class="s">docker.io/myapp:latest</span>
<span class="py">Volume</span><span class="p">=</span><span class="s">/run/user/%U/secrets:/run/secrets:ro,Z</span>
<span class="py">Environment</span><span class="p">=</span><span class="s">DB_PASSWORD_FILE=/run/secrets/db_password</span>
<span class="py">ExecStartPre</span><span class="p">=</span><span class="s">/usr/local/bin/op-secret-manager</span>
<span class="py">ExecStopPost</span><span class="p">=</span><span class="s">/usr/local/bin/op-secret-manager --cleanup</span>
<span class="w">
</span><span class="nn">[Service]</span><span class="w">
</span><span class="py">Restart</span><span class="p">=</span><span class="s">always</span>
<span class="w">
</span><span class="nn">[Install]</span><span class="w">
</span><span class="py">WantedBy</span><span class="p">=</span><span class="s">default.target</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">ExecStartPre</code> fetches secrets before the container starts. The container sees them at <code class="language-plaintext highlighter-rouge">/run/secrets/</code> (read-only). <code class="language-plaintext highlighter-rouge">ExecStopPost</code> removes them on shutdown. The application reads secrets from files (not environment variables), avoiding the “secrets in env” problem where <code class="language-plaintext highlighter-rouge">env</code> or a log dump leaks credentials.</p>
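
<p>Inside the container, the application side of this pattern is small. A sketch in Go, assuming the <code class="language-plaintext highlighter-rouge">DB_PASSWORD_FILE</code> variable from the quadlet above (your application and language will differ):</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Read the secret from the file named by DB_PASSWORD_FILE rather than from
// an environment variable, so an `env` dump or a logged environment never shows it.
package main

import (
    "log"
    "os"
    "strings"
)

func main() {
    path := os.Getenv("DB_PASSWORD_FILE") // e.g. /run/secrets/db_password
    data, err := os.ReadFile(path)
    if err != nil {
        log.Fatalf("read %s: %v", path, err)
    }
    password := strings.TrimSpace(string(data))
    _ = password // hand it to your database client here
}
</code></pre></div></div>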

<p>The secrets directory is a <code class="language-plaintext highlighter-rouge">tmpfs</code> (memory-backed <code class="language-plaintext highlighter-rouge">/run</code>), so nothing touches disk. If lingering is enabled for the user (<code class="language-plaintext highlighter-rouge">loginctl enable-linger</code>), the directory persists across logins.</p>

<h2 id="trade-offs-and-constraints">Trade-offs and Constraints</h2>

<p>This design makes specific compromises for simplicity:</p>

<p><strong>No automatic rotation.</strong> The tool runs, fetches, writes, exits. If a secret changes in 1Password, you need to re-run the tool (or restart the service). For scenarios requiring frequent rotation, a persistent agent might be better. For most use cases, rotation happens infrequently enough that ExecReload or a manual re-fetch works fine.</p>
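
<p>For example, the quadlet above could grow a reload hook so a <code class="language-plaintext highlighter-rouge">systemctl --user reload</code> of the unit re-runs the fetch without restarting the container (illustrative; whether the application notices the new file contents without a restart is up to the application):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Service]
Restart=always
ExecReload=/usr/local/bin/op-secret-manager
</code></pre></div></div>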

<p><strong>Filesystem permissions are the security boundary.</strong> If an attacker bypasses Unix file permissions (kernel exploit, root compromise), the API key is exposed. This is consistent with how <code class="language-plaintext highlighter-rouge">/etc/shadow</code> or SSH host keys are protected. File permissions are the Unix-standard mechanism. Encrypting the API key on disk would require storing the decryption key somewhere accessible to the SUID binary, recreating the same problem with added complexity.</p>

<p><strong>Scope managed by 1Password service account.</strong> The shared API key is the critical boundary. If it’s compromised, every secret it can access is exposed. Proper 1Password service account scoping (separate vaults, least-privilege grants, regular audits) is essential.</p>

<p><strong>Mapfile poisoning risk for non-root.</strong> If an attacker can modify the mapfile, they can make users write secrets to unintended locations. This is mitigated by restrictive mapfile permissions (typically <code class="language-plaintext highlighter-rouge">root:op</code> with mode 640). The filesystem still prevents writes to directories the user doesn’t own, but absolute paths could overwrite user-owned files.</p>

<p><strong>No cross-machine coordination.</strong> This is a single-host tool. Distributing secrets to a cluster requires running the tool on each node or using a different solution.</p>

<h2 id="implementation-details-worth-noting">Implementation Details Worth Noting</h2>

<p>The Go implementation uses the 1Password SDK rather than shelling out to <code class="language-plaintext highlighter-rouge">op</code> CLI. This avoids parsing CLI output and handles authentication internally.</p>

<p>Path sanitization prevents directory traversal (<code class="language-plaintext highlighter-rouge">..</code> is rejected). Absolute paths are allowed but subject to the user’s own filesystem permissions after privilege drop.</p>

<p>The cleanup mode (<code class="language-plaintext highlighter-rouge">--cleanup</code>) removes files based on the mapfile. It only deletes files, not directories, and only if they match entries for the current user. This prevents accidental removal of shared directories.</p>

<p>A verbose flag (<code class="language-plaintext highlighter-rouge">-v</code>) exists primarily for debugging integration issues. Most production usage doesn’t need it.</p>

<h2 id="availability">Availability</h2>

<p>The project is <a href="https://github.com/bexelbie/op-secret-manager">on GitHub</a> under GPLv3. Pre-built binaries for Linux amd64 and arm64 are available in releases.</p>

<p>This isn’t the right tool for every scenario. If you need dynamic rotation, audit trails beyond what 1Password provides, or distributed coordination, look at Vault or a cloud provider’s secret manager. If you’re running Kubernetes, use native secret integration.</p>

<p>But for the specific case of “I have a few Linux boxes, some containers, and a 1Password account; I want secrets distributed without adding persistent infrastructure,” this does the job.</p>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[A no-daemon tool that distributes 1Password secrets to multi-user Linux systems but retains centralized management.]]></summary></entry><entry><title type="html">On EU Open Source Procurement: A Layered Approach</title><link href="https://bexelbie.com/2026/01/27/eu-open-source-procurement.html" rel="alternate" type="text/html" title="On EU Open Source Procurement: A Layered Approach" /><published>2026-01-27T09:10:00+01:00</published><updated>2026-01-27T09:10:00+01:00</updated><id>https://bexelbie.com/2026/01/27/eu-open-source-procurement</id><content type="html" xml:base="https://bexelbie.com/2026/01/27/eu-open-source-procurement.html"><![CDATA[<p>Disclaimer: I work at Microsoft on upstream Linux in Azure. These are my personal notes and opinions.</p>

<p>The European Commission has launched a consultation on the EU’s future Open Source strategy. That combined with some comments by <a href="https://toot.io/@jzb@hachyderm.io">Joe Brockmeier</a> made me think about this from a procurement perspective.  Here’s the core of my thinking: treat open source as recurring OpEx, not a box product. That means hiring contributors, contracting external experts, and funding internal IT so the EU participates rather than only purchases.</p>

<p>A lot of reaction to this request has shown up in the form of suggestions for the EU to fund open source software companies and to pay maintainers. In this <a href="https://toot.io/@jzb@hachyderm.io/115939723318453222">Mastodon exchange</a> that I had with Joe, he points out that these comments ignore the realities of how procurement works: the processes vendors go through would, if maintainers had to follow them, be onerous and leave them in the precarious position of living contract to contract.</p>

<p>His prescription is that the EU should participate in communities by literally “rolling up [their] sleeves and getting directly involved.” My reaction was to point out that doing these things has an indirect, at best, relationship to bottom-line metrics (profit, efficiency, cost, etc.) and that our government structures are not set up to reward this kind of thinking. In general, people want to see their governments not be “wasteful” in a context where one person’s waste is another’s necessity.</p>

<p>As the exchange continued, <a href="https://toot.io/@jzb@hachyderm.io/115940797583976313">Joe pointed out</a> that “it’s not FOSS that needs to change, it’s the organizational thinking.”</p>

<p>In the moment I took the conversation in a slightly different direction, but the core of this conversation stuck with me. I woke up this morning thinking about organizational change. I am sure I am not the first to think this way, but here’s my articulation.</p>

<p>In my opinion, an underlying commentary in many of the responses from the “pay the maintainers / fund open source” crowd is the application of a litmus test to the funded parties. Typically they want to exclude not only all forms of proprietary software, but also SaaS products that don’t fully open their infrastructure management, products which rely on cloud services, large companies, companies that have traditionally been open source friendly that have been acquired (even if they are still open source friendly), and so on. These exclusions, no matter which you support, if any, tend to drive the use of open source software by an entity like the EU into a 100% self-executed effort. And, despite the presence of SaaS in that list, these conversations often treat open source software as a “box product” only experience that the end-user must self-install in their own (private and presumably all open source) cloud.</p>

<p>A key element of most entities is that they procure the things that aren’t uniquely their effort. A government procures email server software (and increasingly email as a service) because sending email isn’t their unique effort; the work that email allows to happen is. There is an inherent disconnect between the effort and therefore the corresponding cost expectation of getting email working so you can do work versus first becoming an email solution provider and expert and then after that beginning to do the work you wanted to do. (A form of Yak Shaving perhaps?).</p>

<p>While I am not sure I will reply to the EU Commission - I am a resident of the EU but not an EU citizen - I wanted to write to organize my thoughts.</p>

<h2 id="why-procurement-struggles-with-oss">Why Procurement Struggles With OSS</h2>

<p>Software procurement is effectively the art of getting software:</p>

<ul>
  <li>written</li>
  <li>packaged into a distributable consumable</li>
  <li>maintained</li>
  <li>advanced with new features as need arises</li>
  <li>installed and working</li>
</ul>

<p>Over time the industry has become more adept at doing more of these things for their customers. Early software was all custom and then we got software that was reusable. Software companies became more common as entities became willing to pay for standardized solutions and we saw the rise of the “box product.” SaaS has supplanted much of the installation and execution last-mile work that was the traditional effort of in-house IT departments. From an organizational perspective, these distinct areas of cost - some one-time and some recurring - have increasingly been rolled into a single, recurring cost. That is easier to budget and operate.</p>

<p>Bundling usually leads to discounting. Proprietary software companies control this whole stack and therefore can capture margin at multiple layers. This also allows them to create a discount when bundling different layers because they can “rationalize” their layer-based profit calculations. Open source changes this equation. There is effectively no profit built into most layers because any profit-taking is competed away in a deliberate and wanted race to the bottom. When a company commercializes open source software, it has to build all of its profit (and the cost of being a company) into the few layers it controls. We have watched companies struggle to make this model work, in large part because it is hard and easy to misunderstand. There is a whole aside I could write about how single-company open source makes these even worse because it buries the cost for layers like writing and maintaining software into the layers that are company-controlled, but I won’t, to keep this short. But know this context. What this means, in the end, is that I believe procuring open source can sometimes lead, paradoxically, to an increase in cost versus procuring the layers separately … but only if you think broadly about procurement.</p>

<p>Too often we assume procurement == purchasing, but it doesn’t have to. <a href="https://www.merriam-webster.com/dictionary/procuring">Merriam-Webster</a> reminds us that procurement is “to bring about or achieve (something) by care and effort.” Therefore we could encourage entities like the EU to procure open source software by using a layered approach and have an outcome identical to the procurement of the same software in a non-open way at the same or lower cost. Open source doesn’t need to save money; it just needs to not “waste” it.</p>

<p>The key is the rise of software as a service. From an accounting perspective, software as a service moves software expenses from a model of large one-time costs with smaller, if any, recurring costs to one of just recurring costs. The promise of software as a service - that recurring costs can be ended at-will - is an exciting one organizationally, as it gives flexibility at a controllable cost, but in practice it has a “Hotel California”<sup id="fnref:saas-exit"><a href="#fn:saas-exit" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> quality: exit is often constrained by vendor lock-in, data egress limits, and portability gaps.</p>

<h2 id="the-layered-opex-model">The Layered OpEx Model</h2>

<p>Here’s how the EU can treat open source as a recurring cost:</p>

<ol>
  <li>
    <p><strong>Hire people to participate in the open source project.</strong> They are tasked with helping to maintain and advance software to keep it working and changing to meet EU needs. These people are, like most engineers at open source companies, paid to focus on the organization’s needs. They differ from our typical view of contributors as people showing up to “scratch their own itch.”</p>
  </li>
  <li>
    <p><strong>Enter into contracts with external parties to provide consulting and support beyond the internal team.</strong> These folks are there to give you diversity of thought and some guarantees. The internal team is, by definition, focused just on EU problems and has a sample size install base of one. External contractors will have a much larger scope of interest and install base sample size as they work with multiple customers. Critically, this creates a funding channel for non-employees and speaks to the “pay the maintainers” crowd.</p>
  </li>
  <li>
    <p><strong>Continue to fund internal IT departments to care and feed software and make it usable instead of shifting this expense to a single-software solution vendor.</strong> These folks are distinct from the people in #1 above. They are experts in EU needs and understand the intersection of those needs and a multitude of software.</p>
  </li>
</ol>

<p>Every one of these expenses is recurring and can be ended at-will - but only if ending them is something we are willing to knowingly accept. We already implicitly accept this when we buy from a company. The objections I expect are as follows. Before you read them, though, I want to define at-will. While it denotatively means “<a href="https://www.merriam-webster.com/dictionary/at%20will">as one wishes : as or when it pleases or suits oneself</a>”, in our context we can extend this with “in a reasonable time frame” or “with known decision points.”</p>

<h2 id="expected-objections">Expected Objections</h2>

<ol>
  <li>
    <p><strong>If you can terminate the people hired to participate in open source projects like this, they’re living contract to contract.</strong> To this I say, yes in the sense that they don’t have unlimited contracts, but no in the sense that they are still employees with employee benefits and protections, like notice periods. The big change is that they can be terminated solely due to changes in software needs.</p>
  </li>
  <li>
    <p><strong>But allowing for notice periods is expensive. EU employees are often perceived as more expensive than private sector ones or individual contractors.</strong> To this I say, maybe. But isn’t that the point? Shouldn’t we want to be in a place where we are <em>not</em> creating cost savings by reducing the quality of life for the humans involved?</p>
  </li>
  <li>
    <p><strong>If everything is either an employment agreement with a directed work product (do fixes/maintenance for our use case or install and manage this software) or a support/consultancy contract we aren’t paying maintainers to be maintainers.</strong> To this I say, you’re right. The mechanics of project maintenance should be borne by all of the project’s participants and not by some special select few paid to do that work. There is a lot of room here to argue about specifics, but rise above it. The key thing this causes is that no one is paid to just “grind out features or maintenance” on a project that isn’t used directly by a contributor. A key concept in open source has always been that people are there to either scratch their own itch or because they have a personal motivation to provide a solution to some group of users. This model pays for the first one and leaves the second to be the altruistic endeavor it is. Also, there are EU funds you can get to pay for altruistic endeavors :D.</p>
  </li>
  <li>
    <p><strong>This model doesn’t explain how software originates. What happens when there is no open source project (yet)?</strong> To this I say, you’re also right. This is a huge hole that needs more thought. Today we solve this with VC funding and profit-based funding. VC funding is predicated on ownership and being able to get return on investment. If this model is successful there is very little opportunity for what VCs need. However, profit based funding, when an entity takes some of its profit and invests in new ideas (not features) still can exist as the consulting agreements can, and likely should, include a profit component. Additionally, the EU and other entities can recognize a shared need through the consensus building and collaborative work on participation in open source software and fund the creation of teams to go start projects. This relies on everyone giving the EU permission to take risks like this.</p>
  </li>
  <li>
    <p><strong>The cost of administering these three expenses will eat up the cost more than paying an external vendor.</strong> To this I say, maybe, but it shouldn’t matter. While I firmly believe that this shouldn’t be true and that it should be possible for the EU to efficiently manage these costs for less than the sum of the profit-costs they would pay a company, I am willing to accept that the “expensive employees” of #2 above may change that. But just like above, I think that’s partly the point.</p>
  </li>
  <li>
    <p><strong>Adopting this model will destroy the software industry and create economic disaster.</strong> To this I say, take a breath. The EU changing procurement models doesn’t have the power to single-handedly destroy an industry. Even if every government adopted this, which they won’t, the macro impact would likely be a shift in spend rather than a net loss. This model is practical only for the largest organizations; most entities will still need third-party vendors to bundle and manage solutions. If anything, this strengthens the open source ecosystem by providing a clear monetization path for experts, while leaving ample room for proprietary software where it adds unique value. Finally, the private sector is diverse; many companies and investors will continue to prefer traditional models. The goal here is to increase EU participation in a public good and reduce dependency, not to dismantle the software industry.</p>
  </li>
</ol>

<h2 id="what-to-ask-the-commission">What To Ask The Commission</h2>

<ul>
  <li>When choosing software, the budget must include time for EU staff (new hires or existing staff reassigned) to contribute to the underlying open source projects.</li>
  <li>Keep strong in-house IT skills to ensure that deployed solutions meet needs and work together.</li>
  <li>Complement your staff with support/consultancy agreements to provide the accountability partnership you get from traditional vendors and access to greater knowledge when needed.</li>
  <li>Make decisions based on your mission and goals, not your current inventory; be prepared to rearrange staffing when required to advance.</li>
</ul>

<p>This was quickly written this morning to get it out of my head. There are probably holes in this and it may not even be all that original, but I think it works. As an American who has lived in the EU for 13+ years, I have come to trust government more and corporations less for a variety of reasons, but mostly because, broadly speaking, we tend to hold our government to a higher standard than we hold corporations.</p>

<p>I’m posting this in January 2026, just before FOSDEM. I’ll be there and open for conversation. Find me on Signal as <code class="language-plaintext highlighter-rouge">bexelbie.01</code>.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:saas-exit">
      <p>Many software as a service agreements allow you to stop paying but still make true exit difficult due to data gravity, integrations, and proprietary features. In practice, you can “check out,” but actually leaving is often costly and slow. <a href="#fnref:saas-exit" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[Treat OSS as recurring OpEx: hire contributors, contract experts, and fund internal IT so the EU participates, not just buys.]]></summary></entry></feed>