<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://bexelbie.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://bexelbie.com/" rel="alternate" type="text/html" /><updated>2026-04-02T15:07:29+02:00</updated><id>https://bexelbie.com/feed.xml</id><title type="html">this could’ve been an email</title><subtitle>Notes on Linux, side projects, and figuring things out in the Czech Republic.</subtitle><author><name>Brian &quot;bex&quot; Exelbierd</name></author><entry><title type="html">A Few More Thoughts on Sashiko and the Kernel</title><link href="https://bexelbie.com/2026/04/02/more-thoughts-sashiko-kernel.html" rel="alternate" type="text/html" title="A Few More Thoughts on Sashiko and the Kernel" /><published>2026-04-02T13:50:00+02:00</published><updated>2026-04-02T13:50:00+02:00</updated><id>https://bexelbie.com/2026/04/02/more-thoughts-sashiko-kernel</id><content type="html" xml:base="https://bexelbie.com/2026/04/02/more-thoughts-sashiko-kernel.html"><![CDATA[<p>Disclaimer: I work at Microsoft on upstream Linux in Azure. These are my personal notes and opinions.</p>

<p>I kept thinking about the <a href="https://lwn.net/Articles/1064830/">LWN article $</a> and the <a href="/2026/04/01/whats-in-a-sashiko-review.html">basic analysis I did yesterday</a>, and I kept coming back to one of the central themes of the mailing list conversation: false positives. Sashiko’s false positive rate is debated but, from what I gather, is pretty good by LLM standards. Still, there was a complaint about the number of false positives, focused on the burden they put on contributors and maintainers.</p>

<p>I wanted to understand if the false positive rate, and by extension the burden, was higher from an LLM than from human reviewers. To run that experiment, I needed to define what a false positive actually is. That turns out to be the interesting part.</p>

<h2 id="the-definition-problem">The Definition Problem</h2>

<p>My initial naïve definition of a false positive was any substantial comment that doesn’t yield a code change. If you said something and the code wasn’t changed, then even if it generated future work, it wasn’t applicable to this change now. The obvious hole is a comment that raises a future code change coming in a different patch set. But it felt like this number could be directionally accurate for understanding whether we get more false positives from an LLM than from human reviewers.</p>

<p>The deeper problem is that “comment that doesn’t change code” isn’t really what false positive means in review. The act of questioning code can lead to greater confidence in the patch being proposed. It can reveal unrelated changes that are required or surface features that should also be considered. Not a negative outcome, but potentially not relevant to the actual patch set under discussion. So I tried reframing from false positives to burden: any comment that doesn’t result in a code change and was actually read by the contributor or maintainer is burdensome. It doesn’t matter whether a human or LLM reviewer raised the comment. If it didn’t result in a change, it was work or thought they didn’t need to do. For example, a back-and-forth conversation to prove the correctness of something that was already correct.</p>

<p>But that definition fails too, and the reason it fails is the real insight.</p>

<p>If two humans are engaged in a review process and there’s a back-and-forth conversation that does not result in a code change, most likely neither human would describe this as unnecessary burden. They would probably describe it as work they had to do or effort they expended, but both humans have likely come out of that conversation changed. Greater understanding of different parts of the system. Better ability to express oneself so the questions aren’t raised next time. Increased confidence in the correctness of a solution. There is a change assumed to have happened to one or both of the people.</p>

<p>A review conversation that doesn’t change code but changes the people having it isn’t a false positive. It only looks like one when the reviewer is a machine that won’t be changed.</p>

<p>For what it’s worth, I did look at existing studies of human review false positive rates. In my brief and non-exhaustive look, I’ve come to believe they aren’t useful here, not only because the question is moot when both parties come out changed, but because many are flawed or non-comparable. Some are in domains where reviewers are generalists talking to a specialist, unlikely in the kernel. Others misclassify trivial exchanges like “LGTM” or “thanks” as false positives. And none have been conducted over the kernel.</p>

<h2 id="when-the-reviewer-is-a-machine">When the Reviewer Is a Machine</h2>

<p>When a finding or probing question is raised by an LLM agent, the assumption that both parties come out changed breaks down.</p>

<p>Probing questions may not even be welcome from an LLM agent. One could never really be sure whether this was a “humans normally say this kind of thing in this context” situation versus an “I see something that maybe is wrong” situation.</p>

<p>But the more important part is this: if a human has to read a false positive, they have to put in their side of the work to validate, verify, explore, or test the question, and ultimately determine that it’s not an issue. They are unlikely to be changed in the absence of an exchange. And we know for a fact that the machine is not going to be changed.</p>

<p>In theory, we could wire up a training loop for Sashiko to take these back-and-forth exchanges and learn from them to reduce the incidence of false positives. I suspect it would have very little impact overall. First, the analysis showed that there’s almost no situation where the same bug is being surfaced over and over again. The machine is unlikely to run into the same finding and then have learned that finding isn’t valid. Second, the machine is not arguing from a position of true reasoning, therefore it is never clear if it backed down because it decided to be an agreeable sycophant or because the additional commentary made the correctness argument airtight.</p>

<h2 id="the-social-problem">The Social Problem</h2>

<p>At its true core, I think the conversation around false positives, based on what I read in the article, is likely a social problem, like most truly intractable problems in computer science.</p>

<p>If an LLM agent reviews my contribution and the maintainer insists that I address the review, I am not only forced to do what turns out, in the case of a false positive, to be unnecessary work, but forced to performatively defend myself against a machine. Or worse, argue with the machine performatively. That combination is a line too far for most of our psyches: doing unnecessary work that generates no value, doing it performatively while knowing it generates no value, and then doing still more work to show that I did the work that generated no value.</p>

<h2 id="a-possible-path">A Possible Path</h2>

<p>Setting aside the separate question of whether LLM ability will continue improving and therefore the number of false positives will go down, the core question of how to deal with false positives needs to be addressed at a social level.</p>

<p>In a space like the kernel, I would argue it may be appropriate to allow those whose code has been reviewed to react to LLM-generated findings with something along the lines of “smells like bullshit” and not have to go through the performative exercise of proving it’s bullshit, because we trust their instinct.</p>

<p>That said, it is probably worth creating some kind of long-term profile or scoreboard, both of those being the wrong words, for a contributor, so that they can over time understand if their intuition has blind spots. If an LLM is consistently raising a certain kind of feedback that they are dismissing, but we later discover a bug and have to fix it, or if human reviewers come back and their synthesis of their own experience plus what the LLM provided leads them to believe there’s a real, demonstrable problem, that’s a learning opportunity for the contributor.</p>

<p>The challenge is that there are no systems I’m aware of in modern use where these kinds of profiles are ever not used abusively against those profiled. Which is yet another social problem.</p>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[False positives aren't the real problem with LLM code review. The burden is social, not statistical.]]></summary></entry><entry><title type="html">What’s Actually in a Sashiko Review?</title><link href="https://bexelbie.com/2026/04/01/whats-in-a-sashiko-review.html" rel="alternate" type="text/html" title="What’s Actually in a Sashiko Review?" /><published>2026-04-01T23:20:00+02:00</published><updated>2026-04-01T23:20:00+02:00</updated><id>https://bexelbie.com/2026/04/01/whats-in-a-sashiko-review</id><content type="html" xml:base="https://bexelbie.com/2026/04/01/whats-in-a-sashiko-review.html"><![CDATA[<p>Disclaimer: I work at Microsoft on upstream Linux in Azure. These are my personal notes and opinions. And, yes, I’m aware of the date. The data is real - and in 40 minutes it won’t be April 1 anymore, at least where I live.</p>

<p>Daroc Alden’s <a href="https://lwn.net/Articles/1064830/">LWN article on Sashiko $</a> captures a real tension in the Linux kernel community. Andrew Morton wants to make Sashiko - an LLM-based patch reviewer - a mandatory part of the memory management workflow. Lorenzo Stoakes and others say it’s too noisy and adds burden to already-overworked maintainers. Morton points to a ~60% hit rate on actual bugs. Stoakes points out that’s per-review, not per-comment, so the individual false positive rate is worse.</p>

<p>Reading the thread, I kept wondering about two specific mechanisms that could be driving maintainer frustration beyond the false positive question.</p>

<h2 id="two-hypotheses">Two Hypotheses</h2>

<p><strong>Hypothesis 1: Reviewers are getting told about bugs they didn’t create.</strong> Sashiko’s review protocol explicitly instructs the LLM to read surrounding code, not just the diff. That’s good review practice - but it means the tool might flag pre-existing bugs in code the patch author merely touched, putting those problems in their inbox.</p>

<p><strong>Hypothesis 2: The same pre-existing bugs surface repeatedly.</strong> If a known issue in a subsystem doesn’t get fixed between review runs, every patch touching nearby code could trigger the same finding. That would create a steady drip of duplicate noise across the mailing list.</p>

<p>I pulled data from Sashiko’s public API and tested both.</p>

<h2 id="method">Method</h2>

<p>I fetched all 406 patchsets from the linux-mm mailing list and a 500-patchset sample from LKML as of April 1, 2026. Of the 252 linux-mm reviews with findings, 204 had full review text available for analysis.</p>

<p>I had an LLM write Python scripts to classify the 466 extracted findings into three categories using deterministic regex pattern matching - roughly 50 weighted patterns that look for specific language in the review text. The classification code runs the same way every time on the same input. An LLM wrote it, but the scanning itself involves no inferencing.</p>

<p>The three categories:</p>

<ol>
  <li><strong>Patch-specific</strong> - about the actual changed lines. Patterns match phrases like “this patch adds,” “the new code,” “missing check.”</li>
  <li><strong>Interaction</strong> - about how new code interacts with existing code. Patterns match references to callers, callees, lock state, concurrent access.</li>
  <li><strong>Pre-existing</strong> - about bugs in surrounding code not introduced by the patch. Patterns match “not introduced by this patch,” “pre-existing,” “noticed while reviewing.”</li>
</ol>

<p>When a finding matched multiple categories, the most specific won: pre-existing &gt; interaction &gt; patch-specific. About 7% of findings didn’t match any pattern and were excluded from further analysis.</p>
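
<p>For the curious, here is a minimal sketch of what that classification logic looks like. The patterns below are illustrative stand-ins, not the roughly 50 weighted patterns the actual scripts in the repository use, but the shape - match phrases, then let the most specific category win - is the same idea.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

# Illustrative patterns only; the real scripts use ~50 weighted patterns.
PATTERNS = {
    "pre-existing": [r"not introduced by this patch", r"pre-existing", r"noticed while reviewing"],
    "interaction": [r"\bcaller", r"\bcallee", r"lock state", r"concurrent access"],
    "patch-specific": [r"this patch adds", r"the new code", r"missing check"],
}

# Most specific category wins: pre-existing, then interaction, then patch-specific.
PRIORITY = ["pre-existing", "interaction", "patch-specific"]

def classify(finding_text):
    matches = {
        category
        for category, patterns in PATTERNS.items()
        if any(re.search(p, finding_text, re.IGNORECASE) for p in patterns)
    }
    for category in PRIORITY:
        if category in matches:
            return category
    return "unclassified"  # roughly 7% of findings ended up here and were excluded
</code></pre></div></div>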

<p>For duplication, the scripts computed pairwise text similarity across reviews within the same subsystem. Again - deterministic comparison, LLM-authored code.</p>
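
<p>The pairwise comparison is similarly unexciting. A rough illustration using Python’s standard <code class="language-plaintext highlighter-rouge">difflib</code> (the actual threshold and text normalization are in the repo, so treat the number here as a placeholder):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from difflib import SequenceMatcher
from itertools import combinations

THRESHOLD = 0.8  # placeholder; the real value lives in the repository

def duplicate_pairs(reviews):
    """reviews: list of (patchset_id, review_text) tuples for one subsystem."""
    pairs = []
    for (id_a, text_a), (id_b, text_b) in combinations(reviews, 2):
        similarity = SequenceMatcher(None, text_a, text_b).ratio()
        if similarity &gt;= THRESHOLD:
            pairs.append((id_a, id_b, similarity))
    return pairs
</code></pre></div></div>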

<p>The full methodology, including the code used, a cached copy of the reviews, and the classification patterns and caveats, is in the <a href="https://github.com/bexelbie/sashiko-analysis/blob/main/analysis-review-scope.md">analysis document</a> in <a href="https://github.com/bexelbie/sashiko-analysis">github.com/bexelbie/sashiko-analysis</a>.</p>

<h2 id="what-the-data-shows">What the Data Shows</h2>

<p><strong>Hypothesis 2 is dead.</strong> Cross-review duplication was essentially zero. Across 16 LKML subsystems with 5+ reviewed patches each, only one pair of findings exceeded the similarity threshold - and that was the same author submitting similar patches, not the same bug recurring. Whatever is driving maintainer frustration, it’s not the same findings appearing over and over. While it is possible this would surface in a larger sample size, I personally find it unlikely.</p>

<p><strong>Hypothesis 1 is partially supported, but the story is in the distribution.</strong> About 9% of findings explicitly discuss pre-existing issues. Averaged across all reviews, that’s roughly 12 words per review - barely noticeable.</p>

<p>But the average is misleading. The distribution is bimodal: 81% of reviews contain zero pre-existing findings. The other 19% contain pre-existing findings that constitute 28% of the review on average, adding roughly 19 lines to what the patch author reads. A few reviews are 75-82% pre-existing content.</p>

<p>Here’s the breakdown of what an average review with findings contains:</p>

<table>
  <thead>
    <tr>
      <th>Category</th>
      <th>% of findings</th>
      <th>Avg words</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>About the submitted patch</td>
      <td>72%</td>
      <td>74</td>
    </tr>
    <tr>
      <td>Patch × existing code interactions</td>
      <td>12%</td>
      <td>103</td>
    </tr>
    <tr>
      <td>Pre-existing issues</td>
      <td>9%</td>
      <td>62</td>
    </tr>
    <tr>
      <td>Unclassified</td>
      <td>8%</td>
      <td>47</td>
    </tr>
  </tbody>
</table>

<p>The interaction findings (category 2) are worth calling out. They’re the longest - 103 words on average, 39% more than patch-specific findings - because explaining how new code breaks against existing behavior requires describing that behavior. These are arguably the hardest findings for a human reviewer to produce and exactly where a tool with codebase-wide context adds value.</p>

<h2 id="who-owns-this-bug-now">Who Owns This Bug Now?</h2>

<p>The sharpest question the data raises isn’t statistical. It’s social.</p>

<p>When you submit a patch to linux-mm and get a Sashiko review, there’s roughly a 1-in-5 chance that a meaningful chunk of that review describes a bug you didn’t write - a race, a leak, a use-after-free in the code you’re modifying. Some of these are trivial (typos in nearby comments). Some are substantive.</p>

<p>Either way, the review has put it in your inbox. You are now the person who has been told about it.</p>

<p>Morton’s position - “don’t add bugs” as Rule #1 - makes sense if the tool’s output is mostly about your patch. And it is: ~85% of findings concern either the submitted change or its direct interactions with existing code. But 1 in 5 reviewees is also getting handed someone else’s problem, with an implicit expectation to respond.</p>

<p>Stoakes’s concern about maintainer burden lands differently when you see the bimodal distribution. The average review is manageable. The tail is not.</p>

<h2 id="what-this-doesnt-answer">What This Doesn’t Answer</h2>

<p>This analysis classifies <em>scope</em> - whether a finding is about the submitted patch, its interactions, or pre-existing code. It does not measure <em>correctness</em>. The core Morton/Stoakes disagreement is about false positive rates within on-topic findings - how often Sashiko flags something in your patch that turns out to be wrong. That question requires domain expertise to evaluate each finding individually, and this data doesn’t go there.</p>

<p>The classification also has limits. The regex patterns achieve ~93% coverage but aren’t semantic - borderline cases between categories get decided by pattern specificity, not understanding. The proportions are directionally sound but not precise.</p>

<p>The full data, methodology, and API references are in the repository, <a href="https://github.com/bexelbie/sashiko-analysis">github.com/bexelbie/sashiko-analysis</a> if anyone wants to reproduce or extend this.</p>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[I pulled Sashiko's public review data and tested two hypotheses about what's driving kernel maintainer frustration.]]></summary></entry><entry><title type="html">Reflecting on “Warranty Void If Regenerated”</title><link href="https://bexelbie.com/2026/03/23/reflecting-on-warranty-void-if-regenerated.html" rel="alternate" type="text/html" title="Reflecting on “Warranty Void If Regenerated”" /><published>2026-03-23T11:50:00+01:00</published><updated>2026-03-23T11:50:00+01:00</updated><id>https://bexelbie.com/2026/03/23/reflecting-on-warranty-void-if-regenerated</id><content type="html" xml:base="https://bexelbie.com/2026/03/23/reflecting-on-warranty-void-if-regenerated.html"><![CDATA[<p>I’ve seen <a href="https://nearzero.software/p/warranty-void-if-regenerated">“Warranty Void if Regenerated”</a> going around, particularly among the subset of my friends who believe “LLMs are slop generators”. They typically characterize it as overly optimistic - hopeful, if not downright fantasy.</p>

<p>The “slop generator” position is, in my opinion, demonstrably false, as countless successful code generation outcomes contradict such a sweeping generalization. The dogged pursuit of this position obscures the real concerns with LLMs as built and used today.  I believe there are legitimate company ethics, environmental, and license/copyright concerns worthy of consideration in this space.  I also believe that we are still in a highly emotional place where those concerns tend to be both understated and overstated depending on who is talking.</p>

<p>The story consists of three vignettes told from the perspective of Tom, a post-transition specification repair person who works with farmers. In this universe, all code is generated from specs and average humans are making custom software constantly. Domain experts are needed to refine, debug, and in some cases wholesale write the specifications.</p>

<p>There is also a great discussion of the human impact of this post-transition existence. I encourage you to read it, but I’m not addressing that below - not because it isn’t important, but because I want to preserve focus on the “slop generator” drumbeat that feels so misguided.</p>

<p>All in all, I think the piece is well written and that <a href="https://substack.com/@scottwerner">Scott Werner</a> did a great job. This isn’t a critique of the writing or the story itself.  I also don’t know what Scott’s perspective is on LLMs, though their public pages and site lead me to believe they are not anti-generative AI.</p>

<p>I’d been harboring a delusion in the back of my mind about trying to write a story about a “machine whisperer”. Scott’s piece reminded me that I am likely still not a creative writer, and I’m glad for their work here.</p>

<p>My thesis here is simple: this story reads like a set of specification and contract failures. It does not read like evidence that code generation inherently produces “slop” or that opaque code from code generation is inherently a failed concept. To be clear, this is not a critique of Scott’s view, but of the “slop generator” viewpoint.</p>

<h2 id="margaret">Margaret</h2>

<p>Margaret has generated software that pulls in various data sets from both their farm and external sources to predict the best time to harvest. Their latest crop was harvested before it should have been, and Tom realizes that the specification failed to include a requirement that it raise an error if a data source’s structure or methodology changed. Instead, the system absorbed the data from an updated methodology and didn’t change how it used that data.</p>

<p>This is shown to be a specification problem. The spec as written didn’t suggest that changes were possible or that they should be monitored for, so the generated system didn’t do that.</p>

<p>While this happens with, I suspect, regularity in hand-coded systems, my point isn’t that this is normal. When it happens in a hand-coded system, it is wrong too. And, importantly, it is also a specification error.</p>

<p>There may never have been a specification in the first place and the developer was just expected to figure this out. Depending on their experience and other conditions, they either did … or they didn’t. A clearer spec or set of standards (a/k/a a system prompt) would have fixed this in both cases.</p>

<h3 id="pit-crew">Pit Crew</h3>

<p>Scott introduces pit crews in this anecdote. These are people who monitor ongoing quality and concerns.</p>

<p>Today we often approximate this with monitoring systems that we hope are checking the right things, perhaps even with real end-to-end live tests running on a regular basis. We don’t generally dedicate human teams to it.</p>

<p>Whether we ever hit post-transition or not, this begs for a conversation: is QE/QA solely a pre-ship function, or should we be leveraging that knowledge to monitor delivered software in ways that go deeper than what we typically monitor today?  What does the SRE practice in this space look like?</p>

<p>Framed that way, the pit crew in the story is less a bandage for sloppy generated code and more the missing extension of our specifications and contracts into how we watch systems evolve over time.</p>

<h2 id="ethan">Ethan</h2>

<p>Ethan has generated a multitude of tools and they are all communicating with each other. Ethan is a microservice machine.</p>

<p>Ethan, much like Margaret, has a data feed problem. This time one of his own tools made a change in the methodology and calculated a value per-hundredweight instead of per-head. While not stated in the story, this unit for output was chosen at generation because it wasn’t in the specification and the specification also didn’t have a way (or likely even a requirement) to flag changes. The downstream tool didn’t get a read failure but began using this new data value as though it was still per-head. This resulted in poor market price prediction.</p>

<p>The story is similar to Margaret’s except it is more like when Team A breaks Team B in your own company.</p>

<p>For me it raises the interesting point that while we tend to believe otherwise, in many cases our APIs and data formats are our only true contracts. They operate only at the level where they exist. The internals of our dependencies, or the work of other teams,  are opaque, and you could say that they may “regenerate” their code every day of the week and you just have to hope it still works for your consumption and use. You have to rely on them not breaking the contract and ensure the contract provides the guarantees you need.</p>

<h3 id="choreographer">Choreographer</h3>

<p>A choreographer is a post-transition architect. It is, in my opinion, the thing we should all be if we are going to use LLMs to generate code.</p>

<p>Here a choreographer goes through Ethan’s systems and defines their interface contracts and layers. They also notice that some tools are unnecessary, while others have formed a sub-network that has no effect.  The output of this person’s work is a cleaned up system that functions as a whole and not a set of discrete parts.</p>

<p>This is something we already have to do in large systems, and it’s something that people generating code still have to do. I suspect that some concepts like <a href="https://github.com/steveyegge/gastown">Gastown</a> try to push parts of this work into a different layer of tooling. And it may even work.</p>

<p>LLM generation and reasoning capacity is getting higher, but none of this eliminates the need for this role or for specification correctness.  Specification correctness is something we’ve basically never had. Even waterfall failed here.</p>

<p>In this sense, the story reads less like an indictment of generation and more like a warning about what happens when we refuse to name, own, and maintain those contracts across a growing system.</p>

<h2 id="carol">Carol</h2>

<p>Carol’s farm illustrates the ugly mess of things we give automation and then complain about.</p>

<p>In this specific case there is a new irrigation system that is using all of the sensors it has to maintain a 60% moisture level across the farm. This results in under- and over-irrigation in some places because the moisture level in those places is influenced by external factors. The system is doing exactly what it was asked to do. The problem is that the target it was given is a bad fit for the actual farm, not that the generated system is inherently bad.</p>

<p>Note: I am not a farmer, so I am taking this example at face value.</p>

<p>The short version is that drainage is funny in some places, other places are getting more wind, and still others need slightly differing levels based on the actual crop in that spot. None of this data has been provided to the system, and the story makes it clear that most of it is not in any system.</p>

<p>The farmer just understands their land and can look at it and tell you what is going to happen based on 30 years of real history and 30 years of experience. This is also not new. This is the art and practice of both coding and system administration, and we have failed to codify it usefully to date. We shouldn’t hold our new system accountable for that, but we also shouldn’t pretend that “just write a better spec” is an easy button when so much of the domain is still tacitly known and not shared beyond tribal means.</p>

<p>This is perhaps the one vignette that gives me pause. Even if we can find code generation (it doesn’t have to be LLMs) that writes to a specification, we may still be unsuccessful when our measurements, abstractions, and language can’t yet capture the thing we actually care about.</p>

<p>Right now we make surgical tweaks to the code to encode these lessons as we learn them. Specifying them in human language is often difficult, and maybe that is the core problem. The boundary here isn’t really “hand-written vs generated code”, it is between where, as technologists, we have experience stating precisely enough and where we don’t have a history of doing that well.</p>

<p>But we work in a precise space. In the case of Carol’s farm, Carol and Tom are able to describe the core problems pretty quickly, and I suspect, given time, could come up with data feeds, additional sensors, or equations that describe the issues sufficiently to fix the irrigation system.</p>

<p>It would be hyper-customized to Carol’s farm, but in many ways that is what she wants and needs - and it’s something we fail to deliver, in general, today. Even here, though, calling the outcome “slop” feels like a category error: the system is faithfully pursuing the narrow, naive target we gave it, not spewing random garbage.</p>

<h2 id="the-real-conversation">The Real Conversation</h2>

<p>I wrote this piece in part because the anti-LLM rhetoric of “they are slop generators” gets under my skin. There are a lot of valid reasons to be anti-LLM today. This is not one.</p>

<p>Reading the story reinforced that for me: what fails in these vignettes are specs, contracts, and incentives, not some inherent “slop” property of generated code. The story isn’t an indictment of generated code, it’s a parable about the timeless need for human wisdom, clear communication, and rigorous oversight, no matter how the code comes to be.</p>

<p>I’d like to see our LLM conversations stick closer to the concrete and demonstrably true. Let’s focus on what these systems do, where they fail, and how our specs and contracts are part of that story, instead of getting pulled into slogans like “slop generator” that, by being false, derail the conversation.  This creates space for us to have the real conversations that matter around ethics, the environment, and training data usage.</p>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[Reflecting on “Warranty Void If Regenerated” and why calling LLMs “slop generators” misses the real issues.]]></summary></entry><entry><title type="html">Counting Synology Photos uploads with synofoto-media-count</title><link href="https://bexelbie.com/2026/03/04/synofoto-media-count.html" rel="alternate" type="text/html" title="Counting Synology Photos uploads with synofoto-media-count" /><published>2026-03-04T22:10:00+01:00</published><updated>2026-03-04T22:10:00+01:00</updated><id>https://bexelbie.com/2026/03/04/synofoto-media-count</id><content type="html" xml:base="https://bexelbie.com/2026/03/04/synofoto-media-count.html"><![CDATA[<p>I’m currently testing Synology Photos, including the iPhone uploader. I wanted to know how far the upload had actually gotten.</p>

<p>The problem is that none of the obvious UIs answer that.</p>

<ul>
  <li>The Synology Photos web UI doesn’t show a total count.</li>
  <li>The phone UI shows my whole camera roll (uploaded or not), and also doesn’t give a useful count.</li>
</ul>

<p>So I wrote a small tool: <a href="https://github.com/bexelbie/synofoto-media-count">synofoto-media-count</a>.</p>

<h2 id="the-mismatch">The mismatch</h2>

<p>If you’re backing up an iPhone library, you can end up with three numbers that don’t agree:</p>

<ul>
  <li>The number of photos on your phone</li>
  <li>The number of files on disk</li>
  <li>The number of “things” Synology Photos has indexed (which the UI doesn’t show)</li>
</ul>

<p>That last one is the number I cared about. I’m fine with the file system being messy - I just want to know whether the app has ingested what I think it has.</p>

<h2 id="why-counting-files-doesnt-answer-this">Why counting files doesn’t answer this</h2>

<p>The file system is easy to count, but it’s not what I’m trying to measure. With Live Photos, the file count is expected to be “weird” because a single photo experience can be multiple files.</p>

<p>What I actually want is a number that matches the app’s idea of “items,” because that’s what I’m mentally comparing to the photo count on my phone. That’s the gap this script closes.</p>

<h2 id="what-the-script-does">What the script does</h2>

<p>The repo contains a read-only bash script (<code class="language-plaintext highlighter-rouge">count-media.sh</code>) that runs <code class="language-plaintext highlighter-rouge">SELECT</code> queries against Synology Photos’ PostgreSQL database (by default, the <code class="language-plaintext highlighter-rouge">synofoto</code> database). It has options for multiple users and folders, JSON output for automation, and an optional <code class="language-plaintext highlighter-rouge">publish-to-ha.py</code> helper that publishes counts into Home Assistant via MQTT auto-discovery. It collapses Live Photo pairs into a single “item” so the results are closer to what you see on the phone.</p>
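
<p>I won’t reproduce the SQL here, but the Live Photo collapsing rule is worth making concrete. As a conceptual sketch in Python - not the actual bash/SQL, and with hypothetical <code class="language-plaintext highlighter-rouge">kind</code> and <code class="language-plaintext highlighter-rouge">live_group</code> fields standing in for whatever the database really stores - the counting amounts to:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from collections import defaultdict

def collapse_live_photos(items):
    """items: iterable of dicts with hypothetical 'kind' and 'live_group' keys."""
    groups = defaultdict(list)
    counts = {"photos": 0, "videos": 0, "other": 0,
              "live_photos": 0, "incomplete_live": 0}

    for item in items:
        if item.get("live_group"):
            groups[item["live_group"]].append(item)
        elif item["kind"] == "photo":
            counts["photos"] += 1
        elif item["kind"] == "video":
            counts["videos"] += 1
        else:
            counts["other"] += 1

    # A complete Live Photo pair (still image + motion clip) counts as one item.
    for members in groups.values():
        if len(members) &gt;= 2:
            counts["live_photos"] += 1
        else:
            counts["incomplete_live"] += 1  # one half of the pair is missing

    return counts
</code></pre></div></div>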

<h2 id="requirements">Requirements</h2>

<p>To run it, you need:</p>

<ul>
  <li>Synology DSM 7.x with Synology Photos installed</li>
  <li>SSH access to the NAS</li>
  <li><code class="language-plaintext highlighter-rouge">sudo</code> privileges (or direct <code class="language-plaintext highlighter-rouge">postgres</code> user access)</li>
  <li>Python 3 if you want to use <code class="language-plaintext highlighter-rouge">publish-to-ha.py</code></li>
</ul>

<h2 id="safety">Safety</h2>

<p>This script is read-only. It runs <code class="language-plaintext highlighter-rouge">SELECT</code> queries only and never modifies your data.</p>

<h2 id="usage-quick">Usage (quick)</h2>

<p>In the common case, you copy the script to your NAS, make it executable, and run it with <code class="language-plaintext highlighter-rouge">sudo</code>. It will try to do something sensible for iPhone uploads (like auto-selecting <code class="language-plaintext highlighter-rouge">/MobileBackup</code> if it exists) and will scope to the current user by default.</p>

<p>If the defaults don’t match your setup, there are flags for selecting a folder interactively, scoping to a different user, and emitting <code class="language-plaintext highlighter-rouge">--json</code> output for automation.</p>

<h2 id="home-assistant-integration">Home Assistant integration</h2>

<p>If you want the counts to show up somewhere other than your terminal, <code class="language-plaintext highlighter-rouge">publish-to-ha.py</code> can publish per-user counts into Home Assistant via MQTT auto-discovery. The result is a handful of sensors per user (non-live photos, Live Photos, videos, other, and a total) that you can graph or use in automations.</p>
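
<p>If you have never looked at Home Assistant’s MQTT auto-discovery before, the trick is just two publishes per sensor: a retained config message on a <code class="language-plaintext highlighter-rouge">homeassistant/sensor/.../config</code> topic and then the state. A minimal sketch using <code class="language-plaintext highlighter-rouge">paho-mqtt</code> - the topic layout and naming here are my illustration, not necessarily what <code class="language-plaintext highlighter-rouge">publish-to-ha.py</code> does verbatim:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json
import paho.mqtt.publish as publish

def publish_count(broker, user, category, count):
    """Publish one per-user count as a Home Assistant MQTT discovery sensor."""
    object_id = f"synofoto_{user}_{category}"
    config_topic = f"homeassistant/sensor/{object_id}/config"
    state_topic = f"synofoto/{user}/{category}"

    # Retained discovery config so Home Assistant (re)creates the sensor on restart.
    publish.single(config_topic, json.dumps({
        "name": f"Synology Photos {user} {category}",
        "state_topic": state_topic,
        "unique_id": object_id,
    }), retain=True, hostname=broker)

    # The actual count goes to the state topic.
    publish.single(state_topic, str(count), retain=True, hostname=broker)
</code></pre></div></div>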

<h2 id="notes">Notes</h2>

<ul>
  <li>Counts include nested subfolders by default. If you want a single folder only, there’s an “exact folder” option.</li>
  <li><code class="language-plaintext highlighter-rouge">--verbose</code> shows additional technical detail (raw unit counts, type breakdowns).</li>
  <li><code class="language-plaintext highlighter-rouge">--inspect</code> helps when something looks weird - like “incomplete Live Photo groups” where one half is missing.</li>
  <li>For iPhone MobileBackup libraries, the defaults for photo/video types should work, but they’re overridable if your installation differs.</li>
</ul>

<p>It prints a breakdown like:</p>

<ul>
  <li>non-live photos</li>
  <li>Live Photos (collapsed groups, plus their underlying files)</li>
  <li>standalone videos</li>
  <li>“other” items</li>
  <li>incomplete Live Photo groups (one half is missing)</li>
</ul>

<p>That breakdown is enough to sanity check whether “uploads are incomplete” vs “uploads are likely complete.” This provides a validation point to go with “why isn’t this thing uploading now?”</p>

<p>Now it’s time to set my phone to sleep focus and leave the uploader running overnight … for a long time.</p>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[Read-only queries against Synology Photos’ DB to gauge upload progress.]]></summary></entry><entry><title type="html">Jekyll Reads: the tooling behind my reading list</title><link href="https://bexelbie.com/2026/03/03/jekyll-reads.html" rel="alternate" type="text/html" title="Jekyll Reads: the tooling behind my reading list" /><published>2026-03-03T08:50:00+01:00</published><updated>2026-03-03T08:50:00+01:00</updated><id>https://bexelbie.com/2026/03/03/jekyll-reads</id><content type="html" xml:base="https://bexelbie.com/2026/03/03/jekyll-reads.html"><![CDATA[<h2 id="why-i-needed-more-than-a-social-reading-site">Why I needed more than a social reading site</h2>

<p>In <a href="/ramblings/2025/02/11/rediscovering-reading.html">Rediscovering Reading (Without the Social Media Part)</a> I wrote about stepping away from scrolling and building a slower, more deliberate reading habit. Part of that shift was making my reading log public without tying it to a dedicated social network.</p>

<p>The mechanics behind that were simple but fussy: keep a YAML file up to date, copy and paste links from Open Library, remember to grab cover images, and wire everything into Jekyll templates for the reading page and sidebar. None of it was hard, but it was just annoying enough that I knew future‑me would start skipping updates.</p>

<p>I built Jekyll Reads to make that workflow tolerable.</p>

<h2 id="what-jekyll-reads-actually-does">What Jekyll Reads actually does</h2>

<p>Jekyll Reads is a small collection of pieces designed around a single idea: keep all the book data in one <code class="language-plaintext highlighter-rouge">_data/reading.yml</code> file and let everything else be presentation.</p>

<p>The core pieces are:</p>

<ul>
  <li>A shared Node.js library that talks to Open Library, picks a reasonable match, and produces a standard YAML snippet for a book</li>
  <li>A command‑line tool that lets you search for a book and print the YAML to stdout, with options for indentation and auto‑selecting results</li>
  <li>A Vim integration that shells out to the CLI and drops the YAML directly into your buffer at the right indentation level</li>
  <li>A Visual Studio Code extension that does the same thing from inside the editor, with a proper search UI and update checks for the extension itself</li>
</ul>

<p>All of this is intentionally boring: no external Node dependencies, just the built‑in modules and a bit of glue. The point is to make it slightly easier to keep the reading list current than to let it drift.</p>

<h2 id="how-it-shows-up-on-this-site">How it shows up on this site</h2>

<p>On this site, the source of truth is <code class="language-plaintext highlighter-rouge">_data/reading.yml</code>. Entries that are still in progress, finished, or abandoned are all represented there with the same structure. The YAML includes things like start and finish dates, a link to more information (usually Open Library), an optional cover image, and a free‑form comment.</p>
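
<p>Purely as an illustration - the field names below are my guess at the shape described above, and the repository README documents the actual schema - an entry might look something like this:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>- title: "A Book I Am Reading"
  author: "Some Author"
  status: reading            # or finished, abandoned
  started: 2026-02-20
  finished:                  # empty while still in progress
  link: https://openlibrary.org/works/OL12345W
  cover: /img/covers/a-book-i-am-reading.jpg
  comment: "Short free-form note about why I picked it up."
</code></pre></div></div>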

<p>That data feeds two places:</p>

<ul>
  <li>The dedicated <a href="/reading/">reading page</a>, which separates currently‑reading, finished, and abandoned books and shows covers, dates, and comments</li>
  <li>A small sidebar block on the home page that surfaces what I am currently reading, so the log is visible without needing a whole post for every book</li>
</ul>

<p>Jekyll Reads does not try to be a general bookshelf app. It just reflects what I am already doing: writing short notes in YAML and publishing them along with the rest of the site.</p>

<h2 id="design-constraints-and-tradeoffs">Design constraints and trade‑offs</h2>

<p>I made a few deliberate choices that might look odd if you are used to larger toolchains:</p>

<ul>
  <li><strong>No external Node dependencies.</strong> The library and CLI only use built‑in modules like <code class="language-plaintext highlighter-rouge">https</code> and <code class="language-plaintext highlighter-rouge">readline</code>. That keeps installation simple and makes it easy to run in constrained environments.</li>
  <li><strong>Open Library as the primary data source.</strong> It provides book metadata, cover images, and stable URLs without requiring another account or scraping.</li>
  <li><strong>Plain YAML as the storage format.</strong> A static <code class="language-plaintext highlighter-rouge">_data</code> file is easy to version, review, and back up. It also plays nicely with Jekyll’s existing data pipeline.</li>
  <li><strong>Multiple small tools instead of one big one.</strong> The CLI, Vim integration, and VS Code extension all sit on top of the same library, so they stay in sync without each re‑implementing the logic.</li>
</ul>

<p>If any of that stops being true in the future, I can replace or extend the pieces without touching the core data file.</p>

<h2 id="if-you-want-to-use-it">If you want to use it</h2>

<p>The repository README walks through how to set up your own <code class="language-plaintext highlighter-rouge">_data/reading.yml</code>, wire up a reading page and sidebar, and use the CLI or editor integrations. It is written so that you can follow it even if you are not using the same Jekyll theme I am.</p>

<p>The code is MIT‑licensed and shipped under Electric Pliers LLC. If you want a lightweight way to publish a reading log without standing up a whole social network, you might find it useful.</p>

<p>You can find the repository and full documentation here: <a href="https://github.com/ElectricPliers/jekyll-reads">https://github.com/ElectricPliers/jekyll-reads</a></p>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[A tiny, dependency-free toolkit for keeping a Jekyll reading log in sync: one YAML data file, a CLI, and editor integrations that handle the boring parts.]]></summary></entry><entry><title type="html">Phone a Friend: Multi-Model Subagents for VS Code Copilot Chat</title><link href="https://bexelbie.com/2026/02/23/phone-a-friend.html" rel="alternate" type="text/html" title="Phone a Friend: Multi-Model Subagents for VS Code Copilot Chat" /><published>2026-02-23T11:10:00+01:00</published><updated>2026-02-23T11:10:00+01:00</updated><id>https://bexelbie.com/2026/02/23/phone-a-friend</id><content type="html" xml:base="https://bexelbie.com/2026/02/23/phone-a-friend.html"><![CDATA[<p>I wanted a way to stay inside Visual Studio Code, use Copilot Chat as the “orchestrator,” and still mix and match models for different parts of the work. Plan a change with one of the slower, more capable models, but let a smaller, faster model handle mechanical refactors. Edit a blog post with one model, but hand Jekyll plumbing or JSON/YAML munging to another. The friction was that the built-in Copilot Chat extension only lets subagents run on the same model as the parent conversation, while the Copilot CLI happily lets you pick any available model per run. Phone a Friend bolts that flexibility onto Copilot Chat, so I can keep the full VS Code experience - including gutter diffs - while dispatching subtasks to whatever model is best for the job.</p>

<h2 id="the-problem">The Problem</h2>

<p>When you use GitHub Copilot Chat in VS Code, every subagent it spawns runs on the same model as the parent conversation. If you’re on Claude Opus 4.6, all subagents are Claude Opus 4.6. Sometimes you want a different model for a subtask - a faster one for simple work, or a different vendor for a second opinion.</p>

<p>GitHub Copilot CLI supports <code class="language-plaintext highlighter-rouge">--model</code> to pick any available model, but using it directly doesn’t help - changes made by the CLI don’t produce VS Code’s gutter indicators (the green/red diff decorations in the editor margin). You get the work done but lose the visual feedback that makes code review comfortable.</p>

<p><a href="https://github.com/bexelbie/phone-a-friend">Phone a Friend</a> is an MCP server that solves both problems. It dispatches work to Copilot CLI with the model of choice, captures a unified diff of the changes, and returns it to the calling agent - which applies it through VS Code’s edit tools. Gutter indicators show up as the changes were made natively.</p>

<h2 id="how-it-works">How It Works</h2>

<ol>
  <li>Copilot Chat calls the <code class="language-plaintext highlighter-rouge">phone_a_friend</code> MCP tool with a prompt, model name, and working directory</li>
  <li>The MCP server creates an isolated git worktree from <code class="language-plaintext highlighter-rouge">HEAD</code></li>
  <li>It launches Copilot CLI in non-interactive mode in that worktree with the requested model</li>
  <li>The subagent does its work and writes its response to a “message-in-a-bottle” file</li>
  <li>The MCP server reads the response, captures a <code class="language-plaintext highlighter-rouge">git diff</code>, and cleans up the worktree</li>
  <li>The MCP server then returns the response text and unified diff to the calling agent</li>
  <li>The calling agent applies the diff using VS Code’s edit tools - gutter indicators appear</li>
</ol>

<p>The “message in a bottle” pattern is worth explaining. Copilot CLI’s stdout mixes the agent’s response with progress output and is unreliable to parse. Rather than fighting noisy output, the tool instructs the subagent to write its final response to a file. The server reads the file. Clean separation.</p>
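
<p>To make the git side of this concrete, steps 2, 5, and 6 look roughly like the sketch below. This is a conceptual Python outline, not the actual TypeScript implementation, and the Copilot CLI launch is left as a placeholder because its exact invocation is documented in the project README.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import shutil
import subprocess
import tempfile
from pathlib import Path

def run_in_worktree(repo, run_subagent):
    """Create an isolated worktree from HEAD, let the subagent work, return its output.

    run_subagent(worktree, bottle) stands in for launching Copilot CLI
    non-interactively and instructing it to write its answer to the bottle file.
    """
    tmp = Path(tempfile.mkdtemp(prefix="phone-a-friend-"))
    worktree = tmp / "worktree"
    bottle = tmp / "response.md"  # kept outside the worktree so it never shows up in the diff
    try:
        subprocess.run(["git", "-C", repo, "worktree", "add", str(worktree), "HEAD"], check=True)
        run_subagent(worktree, bottle)

        # "Message in a bottle": read the response file instead of parsing noisy stdout.
        response = bottle.read_text() if bottle.exists() else ""

        # Stage everything so new files are included, then diff against HEAD.
        subprocess.run(["git", "-C", str(worktree), "add", "-A"], check=True)
        diff = subprocess.run(
            ["git", "-C", str(worktree), "diff", "--cached"],
            check=True, capture_output=True, text=True,
        ).stdout
        return response, diff
    finally:
        subprocess.run(["git", "-C", repo, "worktree", "remove", "--force", str(worktree)])
        shutil.rmtree(tmp, ignore_errors=True)
</code></pre></div></div>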

<h2 id="safety">Safety</h2>

<p>Worktree isolation means your working tree is never modified directly. Push protection blocks <code class="language-plaintext highlighter-rouge">git push</code> at the tool level. Worktrees are cleaned up after every invocation, even on errors.</p>

<h2 id="setup">Setup</h2>

<p>You install Phone a Friend like any other MCP server in VS Code: add the <code class="language-plaintext highlighter-rouge">@bexelbie/phone-a-friend</code> npm package through the <code class="language-plaintext highlighter-rouge">MCP: Add Server...</code> command, or point VS Code at it via your MCP configuration. The GitHub README details the exact JSON and prerequisites (Node.js, Copilot CLI, Git).</p>

<h2 id="usage">Usage</h2>

<p>Once configured, you stay in Copilot Chat and describe the outcome you want; the calling agent decides when to route a subtask through Phone a Friend. The tool surface includes discovery hints, so natural phrasing like “get a second opinion from another model” is usually enough to trigger it. Any model that Copilot CLI exposes is available.</p>

<h2 id="known-limitations">Known Limitations</h2>

<p>A few trade-offs worth knowing:</p>

<ul>
  <li><strong>Context cost.</strong> The unified diff lands in the calling agent’s context window. Large diffs eat context. I’ve got an issue open exploring ideas for improving this.</li>
  <li><strong>Message-in-a-bottle compliance.</strong> Most models follow the instruction to write their final response into the message-in-a-bottle file, but some may occasionally ignore it. When that happens, the calling agent still gets the diff of any file changes but not the response text.</li>
</ul>

<h2 id="availability">Availability</h2>

<p>The project is <a href="https://github.com/bexelbie/phone-a-friend">on GitHub</a> under MIT license, and published on npm as <a href="https://www.npmjs.com/package/@bexelbie/phone-a-friend"><code class="language-plaintext highlighter-rouge">@bexelbie/phone-a-friend</code></a>. Written in TypeScript.</p>

<h2 id="what-changed-for-me">What Changed For Me</h2>

<p>Since integrating this into my Copilot setup, the biggest shift is that I no longer have to choose between “the model I want to think with” and “the model I want to do the work,” and I’ve eliminated a bunch of copy/paste from manually emulating this. I keep the main conversation with a larger, more capable model for planning and review, and routinely:</p>

<ul>
  <li>send quick, mechanical refactors to a smaller, faster model</li>
  <li>hand Jekyll front matter, Liquid, and config tweaks to a model that’s better at markup and templating</li>
  <li>ask a different vendor’s model for a second opinion on changes or ideas, especially where that model may be better at the task</li>
</ul>

<p>Because everything still lands back in the same VS Code buffer with normal gutter diffs, it feels like one coherent tool instead of a handful of loosely-connected ones.</p>

<p>The project also had an unexpected dynamic in the development process. Building an MCP server that mimics a capability already available to the model created a strange feedback loop. I could collaborate on the implementation with Opus, and then turn around and interview it as a subject matter expert on how it uses that very same capability. It was a weird feeling to use the model as both a partner in writing the code and a primary source for understanding the user requirements.</p>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[Dispatch subtasks to a different AI model - with editor gutter indicators intact.]]></summary></entry><entry><title type="html">Replacing my compact calendar spreadsheet with an ICS-powered web app</title><link href="https://bexelbie.com/2026/02/18/online-compact-calendar.html" rel="alternate" type="text/html" title="Replacing my compact calendar spreadsheet with an ICS-powered web app" /><published>2026-02-18T14:30:00+01:00</published><updated>2026-02-18T14:30:00+01:00</updated><id>https://bexelbie.com/2026/02/18/online-compact-calendar</id><content type="html" xml:base="https://bexelbie.com/2026/02/18/online-compact-calendar.html"><![CDATA[<p>I’ve used some form of <a href="https://davidseah.com/node/compact-calendar/">DSri Seah’s Compact Calendar</a> for over seven years. The calendar is a lovingly designed single-page view of the entire year, organized into Monday-through-Sunday weeks with no breaks between months.</p>

<p>The point of the format is simple: my normal calendar is great at telling me what I’m doing on Tuesday. What it’s terrible at is answering planning questions that are above the day level, such as:</p>

<ul>
  <li>If we take a vacation the last two weeks of July, will it overlap business travel?</li>
  <li>Can we connect these two public holidays and get 14 days away for only 8 days of PTO?</li>
  <li>Do we have any genuinely empty weeks left this year?</li>
</ul>

<p>For a long time, my compact calendar was a spreadsheet. That worked until it didn’t.</p>

<h2 id="the-problem-i-actually-needed-to-solve">The problem I actually needed to solve</h2>

<p>The spreadsheet version served me well for years, but life got more complicated.</p>

<p>My kid is getting older, which means more activities to track: summer camps, school breaks, etc. My partner and I no longer work for the same company, so we don’t share the same corporate holidays, and as our roles have changed, so has the amount of travel we do. And, honestly, my spreadsheet has bespoke formulas that only I understand … on Thursdays when there is a full moon.</p>

<p>My partner knows how to use a calendar app. She really doesn’t want to learn a special spreadsheet for planning, and I don’t blame her.</p>

<p>The real friction screaming out that there had to be a better way was the double-entry work. If my kid has summer camp in July, I’d put it on the family calendar - and then manually mark those weeks on my compact calendar spreadsheet. Two sources of truth means one of them is eventually wrong.</p>

<p>So the job wasn’t “build a better calendar.” It was: keep the year-at-a-glance view, but make the calendar app the source of truth.</p>

<h2 id="the-shape-of-the-solution">The shape of the solution</h2>

<p>I decided to build a web version of the compact calendar that could read directly from standard ICS calendar feeds.</p>

<p>Put the summer camp on the shared calendar once. The compact calendar picks it up automatically.</p>

<p>And if this was going to be something my partner and I actually used together, it needed two things:</p>

<ul>
  <li>A simple setup flow (not “copy this spreadsheet and don’t touch column Q”)</li>
  <li>A way to always be available, beyond “go find this Google Docs link”</li>
</ul>

<h2 id="what-the-tool-does">What the tool does</h2>

<p>The calendar renders a full year on a single page. Each row is one week, Monday through Sunday.</p>

<p>Parallel to the block of weeks running down the page is a column for displaying committed events and a second for displaying possible events.</p>

<ul>
  <li>Committed: events that are definitely happening - travel that’s booked, school terms, confirmed work trips.</li>
  <li>Possible: things under consideration - a conference I submitted a talk to but haven’t heard back from yet, vacation options we’re weighing.</li>
</ul>

<p>The tool uses color to signal status at a glance:</p>

<ul>
  <li>Blue background: first day of the month (anchors the continuous weeks)</li>
  <li>Red text: public holidays (per selected country)</li>
  <li>Green background: committed events</li>
  <li>Yellow background: possible events</li>
  <li>Green background with a yellow border: overlaps/conflicts that need attention</li>
</ul>

<p>Here’s what the full-year view looks like with demo data loaded:</p>

<p><img src="/img/2026/CC-FullCalendar.png" alt="A full-year compact calendar view with one row per week (Monday through Sunday), with committed events shown in green, possible events in yellow, public holidays in red, and overlaps highlighted with a yellow border." /></p>

<h2 id="inputs-url-file-or-demo">Inputs: URL, file, or demo</h2>

<p>While there is demo data available in the system, the key comes from loading your own data. You can choose two different kinds of sources:</p>

<ul>
  <li>A URL - a <code class="language-plaintext highlighter-rouge">webcal://</code> or <code class="language-plaintext highlighter-rouge">https://</code> link to a published calendar (iCloud, Google Calendar, etc.)</li>
  <li>A file - a <code class="language-plaintext highlighter-rouge">.ics</code> file uploaded from your computer</li>
</ul>

<p>We’re an Apple household so our calendars live in iCloud, but the tool doesn’t care about your calendar provider. Anything that produces a standard ICS feed works.</p>

<p>My practical workflow is two shared calendars in Apple Calendar:</p>

<ul>
  <li>one for committed travel and events.  For me, this is actually my shared calendar that our family maintains.</li>
  <li>one for possibilities we’re considering</li>
</ul>

<p>Both are published as <code class="language-plaintext highlighter-rouge">webcal</code> URLs, and the compact calendar fetches them and renders the year view. Using my shared calendar works because the app ignores events that aren’t multi-day, all-day blocks - so dentist appointments don’t drown out the year view.  You can optionally include single day all-day events if that helps you.</p>

<p>The setup controls are intentionally simple:</p>

<p><img src="/img/2026/CC-Controls.png" alt="Configuration controls showing a country dropdown (for public holidays) and two inputs for selecting the committed and possible calendar sources." /></p>

<h2 id="the-tech-and-the-annoying-part">The tech (and the annoying part)</h2>

<p>This is a vanilla JavaScript app built with <a href="https://vite.dev/">Vite</a>, hosted on <a href="https://azure.microsoft.com/en-us/products/app-service/static">Azure Static Web Apps</a>. No framework - just DOM manipulation, a CSS file, and under 500 lines of main application code.</p>

<p>The interesting technical problem was CORS.</p>

<p>Calendar providers like iCloud don’t set CORS headers on their published feeds, which means a browser can’t fetch them directly. The solution is a small Azure Function that acts as a proxy:</p>

<ul>
  <li>the browser sends the calendar URL to the server</li>
  <li>the server fetches the calendar data</li>
  <li>the server returns it to the browser</li>
</ul>

<p>The proxy doesn’t store or log anything. It’s a pass-through.</p>
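
<p>Conceptually, the whole proxy fits in a few lines. The real thing is an Azure Function; the sketch below uses only the Python standard library and an assumed JSON payload shape, just to show how thin the pass-through is:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class CalendarProxy(BaseHTTPRequestHandler):
    """Stateless pass-through: fetch an ICS feed the browser cannot fetch itself."""

    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        # The calendar URL arrives in the POST body, not in a query string.
        url = json.loads(body)["url"].replace("webcal://", "https://", 1)

        with urllib.request.urlopen(url) as upstream:
            ics = upstream.read()

        self.send_response(200)
        self.send_header("Content-Type", "text/calendar")
        self.send_header("Access-Control-Allow-Origin", "*")  # the header iCloud does not set
        self.end_headers()
        self.wfile.write(ics)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), CalendarProxy).serve_forever()
</code></pre></div></div>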

<p>I built the app with an AI coding agent. I provided direction and made decisions, but I didn’t hand-write every line. For this kind of tool, I’m comfortable with that. It’s a static site that renders calendar data client-side, and the risk profile is low. Additionally, nothing in this code represents a new problem or a novelty. This is bog-standard code, and the agent handled the boilerplate well for this project.</p>

<p>Importantly, even though I could have written this code myself, I wouldn’t have. I probably would have gotten myself caught in a bit of analysis paralysis over frameworks. But more importantly, writing a lot of this code is just boring code to write. The AI agent has allowed me to solve my own problem, and that’s the part that matters to me. I didn’t have to suddenly become more disciplined about spreadsheets or get my family dragged onto a tool that really only speaks to me. Instead, I was able to change the shape of the problem and make it more solvable within the context of the humans involved.</p>

<h2 id="privacy-and-the-honest-trade-off">Privacy and the honest trade-off</h2>

<p>All your data stays in your browser. The app stores the URLs you’re loading, your selected country, and cached holiday data in local storage. This is purely functional and not for tracking.</p>

<p>Calendar URLs necessarily have to go through the server-side proxy because browsers won’t fetch them directly. The proxy is a stateless pass-through — I don’t persist calendar data in the function or in your browser. Calendar URLs are sent via POST request body rather than query parameters, which means they aren’t captured in Azure’s platform-level request logs. Error logging includes only the target hostname (e.g., “iCloud fetch failed”), never the full URL or authentication tokens. If your calendar URL contains authentication tokens (iCloud URLs do), understand that the proxy briefly sees them in transit.</p>

<h2 id="try-it-out">Try it out</h2>

<p>The calendar is live at <a href="https://cc.bexelbie.com">cc.bexelbie.com</a>. You can load the built-in demo data to explore without connecting your own calendars - select “Demo” from either input dropdown.</p>

<p>The source is on GitHub at <a href="https://github.com/bexelbie/online-compact-calendar">bexelbie/online-compact-calendar</a>. If you have ideas or find bugs, <a href="https://github.com/bexelbie/online-compact-calendar/issues">open an issue</a>.</p>

<p>On first visit, there’s a banner that points you at settings:</p>

<p><img src="/img/2026/CC-first-run-banner.png" alt="A first-run welcome banner that tells the user to use the gear icon to configure the app." /></p>

<h2 id="whats-next">What’s next</h2>

<p>I’m going to live with it for a while before adding features. The spreadsheet served me for seven years with almost no changes.</p>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[I rebuilt my year-at-a-glance compact calendar as a small web app that reads ICS feeds and highlights conflicts.]]></summary></entry><entry><title type="html">Building a tiny ephemeral draft sharing system on Hedgedoc</title><link href="https://bexelbie.com/2026/02/12/yak-shaving.html" rel="alternate" type="text/html" title="Building a tiny ephemeral draft sharing system on Hedgedoc" /><published>2026-02-12T13:00:00+01:00</published><updated>2026-02-12T13:00:00+01:00</updated><id>https://bexelbie.com/2026/02/12/yak-shaving</id><content type="html" xml:base="https://bexelbie.com/2026/02/12/yak-shaving.html"><![CDATA[<blockquote>
  <p>This yak is now shaved!</p>

  <p><cite>me</cite></p>
</blockquote>

<p>I’ve been working on two submissions I want to put into the CFP for <a href="https://installfest.cz">installfest.cz</a> and had them at a “man it’d be nice to have someone else read and comment on this” level of done.  Normally when this happens I have to psych myself up for it, both because receiving feedback can be hard and because I have to do a format conversion.  I tend to write in markdown in “all the places” and sharing a document for edits has typically meant pasting it into something like Google Docs or Office 365, where even if it still looks like markdown … it isn’t.</p>

<p>And that’s when the yak walked into the room. Instead of just pasting my drafts into Google Docs and getting on with the reviews, I decided I needed to delay getting feedback and build the markdown collaborative editing system of my dreams. Classic yak shaving - solving a problem you don’t actually need to solve in order to eventually do the thing you originally set out to do. If you’re unfamiliar with the term, see <a href="https://www.youtube.com/watch?v=0E5ae4MD5qo">What is Yak Shaving</a>, a video by Matthew Miller.</p>

<p>When I am done, I then have to take the text back to wherever it was originally going, often in good clean markdown (this blog post is in markdown!).  This rigmarole is tiring.  I also dislike that my go-to tools for this had turned sharing into an exercise in ensuring guests could access a document or collecting someone’s login IDs for yet another system.</p>

<p>I knew there had to be a better way.  Then it hit me.  When markdown started to take off, a slew of collaborative markdown editing sites appeared, often modeled on the older Etherpad.  Well, several are still around.  I looked at hosted options first, as I tend to prefer using a service when I can so I don’t create more sysadmin work for myself.</p>

<p>Three things steered me away from the hosted options:</p>

<ol>
  <li>I don’t like being on a free tier when I don’t understand how it is supported.  While I don’t know that anyone in this space is nefarious, the world is trending in a specific direction.  I don’t mind paying, but this was also not going to generate enough value to warrant serious payments.</li>
  <li>The project that first came to mind for markdown collaboration went open core back in 2019.  Open source business models are hard, and doing open core well is even harder.  As you’ll see below I had specific needs and I had a feeling I might run into the open core wall.</li>
  <li>One of the CFPs would actually benefit from implementing this as my example … bonus!</li>
</ol>

<p>After examining a bunch of options, I settled on building something out of <a href="https://hedgedoc.org">Hedgedoc</a>. This was not an easy choice, and the likelihood of entering analysis paralysis was super high.  So I decided to try to force this to fit on the free-tier GCP instance I have been running for years.  It is the tiny e2-micro burstable instance, a literal thimble of compute.</p>

<p>This ruled out a lot of options.  Privacy-first options need more compute just to do the encryption work.  A bunch of options want a server database (Postgres and friends), and a single-person instance should be fine on SQLite, in my opinion.  All roads now led to Hedgedoc.  It was the only option that could run on SQLite, tolerate my tiny VM, still give me collaborative markdown, and seemed to have every feature I required, if I could make it work.</p>

<p>It wasn’t all sunshine and happiness though.  Hedgedoc is in the middle of writing version 2.0, which means 1.0 is frozen for anything except critical fixes and all effort is focused on the future.  That means the documentation is a bit rough in places, and I was going to have to live with it.</p>

<p>My core requirements were:</p>
<ol>
  <li>Only I am allowed to create new notes</li>
  <li>Anyone with the “unguessable” URL can edit and should not need an account to do so</li>
  <li>This should require next to zero system administration work and be easy to start and stop</li>
  <li>When I need more features, I should be able to extend this with a plugin for tools like <a href="https://obsidian.md">Obsidian</a> or Visual Studio Code.</li>
</ol>

<p>And while it took longer than I’d hoped, it works.  Here’s how:</p>

<ol>
  <li>Write yourself a configuration file for Hedgedoc</li>
</ol>

<p>config.json:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
  "production": {
    "sourceURL": "https://github.com/bexelbie/hedgedoc",
    "domain": "&lt;url&gt;",
    "host": "localhost",
    "protocolUseSSL": true,
    "loglevel": "info",
    "db": {
      "dialect": "sqlite",
      "storage": "/data/db/hedgedoc.sqlite"
    },
    "email": true,
    "allowEmailRegister": false,
    "allowAnonymous": false,
    "allowAnonymousEdits": true,
    "requireFreeURLAuthentication": true,
    "disableNoteCreation": false,
    "allowFreeURL": false,
    "enableStatsApi": false,
    "defaultPermission": "limited",
    "imageUploadType": "filesystem",
    "hsts": {
      "enable": true,
      "maxAgeSeconds": 31536000,
      "includeSubdomains": true,
      "preload": true
    }
  }
}
</code></pre></div></div>

<p>This sets a custom source URL for the fork I have made (more below), enables SSL, disables new account registration, and allows edits via unguessable URLs without requiring logins.</p>

<ol start="2">
  <li>Decide how you want to launch the container (I am using a quadlet; a sketch follows below) and provide some environment variables:</li>
</ol>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CMD_SESSION_SECRET="&lt;secret&gt;"
CMD_CONFIG_FILE=/hedgedoc/config.json
NODE_ENV=production
</code></pre></div></div>

<p>These just put the app in production mode, point it at the config file, and provide the only secret required.</p>
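
<p>For reference, a minimal user quadlet along these lines might look like the following. This is a sketch rather than my exact unit: the image reference and host paths are illustrative, and the mounts simply match the <code class="language-plaintext highlighter-rouge">/hedgedoc/config.json</code> and <code class="language-plaintext highlighter-rouge">/data</code> paths used above.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># hedgedoc.container (a user quadlet, e.g. under ~/.config/containers/systemd/)
[Unit]
Description=Hedgedoc

[Container]
# illustrative image reference; pin whatever 1.x tag you actually run
Image=quay.io/hedgedoc/hedgedoc:latest
EnvironmentFile=%h/hedgedoc/hedgedoc.env
Volume=%h/hedgedoc/config.json:/hedgedoc/config.json:ro,Z
Volume=%h/hedgedoc/data:/data:Z
# networking/ingress omitted - I front mine with a Cloudflare tunnel

[Service]
Restart=always

[Install]
WantedBy=default.target
</code></pre></div></div>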

<ol start="3">
  <li>You’re basically done.  I happen to have put mine behind a Cloudflare tunnel and updated the main page of the site, but those are pretty straightforward.</li>
</ol>

<h2 id="more-yak-shaving">More Yak Shaving</h2>

<p>Naturally I planned to launch it, create my user ID via the CLI, and share my CFP submissions with the folks I wanted reviews from.</p>

<p><em>Narrator: Naturally, that’s not what happened.</em></p>

<p>I decided to push YAGNI<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> out of the way and NEED IT!  Specifically I forked the v1 code into <a href="https://github.com/bexelbie/hedgedoc/">a repository</a> to add some features.  The upstream is unlikely to want any of these so I will have to carry these patches.  What I did:</p>

<ol>
  <li>Hedgedoc will do color highlighting and gutter indicators so you can see which author added what text.  Unfortunately it didn’t seem to be working.  I was getting weak indicators (underlines instead of highlighting) and often nothing.  So I fixed that.</li>
  <li>The colors for authorship are chosen randomly.  I am a bit past my prime in the seeing department and it was hard to see the colors against the dark editor background, so I restricted the color choices to ones that contrast well.  It isn’t perfect, but it is better.</li>
  <li>My particular setup involves a lot of guest editors.  Normally I share with just a few folks, but sometimes with many.  They’ll all be anonymous.  Hedgedoc doesn’t track authorship colors for guests, so I patched in a system to generate color markings for anonymous editors.</li>
  <li>A feature I always loved in Etherpad was that you could temporarily hide the authorship colors when you just wanted to “read the document.”  So I added a button for that.  While I was doing that I discovered that there is a separate toggle to switch the editor into light mode, but I couldn’t see it because the status bar was black and set to 0.2 opacity! I fixed that too.  Also, the status bar now switches when the editor switches.</li>
  <li>Comments, it turns out, are needed.  So I coded in rudimentary support for CriticMarkup comments.</li>
</ol>

<p>I have other ideas, but instead I am going to stop and let YAGNI win for a while.  Besides, hopefully 2.0 will ship soon and render all of this unneeded.</p>

<p>So there you go: if you want to offer to help me write something, I’ll send you a link and you can go to town on our shared work.  If you want to see more about this, well, let’s see if Installfest.cz thinks you should or not :D — and whether this yak decides to grow its hair back.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>YAGNI: You Ain’t Gonna Need It - a philosophy that reminds us that features we dream up aren’t needed until an actual use comes along (or a paying customer).  This also applies to engineering for future ideas when those ideas aren’t committed to yet. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[How I'm using Hedgedoc on a tiny VM to share markdown drafts for feedback without heavyweight doc tools.]]></summary></entry><entry><title type="html">op-secret-manager: A SUID Tool for Secret Distribution</title><link href="https://bexelbie.com/2026/02/06/op-secret-manager.html" rel="alternate" type="text/html" title="op-secret-manager: A SUID Tool for Secret Distribution" /><published>2026-02-06T12:40:00+01:00</published><updated>2026-02-06T12:40:00+01:00</updated><id>https://bexelbie.com/2026/02/06/op-secret-manager</id><content type="html" xml:base="https://bexelbie.com/2026/02/06/op-secret-manager.html"><![CDATA[<p>Getting secrets from 1Password to applications running on Linux keeps forcing a choice I don’t want to make. Manual retrieval works until you get more than a couple of things … then you need something more. There are lots of options, but they all felt awkward or heavy, so I wrote <a href="https://github.com/bexelbie/op-secret-manager"><code class="language-plaintext highlighter-rouge">op-secret-manager</code></a> to fill the gap: a single-binary tool that fetches secrets from 1Password and writes them to per-user directories. No daemon, no persistent state, no ceremony.</p>

<h2 id="the-problem-secret-zero-on-multi-user-systems">The Problem: Secret Zero on Multi-User Systems</h2>

<p>The “secret zero” problem is fundamental: you need a first credential to unlock everything else. On a multi-user Linux system, this creates friction. Different users (application accounts like <code class="language-plaintext highlighter-rouge">postgres</code>, <code class="language-plaintext highlighter-rouge">redis</code>, or human operators) need different secrets. You want centralized management (1Password) but local distribution, without exposing credentials across user boundaries. You also don’t want to solve the “secret zero” problem multiple times or have a bunch of first credentials saved in random places all over the disk.</p>

<p>Existing approaches each carry costs:</p>

<ul>
  <li><strong>Manual copying</strong>: Unscalable and leaves secret material in shell history or temporary files.</li>
  <li><strong>1Password CLI directly</strong>: Requires each user to authenticate or have API key access, which recreates the distribution problem and litters the disk with API keys.</li>
  <li><strong>Persistent agents</strong> (Connect, Vault): Add services to monitor, restart policies to configure, and failure modes to handle.</li>
  <li><strong>Cloud provider integrations</strong>: Generally unavailable on bare metal or hybrid environments where half your infrastructure isn’t in AWS/Azure/GCP.</li>
</ul>

<p>What I wanted: the <code class="language-plaintext highlighter-rouge">postgres</code> user runs a command, secrets appear in <code class="language-plaintext highlighter-rouge">/run/user/1001/secrets/</code>, done.</p>

<h2 id="how-it-works">How It Works</h2>

<p>The tool uses a mapfile to define which secrets go where:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>postgres   op://vault/db/password         db_password
postgres   op://vault/db/connection       connection_string
redis      op://vault/redis/auth          redis_password
</code></pre></div></div>

<p>Each line maps a username, a 1Password secret reference, and an output path. Relative paths expand to <code class="language-plaintext highlighter-rouge">/run/user/&lt;uid&gt;/secrets/</code>. Absolute paths work if the user has write permission.</p>
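
<p>Concretely, if <code class="language-plaintext highlighter-rouge">postgres</code> is uid 1001, the first two lines of the example mapfile produce these files (illustrative, following the expansion rule above):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/run/user/1001/secrets/db_password
/run/user/1001/secrets/connection_string
</code></pre></div></div>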

<p>The “secret zero” challenge is now centralized into a single API key file that every mapped user can use indirectly. But the API key itself needs protection from unprivileged reads, and ideally from those users themselves. This is where SUID comes in … carefully.</p>

<h2 id="privilege-separation-design">Privilege Separation Design</h2>

<p>The security model uses SUID elevation to a service account (not root), reads protected configuration, then immediately drops privileges before touching the network or filesystem.</p>

<p>This has not been independently security audited. Treat it as you would any custom SUID program: read the source, understand the threat model, and test it in your environment before deploying broadly.</p>

<p>The flow:</p>

<ol>
  <li>Binary is SUID+SGID to <code class="language-plaintext highlighter-rouge">op:op</code> (an unprivileged service account)</li>
  <li>Process starts with elevated privileges, reads:
    <ul>
      <li>API key from <code class="language-plaintext highlighter-rouge">/etc/op-secret-manager/api</code> (mode 600, owned by <code class="language-plaintext highlighter-rouge">op</code>)</li>
      <li>Mapfile from <code class="language-plaintext highlighter-rouge">/etc/op-secret-manager/mapfile</code> (typically mode 640, owned by <code class="language-plaintext highlighter-rouge">op:op</code> or <code class="language-plaintext highlighter-rouge">root:op</code>)</li>
    </ul>
  </li>
  <li>Drops all privileges to the real calling user</li>
  <li>Validates that the calling user appears in the mapfile</li>
  <li>Fetches secrets from 1Password</li>
  <li>Writes secrets as the real user to <code class="language-plaintext highlighter-rouge">/run/user/&lt;uid&gt;/secrets/</code></li>
</ol>

<p>Because the network calls and writes happen <em>after</em> the privilege drop, the filesystem automatically enforces isolation. User <code class="language-plaintext highlighter-rouge">postgres</code> cannot write to <code class="language-plaintext highlighter-rouge">redis</code>’s directory. The secrets land with the correct ownership without additional chown operations.</p>
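
<p>To make the ordering concrete, here is a minimal Go sketch of the read-then-drop pattern. It is illustrative only, not the actual <code class="language-plaintext highlighter-rouge">op-secret-manager</code> source; the file path comes from the flow above and error handling is reduced to the bare minimum.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Illustrative sketch of the privilege-drop ordering, not the real tool.
package main

import (
    "fmt"
    "os"
    "syscall"
)

func main() {
    // While the SUID/SGID bits are in effect, the effective uid/gid are the
    // op service account, so the protected API key file is readable.
    apiKey, err := os.ReadFile("/etc/op-secret-manager/api")
    if err != nil {
        fmt.Fprintln(os.Stderr, "read api key:", err)
        os.Exit(1)
    }

    // Permanently drop to the real calling user: setting the real, effective,
    // and saved IDs means the elevation cannot be regained later.
    // Drop the group first, then the user.
    rgid, ruid := syscall.Getgid(), syscall.Getuid()
    if err := syscall.Setresgid(rgid, rgid, rgid); err != nil {
        fmt.Fprintln(os.Stderr, "setresgid:", err)
        os.Exit(1)
    }
    if err := syscall.Setresuid(ruid, ruid, ruid); err != nil {
        fmt.Fprintln(os.Stderr, "setresuid:", err)
        os.Exit(1)
    }

    // Everything past this point (1Password calls, writes under
    // /run/user/&lt;uid&gt;/secrets/) runs as the real calling user.
    _ = apiKey
}
</code></pre></div></div>

<p>The property that matters is the ordering: the only work done with elevated IDs is reading configuration, and the drop covers the saved set-user-ID as well, so code running later in the fetch and write path cannot climb back to the service account.</p>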

<h3 id="why-suid-to-a-service-account">Why SUID to a Service Account?</h3>

<p>Elevating to root would be excessive. Elevating to a dedicated, unprivileged service account constrains the blast radius. If someone compromises the binary, they get the privileges of <code class="language-plaintext highlighter-rouge">op</code> (which can read one API key) rather than full system access.</p>

<p>Alternatives considered:</p>

<ul>
  <li><strong>Linux capabilities</strong> (<code class="language-plaintext highlighter-rouge">CAP_DAC_READ_SEARCH</code>): Still needs root to set the file capability, and the capability lets the binary bypass read checks on every file on the system rather than just one API key, which increases risk.</li>
  <li><strong>Group-readable API key</strong>: Forces all users into a shared group, allowing direct API key reads. This moves the problem rather than solving it.</li>
  <li><strong>No privilege separation</strong>: Each user needs a copy of the API key, defeating centralized management.</li>
</ul>

<p>The mapfile provides access control: it defines which users can request which secrets. The filesystem enforces it: even if you bypass the mapfile check, you can’t write to another user’s runtime directory. While you would theoretically be able to harvest a secret, you won’t be able to modify what the other user uses. This is key because a secret may not actually be “secret.” I have found it useful to centralize some configuration management, like API endpoint addresses, with this tool.</p>

<h3 id="root-execution">Root Execution</h3>

<p>Allowing root to use the tool required special handling. The risk is mapfile poisoning: an attacker modifies the mapfile to make root write secrets to dangerous locations.</p>

<p>The mitigation: root execution is only permitted if the mapfile is owned by <code class="language-plaintext highlighter-rouge">root:op</code> with no group or world write bits. If you can create a root-owned, properly-permissioned file, you already have root access and don’t need this tool for privilege escalation.  The SGID bit on the binary lets the service account, <code class="language-plaintext highlighter-rouge">op</code>, read the mapfile even though it is owned by root.</p>
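
<p>Put together, an install that satisfies these checks looks roughly like this (an abridged, illustrative listing; the owners and modes are the ones described above):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-rwsr-sr-x  op    op   /usr/local/bin/op-secret-manager     (SUID+SGID to the op account)
-rw-------  op    op   /etc/op-secret-manager/api           (mode 600)
-rw-r-----  root  op   /etc/op-secret-manager/mapfile       (mode 640, root-owned to permit root execution)
</code></pre></div></div>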

<h2 id="practical-integration-podman-quadlets">Practical Integration: Podman Quadlets</h2>

<p>My primary use case is systemd-managed containers. Podman Quadlets make this concise. This example is of a rootless <em>user</em> Quadlet (managed via <code class="language-plaintext highlighter-rouge">systemctl --user</code>), not a system service.</p>

<div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[Unit]</span><span class="w">
</span><span class="py">Description</span><span class="p">=</span><span class="s">Application Container</span>
<span class="py">After</span><span class="p">=</span><span class="s">network-online.target</span>
<span class="w">
</span><span class="nn">[Container]</span><span class="w">
</span><span class="py">Image</span><span class="p">=</span><span class="s">docker.io/myapp:latest</span>
<span class="py">Volume</span><span class="p">=</span><span class="s">/run/user/%U/secrets:/run/secrets:ro,Z</span>
<span class="py">Environment</span><span class="p">=</span><span class="s">DB_PASSWORD_FILE=/run/secrets/db_password</span>
<span class="py">ExecStartPre</span><span class="p">=</span><span class="s">/usr/local/bin/op-secret-manager</span>
<span class="py">ExecStopPost</span><span class="p">=</span><span class="s">/usr/local/bin/op-secret-manager --cleanup</span>
<span class="w">
</span><span class="nn">[Service]</span><span class="w">
</span><span class="py">Restart</span><span class="p">=</span><span class="s">always</span>
<span class="w">
</span><span class="nn">[Install]</span><span class="w">
</span><span class="py">WantedBy</span><span class="p">=</span><span class="s">default.target</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">ExecStartPre</code> fetches secrets before the container starts. The container sees them at <code class="language-plaintext highlighter-rouge">/run/secrets/</code> (read-only). <code class="language-plaintext highlighter-rouge">ExecStopPost</code> removes them on shutdown. The application reads secrets from files (not environment variables), avoiding the “secrets in env” problem where <code class="language-plaintext highlighter-rouge">env</code> or a log dump leaks credentials.</p>
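
<p>Inside the container, the application side of this pattern is small. A sketch in Go, assuming the <code class="language-plaintext highlighter-rouge">DB_PASSWORD_FILE</code> variable from the quadlet above (your application and language will differ):</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Read the secret from the file named by DB_PASSWORD_FILE rather than from
// an environment variable, so an `env` dump or a logged environment never shows it.
package main

import (
    "log"
    "os"
    "strings"
)

func main() {
    path := os.Getenv("DB_PASSWORD_FILE") // e.g. /run/secrets/db_password
    data, err := os.ReadFile(path)
    if err != nil {
        log.Fatalf("read %s: %v", path, err)
    }
    password := strings.TrimSpace(string(data))
    _ = password // hand it to your database client here
}
</code></pre></div></div>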

<p>The secrets directory is a <code class="language-plaintext highlighter-rouge">tmpfs</code> (memory-backed <code class="language-plaintext highlighter-rouge">/run</code>), so nothing touches disk. If lingering is enabled for the user (<code class="language-plaintext highlighter-rouge">loginctl enable-linger</code>), the directory persists across logins.</p>

<h2 id="trade-offs-and-constraints">Trade-offs and Constraints</h2>

<p>This design makes specific compromises for simplicity:</p>

<p><strong>No automatic rotation.</strong> The tool runs, fetches, writes, exits. If a secret changes in 1Password, you need to re-run the tool (or restart the service). For scenarios requiring frequent rotation, a persistent agent might be better. For most use cases, rotation happens infrequently enough that ExecReload or a manual re-fetch works fine.</p>
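
<p>For example, the quadlet above could grow a reload hook so a <code class="language-plaintext highlighter-rouge">systemctl --user reload</code> of the unit re-runs the fetch without restarting the container (illustrative; whether the application notices the new file contents without a restart is up to the application):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Service]
Restart=always
ExecReload=/usr/local/bin/op-secret-manager
</code></pre></div></div>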

<p><strong>Filesystem permissions are the security boundary.</strong> If an attacker bypasses Unix file permissions (kernel exploit, root compromise), the API key is exposed. This is consistent with how <code class="language-plaintext highlighter-rouge">/etc/shadow</code> or SSH host keys are protected. File permissions are the Unix-standard mechanism. Encrypting the API key on disk would require storing the decryption key somewhere accessible to the SUID binary, recreating the same problem with added complexity.</p>

<p><strong>Scope managed by 1Password service account.</strong> The shared API key is the critical boundary. If it’s compromised, every secret it can access is exposed. Proper 1Password service account scoping (separate vaults, least-privilege grants, regular audits) is essential.</p>

<p><strong>Mapfile poisoning risk for non-root.</strong> If an attacker can modify the mapfile, they can make users write secrets to unintended locations. This is mitigated by restrictive mapfile permissions (typically <code class="language-plaintext highlighter-rouge">root:op</code> with mode 640). The filesystem still prevents writes to directories the user doesn’t own, but absolute paths could overwrite user-owned files.</p>

<p><strong>No cross-machine coordination.</strong> This is a single-host tool. Distributing secrets to a cluster requires running the tool on each node or using a different solution.</p>

<h2 id="implementation-details-worth-noting">Implementation Details Worth Noting</h2>

<p>The Go implementation uses the 1Password SDK rather than shelling out to <code class="language-plaintext highlighter-rouge">op</code> CLI. This avoids parsing CLI output and handles authentication internally.</p>

<p>Path sanitization prevents directory traversal (<code class="language-plaintext highlighter-rouge">..</code> is rejected). Absolute paths are allowed but subject to the user’s own filesystem permissions after privilege drop.</p>

<p>The cleanup mode (<code class="language-plaintext highlighter-rouge">--cleanup</code>) removes files based on the mapfile. It only deletes files, not directories, and only if they match entries for the current user. This prevents accidental removal of shared directories.</p>

<p>A verbose flag (<code class="language-plaintext highlighter-rouge">-v</code>) exists primarily for debugging integration issues. Most production usage doesn’t need it.</p>

<h2 id="availability">Availability</h2>

<p>The project is <a href="https://github.com/bexelbie/op-secret-manager">on GitHub</a> under GPLv3. Pre-built binaries for Linux amd64 and arm64 are available in releases.</p>

<p>This isn’t the right tool for every scenario. If you need dynamic rotation, audit trails beyond what 1Password provides, or distributed coordination, look at Vault or a cloud provider’s secret manager. If you’re running Kubernetes, use native secret integration.</p>

<p>But for the specific case of “I have a few Linux boxes, some containers, and a 1Password account; I want secrets distributed without adding persistent infrastructure,” this does the job.</p>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[A no-daemon tool that distributes 1Password secrets to multi-user Linux systems but retains centralized management.]]></summary></entry><entry><title type="html">On EU Open Source Procurement: A Layered Approach</title><link href="https://bexelbie.com/2026/01/27/eu-open-source-procurement.html" rel="alternate" type="text/html" title="On EU Open Source Procurement: A Layered Approach" /><published>2026-01-27T09:10:00+01:00</published><updated>2026-01-27T09:10:00+01:00</updated><id>https://bexelbie.com/2026/01/27/eu-open-source-procurement</id><content type="html" xml:base="https://bexelbie.com/2026/01/27/eu-open-source-procurement.html"><![CDATA[<p>Disclaimer: I work at Microsoft on upstream Linux in Azure. These are my personal notes and opinions.</p>

<p>The European Commission has launched a consultation on the EU’s future Open Source strategy. That combined with some comments by <a href="https://toot.io/@jzb@hachyderm.io">Joe Brockmeier</a> made me think about this from a procurement perspective.  Here’s the core of my thinking: treat open source as recurring OpEx, not a box product. That means hiring contributors, contracting external experts, and funding internal IT so the EU participates rather than only purchases.</p>

<p>A lot of reaction to this request has shown up in the form of suggestions for the EU to fund open source software companies and to pay maintainers. In this <a href="https://toot.io/@jzb@hachyderm.io/115939723318453222">Mastodon exchange</a> that I had with Joe, he points out that these comments ignore the realities of how procurement works: the processes vendors go through would, if maintainers had to follow them, be onerous and leave them in the precarious position of living contract to contract.</p>

<p>His prescription is that the EU should participate in communities by literally “rolling up [their] sleeves and getting directly involved.” My reaction was to point out that doing these things has an indirect, at best, relationship to bottom-line metrics (profit, efficiency, cost, etc.) and that our government structures are not set up to reward this kind of thinking. In general, people want to see their governments not be “wasteful” in a context where one person’s waste is another’s necessity.</p>

<p>As the exchange continued, <a href="https://toot.io/@jzb@hachyderm.io/115940797583976313">Joe pointed out</a> that “it’s not FOSS that needs to change, it’s the organizational thinking.”</p>

<p>In the moment I took the conversation in a slightly different direction, but the core of this conversation stuck with me. I woke up this morning thinking about organizational change. I am sure I am not the first to think this way, but here’s my articulation.</p>

<p>In my opinion, an underlying commentary in many of the responses from the “pay the maintainers / fund open source” crowd is the application of a litmus test to the funded parties. Typically they want to exclude not only all forms of proprietary software, but also SaaS products that don’t fully open their infrastructure management, products which rely on cloud services, large companies, companies that have traditionally been open source friendly that have been acquired (even if they are still open source friendly), and so on. These exclusions, no matter which you support, if any, tend to drive the use of open source software by an entity like the EU into a 100% self-executed effort. And, despite the presence of SaaS in that list, these conversations often treat open source software as a “box product” only experience that the end-user must self-install in their own (private and presumably all open source) cloud.</p>

<p>A key element of most entities is that they procure the things that aren’t uniquely their effort. A government procures email server software (and increasingly email as a service) because sending email isn’t their unique effort; the work that email allows to happen is. There is an inherent disconnect between the effort and therefore the corresponding cost expectation of getting email working so you can do work versus first becoming an email solution provider and expert and then after that beginning to do the work you wanted to do. (A form of Yak Shaving perhaps?).</p>

<p>While I am not sure I will reply to the EU Commission - I am a resident of the EU but not an EU citizen - I wanted to write to organize my thoughts.</p>

<h2 id="why-procurement-struggles-with-oss">Why Procurement Struggles With OSS</h2>

<p>Software procurement is effectively the art of getting software:</p>

<ul>
  <li>written</li>
  <li>packaged into a distributable consumable</li>
  <li>maintained</li>
  <li>advanced with new features as need arises</li>
  <li>installed and working</li>
</ul>

<p>Over time the industry has become more adept at doing more of these things for their customers. Early software was all custom and then we got software that was reusable. Software companies became more common as entities became willing to pay for standardized solutions and we saw the rise of the “box product.” SaaS has supplanted much of the installation and execution last-mile work that was the traditional effort of in-house IT departments. From an organizational perspective, these distinct areas of cost - some one-time and some recurring - have increasingly been rolled into a single, recurring cost. That is easier to budget and operate.</p>

<p>Bundling usually leads to discounting. Proprietary software companies control this whole stack and therefore can capture margin at multiple layers. This also allows them to create a discount when bundling different layers because they can “rationalize” their layer-based profit calculations. Open source changes this equation. There is effectively no profit built into most layers because any profit-taking is competed away in a deliberate and wanted race to the bottom. When a company commercializes open source software, it has to build all of its profit (and the cost of being a company) into the few layers it controls. We have watched companies struggle to make this model work, in large part because it is hard and easy to misunderstand. There is a whole aside I could write about how single-company open source makes these even worse because it buries the cost for layers like writing and maintaining software into the layers that are company-controlled, but I won’t, to keep this short. But know this context. What this means, in the end, is that I believe procuring open source can sometimes lead, paradoxically, to an increase in cost versus procuring the layers separately … but only if you think broadly about procurement.</p>

<p>Too often we assume procurement == purchasing, but it doesn’t have to. <a href="https://www.merriam-webster.com/dictionary/procuring">Merriam-Webster</a> reminds us that procurement is “to bring about or achieve (something) by care and effort.” Therefore we could encourage entities like the EU to procure open source software by using a layered approach and have an outcome identical to the procurement of the same software in a non-open way at the same or lower cost. Open source doesn’t need to save money; it just needs to not “waste” it.</p>

<p>The key is the rise of software as a service. From an accounting perspective, software as a service moves software expenses from a model of large one-time costs with smaller, if any, recurring costs to one of just recurring costs. The promise of software as a service - that recurring costs can be ended at-will - is an exciting one organizationally, as it gives flexibility at a controllable cost, but in practice it has a “Hotel California”<sup id="fnref:saas-exit"><a href="#fn:saas-exit" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> quality: exit is often constrained by vendor lock-in, data egress limits, and portability gaps.</p>

<h2 id="the-layered-opex-model">The Layered OpEx Model</h2>

<p>Here’s how the EU can treat open source as a recurring cost:</p>

<ol>
  <li>
    <p><strong>Hire people to participate in the open source project.</strong> They are tasked with helping to maintain and advance software to keep it working and changing to meet EU needs. These people are, like most engineers at open source companies, paid to focus on the organization’s needs. They differ from our typical view of contributors as people showing up to “scratch their own itch.”</p>
  </li>
  <li>
    <p><strong>Enter into contracts with external parties to provide consulting and support beyond the internal team.</strong> These folks are there to give you diversity of thought and some guarantees. The internal team is, by definition, focused just on EU problems and has a sample size install base of one. External contractors will have a much larger scope of interest and install base sample size as they work with multiple customers. Critically, this creates a funding channel for non-employees and speaks to the “pay the maintainers” crowd.</p>
  </li>
  <li>
    <p><strong>Continue to fund internal IT departments to care and feed software and make it usable instead of shifting this expense to a single-software solution vendor.</strong> These folks are distinct from the people in #1 above. They are experts in EU needs and understand the intersection of those needs and a multitude of software.</p>
  </li>
</ol>

<p>Every one of these expenses is recurring and can be ended at-will - but only if ending them is something we are willing to knowingly accept. We already implicitly accept this when we buy from a company. The objections I expect are as follows. Before you read them, though, I want to define at-will. While it denotatively means “<a href="https://www.merriam-webster.com/dictionary/at%20will">as one wishes : as or when it pleases or suits oneself</a>”, in our context we can extend this with “in a reasonable time frame” or “with known decision points.”</p>

<h2 id="expected-objections">Expected Objections</h2>

<ol>
  <li>
    <p><strong>If you can terminate the people hired to participate in open source projects like this, they’re living contract to contract.</strong> To this I say, yes in the sense that they don’t have unlimited contracts, but no in the sense that they are still employees with employee benefits and protections, like notice periods. The big change is that they can be terminated solely due to changes in software needs.</p>
  </li>
  <li>
    <p><strong>But allowing for notice periods is expensive. EU employees are often perceived as more expensive than private sector ones or individual contractors.</strong> To this I say, maybe. But isn’t that the point? Shouldn’t we want to be in a place where we are <em>not</em> creating cost savings by reducing the quality of life for the humans involved?</p>
  </li>
  <li>
    <p><strong>If everything is either an employment agreement with a directed work product (do fixes/maintenance for our use case or install and manage this software) or a support/consultancy contract we aren’t paying maintainers to be maintainers.</strong> To this I say, you’re right. The mechanics of project maintenance should be borne by all of the project’s participants and not by some special select few paid to do that work. There is a lot of room here to argue about specifics, but rise above it. The key thing this causes is that no one is paid to just “grind out features or maintenance” on a project that isn’t used directly by a contributor. A key concept in open source has always been that people are there to either scratch their own itch or because they have a personal motivation to provide a solution to some group of users. This model pays for the first one and leaves the second to be the altruistic endeavor it is. Also, there are EU funds you can get to pay for altruistic endeavors :D.</p>
  </li>
  <li>
    <p><strong>This model doesn’t explain how software originates. What happens when there is no open source project (yet)?</strong> To this I say, you’re also right. This is a huge hole that needs more thought. Today we solve this with VC funding and profit-based funding. VC funding is predicated on ownership and being able to get return on investment. If this model is successful there is very little opportunity for what VCs need. However, profit based funding, when an entity takes some of its profit and invests in new ideas (not features) still can exist as the consulting agreements can, and likely should, include a profit component. Additionally, the EU and other entities can recognize a shared need through the consensus building and collaborative work on participation in open source software and fund the creation of teams to go start projects. This relies on everyone giving the EU permission to take risks like this.</p>
  </li>
  <li>
    <p><strong>The cost of administering these three expenses will eat up the cost more than paying an external vendor.</strong> To this I say, maybe, but it shouldn’t matter. While I firmly believe that this shouldn’t be true and that it should be possible for the EU to efficiently manage these costs for less than the sum of the profit-costs they would pay a company, I am willing to accept that the “expensive employees” of #2 above may change that. But just like above, I think that’s partly the point.</p>
  </li>
  <li>
    <p><strong>Adopting this model will destroy the software industry and create economic disaster.</strong> To this I say, take a breath. The EU changing procurement models doesn’t have the power to single-handedly destroy an industry. Even if every government adopted this, which they won’t, the macro impact would likely be a shift in spend rather than a net loss. This model is practical only for the largest organizations; most entities will still need third-party vendors to bundle and manage solutions. If anything, this strengthens the open source ecosystem by providing a clear monetization path for experts, while leaving ample room for proprietary software where it adds unique value. Finally, the private sector is diverse; many companies and investors will continue to prefer traditional models. The goal here is to increase EU participation in a public good and reduce dependency, not to dismantle the software industry.</p>
  </li>
</ol>

<h2 id="what-to-ask-the-commission">What To Ask The Commission</h2>

<ul>
  <li>When choosing software, the budget must include time for EU staff (new hires or existing staff reassigned) to contribute to the underlying open source projects.</li>
  <li>Keep strong in-house IT skills to ensure that deployed solutions meet needs and work together.</li>
  <li>Complement your staff with support/consultancy agreements to provide the accountability partnership you get from traditional vendors and access to greater knowledge when needed.</li>
  <li>Make decisions based on your mission and goals, not your current inventory; be prepared to rearrange staffing when required to advance.</li>
</ul>

<p>This was quickly written this morning to get it out of my head. There are probably holes in this and it may not even be all that original, but I think it works. As an American who has lived in the EU for 13+ years, I have come to trust government more and corporations less for a variety of reasons, but mostly because, broadly speaking, we tend to hold our government to a higher standard than we hold corporations.</p>

<p>I’m posting this in January 2026, just before FOSDEM. I’ll be there and open for conversation. Find me on Signal as <code class="language-plaintext highlighter-rouge">bexelbie.01</code>.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:saas-exit">
      <p>Many software as a service agreements allow you to stop paying but still make true exit difficult due to data gravity, integrations, and proprietary features. In practice, you can “check out,” but actually leaving is often costly and slow. <a href="#fnref:saas-exit" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Brian &quot;bex&quot; Exelbierd</name></author><summary type="html"><![CDATA[Treat OSS as recurring OpEx: hire contributors, contract experts, and fund internal IT so the EU participates, not just buys.]]></summary></entry></feed>