systemd-sysext in Production

What We Learned Extending /usr Without a Package Manager

Brian "bex" Exelbierd
Principal Product Manager, Microsoft Azure
Upstream Linux
Daniel Zaťovič
Software Engineer, Microsoft Azure
Flatcar Engineering

  DevConf.CZ  ·  June 18, 2026  ·  Brno

Microsoft


I’ve got 99 problems and they’re all packaging

  • Jay-Z in his Linux Admin Days

… and now there are 15 standards



Agenda

  • Why we needed sysext
  • What sysext actually is
  • How Flatcar ships it in production
  • Live demo
  • What broke, what we fixed
  • What we ship
  • When to reach for sysext
  • Try it today

We built an immutable OS — on purpose

Flatcar (and CoreOS before it) made a design decision: the OS image is signed and read-only.

  • /usr is immutable and dm-verity protected
  • no package manager on the host
  • every node in the fleet is bit-identical, reproducible and defined at provisioning
  • avoids configuration drift
  • updates are atomic A/B — whole-image, auto-rollback, no half-patched states
  • single-purpose OS: provides just enough to run containers

You can’t tweak the OS in place. That’s the value proposition, not a bug.


… but we still needed to change some parts on the fly

The one part of the OS users most need to control is the container runtime.

  • the most common example is the container runtime
    • “Pin Docker to a specific version and hold it there while the OS keeps updating”
    • “Run one Docker version on these nodes, a different one on those”
    • “We don’t want Docker at all — ship containerd only”
    • “We need podman / a newer runc / a different container runtime”
  • ship GPU-specific drivers and container runtimes for GPU workloads

A frozen /usr can’t do any of that.


Our first answer: Torcx

Torcx — a boot-time addon manager (CoreOS heritage) that let you pick your Docker / containerd / runc.

  • At build time, the real binaries were replaced with symlinks to nowhere
  • At boot, the Torcx image was fetched, unpacked, and the symlinks were rewired — magic
  • Non-standard paths → couldn’t reuse upstream Gentoo ebuilds; invasive build-time hooks
  • Updating meant a manifest change → re-provision or in-place edits → config drift
  • Images were frozen after build — you couldn’t add or modify one

Bespoke, brittle, and, worse, Flatcar-only. We get to build it … alone … forever.


So we adopted a standard: sysext

Instead of maintaining more bespoke tooling, we moved to a systemd primitive the whole ecosystem shares.

“System extension images may — dynamically at runtime — extend the /usr/ and /opt/ directory hierarchies with additional files. This is particularly useful on immutable system images where a /usr/ and/or /opt/ hierarchy residing on a read-only file system shall be extended temporarily at runtime without making any persistent modifications.”

systemd-sysext(8) man page

If you have a modern systemd, you already have this. It is not a Flatcar-only feature; Flatcar is just one of the distros that took the bet early.


The mechanics in 30 seconds


A directory, FS image or DDI with /usr/lib/extension-release.d/extension-release.<SYSEXT NAME>:

ID=flatcar           # target OS — must match host's ID (or _any)
ARCHITECTURE=x86-64  # must match uname (or _any)

# pick ONE version-matching key:
VERSION_ID=4628.1.0  # pin to one OS release (can link host /usr)
SYSEXT_LEVEL=1.0     # or match on a sysext API level instead of a specific OS version

systemd merges an extension only when these fields match the host. The verbs: systemd-sysext merge | unmerge | refresh.
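
The layout described above can be sketched as a small script that builds a directory-based extension in a scratch directory. This is a minimal sketch under stated assumptions: the name `hello-sysext` and its one-file payload are hypothetical, and the merge step is only shown as comments because it needs root and a host with systemd-sysext.

```sh
#!/bin/sh
# Sketch: build a directory-based sysext in a scratch directory.
# "hello-sysext" and its payload are hypothetical, for illustration only.
set -eu

name="hello-sysext"
workdir="$(mktemp -d)"
ext="$workdir/$name"

# Payload: everything must live under usr/ (or opt/).
mkdir -p "$ext/usr/bin" "$ext/usr/lib/extension-release.d"
cat > "$ext/usr/bin/$name" <<'EOF'
#!/bin/sh
echo "hello from a sysext"
EOF
chmod +x "$ext/usr/bin/$name"

# Metadata: the file name suffix must equal the extension name.
# ID=_any means no OS coupling; use ID= plus VERSION_ID= or
# SYSEXT_LEVEL= to tie the extension to a host release instead.
release="$ext/usr/lib/extension-release.d/extension-release.$name"
printf 'ID=_any\n' > "$release"

echo "built $ext"

# On a host with systemd-sysext, as root (not run by this sketch):
#   cp -r "$ext" /run/extensions/
#   systemd-sysext refresh
#   systemd-sysext status
```

A plain directory is the quickest form to experiment with; the same tree can later be packed into a file system image or DDI.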


What sysext is NOT

“System extension images should not be misunderstood as a generic software packaging framework, as no dependency scheme is available: system extensions should carry all files they need themselves, except for those already shipped in the underlying host system image.”

systemd-sysext(8) man page

  • No dependency resolution — author owns the dep closure
  • No scriptlets / %post / triggers — pure file overlay only
  • Hierarchy locked to /usr + /opt — files elsewhere are silently ignored
  • Additive by spec — collision detection is not enforced

If you need any of those, sysext is the wrong tool.
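
Because files outside /usr and /opt are silently ignored, a pre-flight check on the extension tree can catch misplaced files before merging. A minimal sketch, assuming a directory-based extension; the helper name `list_ignored_paths` and the sample tree are hypothetical:

```sh
#!/bin/sh
set -eu

# Print top-level entries of an extension tree that sit outside usr/ and opt/.
list_ignored_paths() {
    find "$1" -mindepth 1 -maxdepth 1 ! -name usr ! -name opt | sort
}

# Demo on a throwaway tree: one valid file, one misplaced config file.
tree="$(mktemp -d)"
mkdir -p "$tree/usr/bin" "$tree/etc"
: > "$tree/usr/bin/tool"
: > "$tree/etc/tool.conf"

ignored="$(list_ignored_paths "$tree")"
if [ -n "$ignored" ]; then
    echo "outside /usr and /opt (would not be merged):"
    echo "$ignored"
fi
```

Running a check like this in CI makes the "silently ignored" case loud at build time instead of surprising someone at boot.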


OS-dependent vs independent sysexts

### `ID=flatcar` + `VERSION_ID`

*"I move when the OS moves."*

- Pinned to a specific Flatcar release
- Can dynamic-link against host `/usr`
- Won't load after an OS update — rebuilt each release
- **Use for**: OEM agents, drivers

### `ID=_any`

*"I run on any OS."*

- No version coupling — matches any host, any release
- Must bundle everything (static, or libs in `/usr/local//`)
- Updates on its own clock via `systemd-sysupdate`
- **Use for**: tools with separate life cycles, distro-agnostic tools

---

# Demo

## Kubernetes-as-a-sysext

---

# Issue with self-contained sysexts

- **Nothing you can count on**
- **No dependency resolution** — no package manager; if you need a library, you have to bring it
- **You vendor each dependency by hand** — walk the link tree and copy every missing library into the sysext:

  ```
  ldd /usr/bin/
      libfoo.so.1 => not found    ← you must bring this
      libbar.so.2 => not found    ← and this
      libc.so.6 => /usr/lib/...   ← already in base /usr, OK
  ```

- **It's recursive** — dependencies have dependencies, so re-run `ldd` on everything you copy
- **Collisions are a secondary risk** — if two sysexts ship the same `libfoo.so.1`, first merge wins and overlayfs won't warn you.

---

# Our answer to this

Two tools in `flatcar/sysext-bakery`, both making a sysext self-contained:

- **Flix** — rewrite ELF binaries with `patchelf` so each only looks in its own private dir
  - `patchelf --set-interpreter /usr/local//ld-linux --no-default-lib --set-rpath /usr/local/ /usr/bin/`
  - No host-library coupling — the binary can't pick up a colliding `libfoo.so`
- **Flatwrap** — ship the whole rootfs under `/usr/local//`, wrap the entry point in a mount namespace
  - `unshare -m` → `mount --bind /usr/local//usr …` → `chroot`
  - Each invocation runs in its own private view of `/usr`

Note: Today we ship most of our sysexts either built directly against the distro during release or as independent **static binary** sysexts. This is partly because the cloud-native world is Go and Rust, and partly because most of the non-static use cases have warranted inclusion in our release pipeline. However, we do ship `tilde` (Flix) and `btop` (Flatwrap) using these tools, partly as proofs of concept.

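
The manual `ldd` walk from the self-contained sysext slide can be partly scripted. A minimal sketch, assuming glibc-style `ldd` output; `missing_libs` is a hypothetical helper and not part of the bakery, and the sample text is canned so the sketch runs anywhere:

```sh
#!/bin/sh
set -eu

# Read ldd output on stdin; print each library reported as "not found".
missing_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }' | sort -u
}

# Canned ldd output (hypothetical libraries) standing in for a real binary.
sample='	libfoo.so.1 => not found
	libbar.so.2 => not found
	libc.so.6 => /usr/lib/libc.so.6 (0x00007f0000000000)'

missing="$(printf '%s\n' "$sample" | missing_libs)"
echo "$missing"
```

On a real binary the helper would be fed from `ldd`, then re-run on every library copied into the sysext, since the walk is recursive.
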
---

# Next problem we hit: confext/sysext was strictly read-only

Read-only is great, right up until it isn't. Some use cases needed a writable upper layer:

- We wanted to use confext for managing `/etc`, and most things expect to be able to write to `/etc`
- Using sysexts on a mutable system sets `/usr` to read-only

So we drove a fix **upstream**, landing a spec change for mutable mode, which shipped in **systemd v256** (June 2024).

---

# What we ship today

In Stable: three categories of Flatcar sysexts

- **Opt-out** — shipped *in the base image*, on by default
  - `docker-flatcar` · `containerd-flatcar` · `oem-*`
- **Opt-in** — built by Flatcar CI, **downloaded at provisioning**, off by default
  - `nvidia-drivers-*` · `zfs` · `python` · `podman` · `incus`
  - Enable with one line: `echo nvidia-drivers >> /etc/flatcar/enabled-sysext.conf`

Opt-out and opt-in are **OS-dependent**: `ID=flatcar` + `VERSION_ID`, signed with the image's ephemeral build key.

- **Community** — from the sysext-bakery, community-built and *not* Flatcar-tested
  - `kubernetes` · `k3s` · `cilium` · `nerdctl` · `tailscale` · …28 recipes
  - `ID=_any`, self-contained; updates **independently** via `systemd-sysupdate`
  - To enforce signatures, **import the bakery key yourself** — it isn't in the image

---

# What we also ship today

In Beta (now rolling into Stable):

- **Confext** — `/etc` is now a `systemd-confext` in mutable mode (replaced our custom overlayfs scripts)
- **Sysexts cryptographically signed** — dm-verity roothash signatures, ephemeral build key
- **Format change** — squashfs → **erofs DDI** (Discoverable Disk Image)

This list is all just tooling around sysext for our users, not the implementation itself.

---

# The two-axis test

![h:440](assets/two-axis-plane.svg)

**The sysext quadrant** — needs the host, but is too opinionated to ship with it.

---

# Comparing rpm-ostree, bootc, and sysext

**rpm-ostree**

- "I want to add a distribution component"
- Full distro dependency graph; rollback by deployment swap; cadence tied to the base
- rpm-ostree layers **at runtime/on-host, per host**, then reboot

**bootc**

- "I want to rebuild my entire system on changes"
- Full distro dependency graph; rollback by deployment swap; cadence tied to the base
- bootc bakes **at build time**, ships an OCI image, then reboot

**systemd-sysext**

- "I want a self-contained add-on with a separate lifecycle and no reboot"
- **No dependency resolution** — a `.raw` tree overlaid on `/usr`; you bundle deps
- `merge` / `unmerge`; cadence **independent** per extension, no reboot

---

# Demo

## ... but wait, there's more.

---

# Try it today - Thank you

### Resources

- **flatcar.org** — website & docs
- **extensions.flatcar.org** — sysext bakery
- **Discord** — `discord.gg/PMYjFUsJyq`
- **Chat** — Matrix · CNCF Slack `#flatcar`
- **Office Hours** — every 2nd Tue, 15:30 UTC

### Try it locally

```sh
# clone flatcar/sysext-bakery, then:
./bakery.sh list
./bakery.sh boot kubernetes

# SSH into the VM:
systemctl status kubelet
```

- **Lennart's vision**: [0pointer.net/blog/fitting-everything-together.html](https://0pointer.net/blog/fitting-everything-together.html)
- **UAPI extension-image spec**: [uapi-group.org/specifications/specs/extension_image](https://uapi-group.org/specifications/specs/extension_image)
- **FCOS sysexts**: [github.com/travier/fedora-sysexts](https://github.com/travier/fedora-sysexts)

---