systemd-sysext in Production

- intro

bex - I want to start with a frustration. - Every distro, every immutable OS, every cloud-native platform — they all eventually trip over the same problem: "how do I get this binary onto that host in a way I can update and roll back without breaking everything else." - This is the oldest problem in our industry and we keep inventing new answers to it. - That's not a complaint. It's a setup.

bex - Having lots of standards isn't actually bad. Standards make some problems easier, often at the expense of making others harder. You have to decide which problems matter to you. - sysext is one of those standards. We are not going to pretend it makes the problem go away. - What we ARE going to tell you is when it's the right one — and what we've learned shipping it in production for two and a half years.

bex This is what we're going to talk about today

Daniel - Start with the design. - Treat the OS like firmware, an adapter for running containers (Flatcar carrying containers). - Read-only /usr, verity, atomic updates, no rpm/dpkg on the host. - This is great for security and for running a fleet: every machine is the same machine.

Daniel - But it causes a problem: the thing people most want to change is sitting inside the part we froze. - Users have varying requirements for the container runtime. - All of that is userspace that lives in /usr, exactly where we don't allow changes. - So we needed a way to make a few specific, OS-adjacent components swappable without giving up immutability everywhere else.

Daniel - Torcx did the job for years, and it's our origin story for sysext. - At build we baked symlinks-to-nowhere; at first boot Torcx downloaded an image, extracted it, and pointed the symlinks at the right place. That's the "magic" we wanted to kill. - It was custom tooling end to end — special packaging, non-standard paths so we couldn't reuse upstream ebuilds, a manifest you had to edit, images you couldn't change after build. - Hard to maintain, hard to version, hard to update, and nobody else in the industry shared the burden. That last part matters for the next slide. - Removed almost 3 years ago (in Alpha 3794.0.0, November 2023).

Daniel - As time progressed, more distros following the immutable pattern showed up and hit the same problem. - The UAPI standard and systemd-sysext were created. - Rather than build Torcx 2.0, we adopted systemd-sysext: a cross-distro standard, not a Flatcar invention. - The win over Torcx: it's a standard. The spec exists, the tooling exists, other distros support it, and we stopped carrying the whole maintenance burden alone. Very few people have pushed it past "hello world." We have. - A similar concept exists for `/etc`, called confext; we'll get to it later.

Daniel - A sysext is a directory, a filesystem image (typically erofs or squashfs), or a DDI image with dm-verity. - The image's `/usr` tree is overlaid onto the host `/usr` via overlayfs. The base is the lowerdir, the sysext is added on top. - merge composes the layers. unmerge tears them down. refresh re-evaluates after you add or remove an image. No reboot. - There is a contract that defines which sysexts the system will allow. - There are two basic concepts.
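
The merge and unmerge behaviour can be pictured with a toy model. This is a minimal sketch, not how systemd-sysext is implemented: layers are plain dictionaries, the names `merged_view`, `base` and `kubernetes` are made up for illustration, and the rule that an earlier layer shadows a later one mirrors how overlayfs treats its lowerdir list.

```python
from typing import Dict, List

Layer = Dict[str, str]  # maps a path to a label standing in for file contents


def merged_view(layers: List[Layer]) -> Dict[str, str]:
    """Compose layers top-first: the first layer that has a path wins."""
    view: Dict[str, str] = {}
    for layer in layers:
        for path, content in layer.items():
            view.setdefault(path, content)  # keep the entry from the higher layer
    return view


# Hypothetical contents, for illustration only.
base = {"/usr/bin/bash": "base", "/usr/lib/libc.so.6": "base"}
kubernetes = {"/usr/bin/kubelet": "kubernetes sysext"}

merged = merged_view([kubernetes, base])  # "merge": the extension sits above the base
unmerged = merged_view([base])            # "unmerge": only the base remains
```

Nothing is copied into the base in this model; the merged view is recomputed from the list of layers, which is why adding or removing an image needs a refresh and not a reboot.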

bex - It is important to understand: this isn't another package manager. - It does almost nothing a package manager does except the "deliver bits" part. - None of the other things a package manager provides are here: dependency resolution, scriptlets, etc. - If you need those things, sysext is not what you need, or you learn not to need them.

bex - You wind up with two kinds of sysexts: OS-dependent and OS-independent. - Dependent means it relies on specific components of the base OS being in specific places at specific versions. - Independent means it brings all of its dependencies with it. - The core question to ask: should this share the OS lifecycle, or should it be independent of the OS?
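
One way to picture the difference is the compatibility check a host applies before merging an extension. The sketch below is a simplified reading of the matching rules for the extension-release file, assuming only the `ID`, `VERSION_ID` and `SYSEXT_LEVEL` fields. The function name and the example version strings are made up, and the real check in systemd covers more fields and cases.

```python
from typing import Dict


def extension_matches_host(ext: Dict[str, str], host: Dict[str, str]) -> bool:
    """Simplified check of an extension-release against the host os-release."""
    ext_id = ext.get("ID", "")
    if not ext_id:
        return False                    # no ID declared: refuse
    if ext_id == "_any":
        return True                     # OS-independent: accepted on any host
    if ext_id != host.get("ID", ""):
        return False                    # built for a different distro
    ext_level = ext.get("SYSEXT_LEVEL", "")
    host_level = host.get("SYSEXT_LEVEL", "")
    if ext_level and host_level:
        return ext_level == host_level  # API level, when both sides declare one
    host_version = host.get("VERSION_ID", "")
    if not host_version:
        return True                     # host without VERSION_ID (rolling release)
    return ext.get("VERSION_ID", "") == host_version


# Hypothetical host and extensions; the version numbers are placeholders.
host = {"ID": "flatcar", "VERSION_ID": "1.2.3"}
os_dependent = {"ID": "flatcar", "VERSION_ID": "1.2.3"}
os_independent = {"ID": "_any"}
stale = {"ID": "flatcar", "VERSION_ID": "1.2.2"}
```

Here `os_dependent` and `os_independent` both match the host, while `stale` stops matching as soon as the host moves to a new version. That is the lifecycle coupling the question above is about.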

Daniel - 1. In Headlamp cordon and drain - 2. On an existing Flatcar node, show kubelet version we have - 2a - show loaded sysexts - see docker - 2b - aside on docker then remove it - 3. Show the raw image for the sysext - 4. Show the check for new command (network hit) - 5. Copy in staged update and merge - 6. Restart kubelet and show kubelet version - 7. In Headlamp scale the workload and get a new pod on the system - show the update applied

Daniel - The headline problem is NOT version collisions; those can happen but are rare. The real, everyday pain is that there's no dependency resolution at all. - On an immutable OS there's no package manager on the host. If your binary needs libfoo and it's not in the base /usr, nothing will provide it. You have to vendor it into the sysext yourself. - And it's recursive: you ldd the binary, copy the missing libraries, then ldd those libraries, because dependencies have dependencies. That hand-walking is the tedious part. - The collision case (two sysexts shipping the same .so, first merge wins, overlayfs won't detect it) is real but secondary; mention it, don't lead with it. - There is no Requires:/Conflicts: in sysext. That's the deliberate trade for the simpler model, and it's exactly what the next slide's tooling (Flix/Flatwrap) automates.
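
The recursive hand-walk described above is a transitive closure over "needs" edges. The following is a minimal sketch of that idea only: `deps_of` is a stand-in for whatever reports a file's direct dependencies (for example a wrapper around ldd, which is not shown), and the library names in the example graph are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set


def dependency_closure(root: str, deps_of: Callable[[str], Iterable[str]]) -> Set[str]:
    """Return everything reachable from `root`, following dependencies of dependencies."""
    seen: Set[str] = set()
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for dep in deps_of(current):
            if dep not in seen:  # also protects against dependency cycles
                seen.add(dep)
                queue.append(dep)
    return seen


def libraries_to_vendor(
    root: str,
    deps_of: Callable[[str], Iterable[str]],
    provided_by_base: Iterable[str],
) -> Set[str]:
    """Libraries the sysext must ship itself: the closure minus what the base OS has."""
    return dependency_closure(root, deps_of) - set(provided_by_base)


# Hypothetical dependency graph, for illustration only.
graph: Dict[str, List[str]] = {
    "mytool": ["libfoo.so.1", "libc.so.6"],
    "libfoo.so.1": ["libbar.so.2", "libc.so.6"],
    "libbar.so.2": ["libc.so.6"],
    "libc.so.6": [],
}
to_vendor = libraries_to_vendor("mytool", lambda name: graph.get(name, []), {"libc.so.6"})
```

In this example `to_vendor` contains libfoo.so.1 and libbar.so.2. The second one only shows up because the walk continues into libfoo's own dependencies.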

Daniel - Flix uses patchelf to rewrite RPATHs so binaries only look in their private directory. The bakery's tilde extension uses this. - Flatwrap uses mount namespaces to give each binary its own view of /usr. The bakery's btop extension uses this. - Both work. Both are elegant. Both are barely used. - Why? Because cloud-native software is mostly Go and Rust. Kubernetes, containerd, runc, k3s, rke2, nerdctl — all statically linked. The dynamic-linking problem is mostly a non-problem if your target is the CNCF universe. - The honest takeaway: we built two solutions, and we recommend the third option (static binaries). We have the tools when you need them. You usually don't.
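
Whether a given binary falls into the "statically linked, nothing to vendor" bucket can be checked by looking for a PT_INTERP program header, which is where a dynamically linked executable names its loader. This is a minimal sketch assuming a well-formed ELF file. The function name is made up, and it is a heuristic for triage, not part of Flix or Flatwrap.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def needs_dynamic_loader(data: bytes) -> bool:
    """Return True if the ELF image in `data` has a PT_INTERP program header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = data[4] == 2                # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little endian, 2 = big endian
    if is_64bit:
        (phoff,) = struct.unpack_from(endian + "Q", data, 32)           # e_phoff
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 54)  # e_phentsize, e_phnum
    else:
        (phoff,) = struct.unpack_from(endian + "I", data, 28)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 42)
    for index in range(phnum):
        # p_type is the first 32-bit field of every program header entry
        (p_type,) = struct.unpack_from(endian + "I", data, phoff + index * phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

Calling it on the bytes of a binary you plan to ship gives a quick answer: False means there is no loader request and no shared libraries to chase, True means the dependency walk (or Flix/Flatwrap) applies.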

Daniel - Remind the audience what confext is. - The mechanism: the systemd-sysext --mutable= modes: disabled, auto, enabled, import, ephemeral, ephemeral-import. - This is the upstream-contribution arc the talk title promises. - The clever trick: if /var/lib/extensions.mutable/usr is a symlink to /usr, the host /usr becomes the overlay upperdir, so writes go straight back to the base. That's how confext works on a mutable /etc. - The detailed paper trail, if anyone asks: Kai Lüke filed Flatcar issue #986 (Mar 2023); Thilo Fromm wrote UAPI Spec PR #78 (merged May 2024); Krzesimir Nowak wrote systemd PR #31000 (merged Feb 2024). All three are Flatcar maintainers and our colleagues; they deserve the credit.
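
To see which situation a host is in, you can inspect the extensions.mutable entry for a hierarchy. The sketch below only classifies what is on disk. The function name, the `root` parameter (added so it can run against a scratch directory) and the returned labels are all made up for illustration; what systemd-sysext does with each state depends on the selected mode and is not modelled here.

```python
import os
from pathlib import Path


def mutable_layout(root: Path, hierarchy: str = "usr") -> str:
    """Classify <root>/var/lib/extensions.mutable/<hierarchy>."""
    mutable = root / "var" / "lib" / "extensions.mutable" / hierarchy
    host = root / hierarchy
    if mutable.is_symlink():
        if os.path.realpath(mutable) == os.path.realpath(host):
            return "symlink-to-host"  # the trick above: writes land in the hierarchy itself
        return "symlink-elsewhere"
    if mutable.is_dir():
        return "directory"            # a real directory, separate from the base
    return "absent"
```

On a live system the call would be `mutable_layout(Path("/"))` for /usr, or `mutable_layout(Path("/"), "etc")` for the confext case.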

bex - Opt-out ships IN the image. Docker and containerd are not in /usr anymore. - Opt-in is built by our CI and downloaded at provisioning time, not baked into the image. - You turn it on with one line in enabled-sysext.conf. - Opt-out and opt-in are both OS-dependent; both are signed during the release cycle, and the key is created there. - Community is the bakery: community-built, not release-tested, self-contained. 28 recipes today.

bex - We have cool stuff coming up through the channels that has already landed in our beta channel. - I am particularly excited about confext: those of us with long-lived, pet-like hosts will have an easier time working with our customizations and playbooks.

bex - As you think about using system extensions, there are two models that are useful. - First, the two-axis test. - Vertical = "does this need to be on the real host or can it run in a container?" - Horizontal = "does this ship in the base OS image or get added per deployment?" - Top-left: SYSTEM AGENTS. Things you may bake into the OS image but could also containerize. - Top-right: APPLICATIONS. Stuff users add, like workloads and similar infra apps. - Bottom-left: BASE OS. This is the core of the OS and reflects the distro's core opinion. - Bottom-right: SYSEXT. Things you need on the host but which users will have feels about.
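
The two-axis test reduces to two yes/no questions, which can be written down as a tiny lookup. This is only a restatement of the quadrants above; the function and parameter names are made up.

```python
def quadrant(needs_host: bool, ships_in_base_image: bool) -> str:
    """Map the two axes of the test onto the four quadrants."""
    if needs_host:
        # Bottom row: has to run on the real host.
        return "BASE OS" if ships_in_base_image else "SYSEXT"
    # Top row: could run in a container.
    return "SYSTEM AGENTS" if ships_in_base_image else "APPLICATIONS"
```

For example, something that must run on the host but is added per deployment, `quadrant(needs_host=True, ships_in_base_image=False)`, lands in the SYSEXT quadrant.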

bex - another model close to home is which tech to choose - These are all great choices depending on your goals and desires - What do you want made easy and what can be hard - rpm-ostree gives you the full distro package set and dependency graph - new stuff, existing image - bootc bakes your customization into the image at build time - new stuff new image - sysext gives you a self-contained add-on with its own update clock. - new stuff new image, but also new stuff same image - a mix of both

bex - 1. In Headlamp point to worker 2 .. let's look at this one. First cordon ... - 2. Log in and cat /etc/os-release - oh look, Fedora CoreOS - 3. Show that there is a sysext loaded - 4. Show kubelet version, then copy staged Flatcar update and restart - 5. Same sysext, same effect ... different OS - 6. Headlamp add workload - 7. Admit we cheated - we had to turn off SELinux ... so that's sad. We are thinking about it - 8. But wait, what's worker-3 - Cordon - 9. Also Fedora CoreOS - 10. Using a Fedora CoreOS custom sysext for kubelet - 11. Update and restart kubelet - 12. Add workload - show they are all on the same version

- bakery.sh boot is the easiest on-ramp. It spins up a local QEMU VM with a Caddy server in front, Ignition snippet auto-generated, sysext merged. - Three URLs at the bottom are the canonical reading list: Lennart's vision post, the UAPI spec, and the FCOS community sysexts repo. - We will hang out in the hallway and at the Flatcar booth.

Build Notes:
- Build PPTX (with speaker notes, recommended for the actual talk): make pptx
- Build standalone HTML (good for previewing / sharing online): make html
- Build PDF: make pdf
- Live preview while editing (auto-reloads on save): make serve

systemd-sysext in Production

What We Learned Extending `/usr` Without a Package Manager

... and now there are 15 standards

Agenda

We built an immutable OS — on purpose

... but we still needed to change some parts on the fly

Our first answer: Torcx

So we adopted a standard: sysext

The mechanics in 30 seconds

What sysext is NOT

OS-dependent vs independent sysexts

`ID=flatcar` + `VERSION_ID`

`ID=_any`

Demo

Kubernetes-as-a-sysext

Issue with self-contained sysexts

Our answer to this

Next problem we hit: confext/sysext was strictly read-only

What we ship today

What we ship also today

The two-axis test

Comparing rpm-ostree, bootc, and sysext

Demo

... but wait, there's more.

Try it today - Thank you

Resources

Try it locally
