cgwalters
Member

@cgwalters cgwalters commented Aug 28, 2025

We have a lot of bind mounts; these are usually set up in the initramfs.
So far during shutdown we've let systemd just try to sort things out
via auto-generated mount units i.e. sysroot.mount and etc.mount
and so on.

systemd has some special casing for -.mount (i.e. /) and etc.mount
https://github.com/systemd/systemd/blob/e91bfad241799b449df73efc30d833b9c5937001/src/shared/fstab-util.c#L72

However it doesn't special case /sysroot - which is currently
an ostree-specific invention (when used in the real root).
We cannot actually unmount /sysroot while it's in use, and it
is in use because /etc is a bind mount into it. And we can't tear
down /etc because it's just expected that e.g. pid 1 and other
things hold open references to it - until things finally
transition into systemd-shutdown.

What we can do though is explicitly detach it during the shutdown
phase; this ensures that systemd won't try to clean it up then,
suppressing errors about its inability to do so.

While we're here, let's also remount /etc read-only, even though
systemd itself will try to do so during systemd-shutdown.
Per the comments, if this service fails, it's a bug in something
else to be fixed.

Closes: #3513
Signed-off-by: Colin Walters walters@verbum.org


@github-actions github-actions bot added the area/prepare-root Issue relates to ostree-prepare-root label Aug 28, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces shutdown logic for ostree-remount.service to properly handle /sysroot and /etc during system shutdown. The changes involve adding an ExecStop command to the service file and implementing the corresponding --shutdown logic in ostree-remount.c. The approach of detaching /sysroot and remounting /etc as read-only is sound and well-documented in the code. My review includes a couple of minor suggestions to improve clarity.

@cgwalters
Member Author

cgwalters commented Aug 28, 2025

An option I looked at here too that I think would work is overriding the auto-generated sysroot.mount with one that basically treats it the same as etc.mount - i.e. suppressing all default dependencies (including Before=local-fs.target) which is I think the main reason systemd is trying to unmount it during regular unit shutdown.

But in the end, this service already exists, already has code to remount read-only, and already runs early enough on startup (and hence late enough on shutdown) to do cleanups, whereas touching the generator felt more invasive.

Member

@jmarrero jmarrero left a comment

lgtm

@champtar
Collaborator

Do we still have a writable mount of the rootfs open beneath the composefs even after the remount ro /sysroot ?
On the kernel cmdline we have rw, so I'm wondering if remounting /sysroot ro and/or detaching really flushes anything.

72 78 0:35 /root /sysroot ro,relatime shared:3 - btrfs /dev/mapper/luks-38d7d9c8-67d1-42c4-b3d4-58b3e1ee94e1 rw,seclabel,discard=async,space_cache=v2,subvolid=258,subvol=/root
78 2 0:39 / / ro,relatime shared:1 - overlay composefs ro,seclabel,lowerdir+=/run/ostree/.private/cfsroot-lower,datadir+=/sysroot/ostree/repo/objects,redirect_dir=on,metacopy=on
...

@cgwalters
Member Author

Do we still have a writable mount of the rootfs open beneath the composefs even after the remount ro /sysroot ?
On the kernel cmdline we have rw, so I'm wondering if remounting /sysroot ro and/or detaching really flushes anything.

Yes this is a good question to ask. There's multiple points here. First...note that composefs is a read only case - there's no open writable fds, so it's not going to keep the mount busy itself.

I think the real angle we want to look at this from is pretty simple: with e.g. ext4, it will flush the journal and mark the superblock as being "cleanly unmounted" when being truly remounted read-only, so we can check if that's the case.

@champtar
Collaborator

champtar commented Aug 28, 2025

Here's a little experiment

start @cgwalters bpftrace script #3504 (comment)

truncate -s10G rootfs
mkfs.xfs rootfs
mkdir mnt
mount rootfs mnt
# this is how we remount read only /sysroot (https://github.com/ostreedev/ostree/blob/a5a52e01edd565cca368b946d5e5e4a333b3f350/src/switchroot/ostree-prepare-root.c#L407)
mount --bind -o remount,ro mnt
grep mnt /proc/self/mountinfo
# we see the super option is rw
# 1623 75 7:1 / /root/mnt ro,relatime shared:1147 - xfs /dev/loop1 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
# open another shell, cd into mnt
umount mnt
# this fails / "umount: /root/mnt: target is busy."
umount -l mnt
# no call to syncfs until the shell exits the dir
mount rootfs mnt
# open another shell, cd into mnt
mount -o remount,ro mnt
# we see a call to sync_fs right away
grep mnt /proc/self/mountinfo
# we see the super option is also ro
# 1622 75 7:1 / /root/mnt ro,relatime shared:1147 - xfs /dev/loop1 ro,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
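
The two `ro`/`rw` columns in the `mountinfo` lines above can be pulled apart mechanically. A minimal sketch, assuming a POSIX shell and awk; `mount_rw_state` is a hypothetical helper name, not something shipped by ostree or util-linux. It reads mountinfo-formatted lines on stdin and, for one mount point, prints the per-mount flag (field 6) next to the superblock flag (the last field after the `-` separator).

```shell
# Hypothetical helper: report per-mount vs. superblock ro/rw for one mount point.
# Input: lines in /proc/<pid>/mountinfo format on stdin.
# Argument: the mount point to look for (matched against field 5).
mount_rw_state() {
    awk -v target="$1" '
    # Return the first "ro" or "rw" token of a comma-separated option list.
    function rw_flag(opts,    n, parts, k) {
        n = split(opts, parts, ",")
        for (k = 1; k <= n; k++)
            if (parts[k] == "ro" || parts[k] == "rw")
                return parts[k]
        return "?"
    }
    $5 == target {
        # Optional fields (shared:N, master:N, ...) end at a lone "-".
        sep = 0
        for (i = 7; i <= NF; i++)
            if ($i == "-") { sep = i; break }
        if (sep == 0 || sep + 3 > NF)
            next
        printf "%s mount=%s super=%s fstype=%s\n", \
            $5, rw_flag($6), rw_flag($(sep + 3)), $(sep + 1)
    }'
}
```

For example, `mount_rw_state /root/mnt < /proc/self/mountinfo` would show `mount=ro super=rw` after the bind remount and `mount=ro super=ro` after the plain `mount -o remount,ro`.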

@cgwalters cgwalters marked this pull request as draft August 28, 2025 21:08
@cgwalters
Member Author

I think the real angle we want to look at this from is pretty simple: with e.g. ext4, it will flush the journal and mark the superblock as being "cleanly unmounted" when being truly remounted read-only, so we can check if that's the case.

Heh well...that was a whole rabbit hole. It turns out that ext4 in default journal mode, when inspected via tune2fs -l always shows the filesystem as clean.

Really...the important thing here is the journal. From the filesystem perspective there's no on-disk difference between "clean unmount" and "empty journal". If the composefs overlay mount still holds a reference to a rw mount, that doesn't matter.

Or to say it yet another way...in the end we're flushing the filesystem; remounting read-only (and here, detaching the /sysroot mount which was already read-only) is just part of ensuring processes don't keep writing to it.

(There's a whole side topic here of whether it'd make sense to change systemd-shutdown to actually freeze like we do for /boot because of xfs and journal...if we did that it'd bypass this whole "did we remount every mountpoint" question; instead we'd just ask whether we froze each filesystem instance, regardless of the mount point.)


Anyways so...the question then becomes "do we have outstanding writes in the journal before reboot/poweroff" - as far as my testing goes the answer is "no" - which is as you'd expect. Things would have to go pretty badly wrong for us to have a process which is actively writing past where systemd-shutdown's two different invocations of sync_with_progress() actually fail.


start @cgwalters bpftrace script #3504 (comment)

Yes, but your test scenario isn't accounting for the above - that systemd-shutdown explicitly runs sync() which affects all mount points. So it's again really not so much about "did we successfully remount read-only or unmount".

Bottom line then, I plan to merge this patch tomorrow unless there's more followup.

@cgwalters cgwalters marked this pull request as ready for review August 29, 2025 01:08
@cgwalters
Member Author

Tangential: It took some digging, but finding out "was an xfs filesystem unmounted cleanly" is apparently best done via xfs_logprint /dev/sda4 and verifying that the last log operation is e.g. Oper (0): tid: 704c3ae5 len: 0 clientid: LOG flags: UNMOUNT .
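
That check can be scripted. A minimal sketch, assuming the `xfs_logprint` output contains `Oper (...)` lines shaped like the one quoted above; `xfs_log_ends_with_unmount` is a hypothetical helper name, and the exact output layout may differ between xfsprogs versions.

```shell
# Hypothetical helper: succeed only if the last "Oper (...)" line read from
# stdin carries the UNMOUNT flag. Intended to be fed xfs_logprint output.
xfs_log_ends_with_unmount() {
    awk '
    /^[ \t]*Oper \(/ { last = $0 }          # remember the most recent operation
    END {
        if (last ~ /flags:.*UNMOUNT/)
            exit 0
        exit 1
    }'
}
```

Usage would be along the lines of `xfs_logprint /dev/sda4 | xfs_log_ends_with_unmount && echo clean`, run against an unmounted device.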

@champtar
Collaborator

Actually my small test was not doing the mount -o remount,ro mnt ...,
it does remount the underlying mount and sync_fs

Test units:

cat > /run/sync.trace <<'EOF'
#!/usr/bin/bpftrace
#include <linux/fs.h>

tracepoint:btrfs:btrfs_sync_fs
{
  printf("btrfs sync impl: comm=%-16s pid=%-7d\n", comm, pid);
}
EOF
chmod +x /run/sync.trace

setenforce 0

mkdir -p /run/systemd/system/
cat > /run/systemd/system/trace.service <<'EOS'
[Unit]
DefaultDependencies=no
Before=final.target
Conflicts=final.target

[Service]
Type=simple
StandardOutput=journal+console
ExecStart=/run/sync.trace
KillSignal=SIGCONT
TimeoutStopSec=1m
EOS
systemctl daemon-reload
systemctl start trace.service

mkdir -p /run/systemd/system/ostree-remount.service.d
cat > /run/systemd/system/ostree-remount.service.d/test.conf <<'EOS'
[Unit]
After=-.mount etc.mount var.mount sysroot.mount

[Service]
StandardOutput=journal+console
ExecStop=/usr/bin/grep -E '/sysroot|/etc|/var' /proc/self/mountinfo
ExecStop=/usr/bin/mount -v -o remount,ro /sysroot
ExecStop=/usr/bin/grep -E '/sysroot|/etc|/var' /proc/self/mountinfo
ExecStop=/usr/bin/sleep 1
ExecStop=/usr/bin/umount -v -l /sysroot
ExecStop=/usr/bin/grep -E '/sysroot|/etc|/var' /proc/self/mountinfo
ExecStop=/usr/bin/sleep 1
ExecStop=/usr/bin/mount -v -o remount,ro /etc
ExecStop=/usr/bin/grep -E '/sysroot|/etc|/var' /proc/self/mountinfo
ExecStop=/usr/bin/sleep 1
EOS
systemctl daemon-reload

Logs

[  OK  ] Removed slice system-systemd\x2dfs…slice - Slice /system/systemd-fsck.
[ 1130.367926] grep[2880]: 73 79 0:35 /root /sysroot ro,relatime shared:3 - btrfs /dev/mapper/luks-38d7d9c8-67d1-42c4-b3d4-58b3e1ee94e1 rw,seclabel,discard=async,space_cache=v2,subvolid=258,subvol=/root
[ 1130.368018] grep[2880]: 79 2 0:39 / / ro,relatime shared:1 - overlay composefs ro,seclabel,lowerdir+=/run/ostree/.private/cfsroot-lower,datadir+=/sysroot/ostree/repo/objects,redirect_dir=on,metacopy=on
[ 1130.368051] grep[2880]: 76 79 0:35 /root/ostree/deploy/fedora/deploy/fafd1f1fff71690b088440d2dc42ad4f3b89dc05c5f8de6006dd82c24ede4a13.0/etc /etc rw,relatime shared:2 - btrfs /dev/mapper/luks-38d7d9c8-67d1-42c4-b3d4-58b3e1ee94e1 rw,seclabel,discard=async,space_cache=v2,subvolid=258,subvol=/root
[ 1130.368078] grep[2880]: 52 79 0:35 /var /var rw,relatime shared:116 - btrfs /dev/mapper/luks-38d7d9c8-67d1-42c4-b3d4-58b3e1ee94e1 rw,seclabel,discard=async,space_cache=v2,subvolid=256,subvol=/var
[ 1130.382115] sync.trace[2569]: btrfs sync impl: comm=mount            pid=2882
[ 1130.382266] sync.trace[2569]: btrfs sync impl: comm=mount            pid=2882
[ 1130.600572] mount[2882]: mount: /dev/mapper/luks-38d7d9c8-67d1-42c4-b3d4-58b3e1ee94e1 mounted on /sysroot.
[ 1130.622906] grep[2883]: 73 79 0:35 /root /sysroot ro,relatime shared:3 - btrfs /dev/mapper/luks-38d7d9c8-67d1-42c4-b3d4-58b3e1ee94e1 ro,seclabel,discard=async,space_cache=v2,subvolid=258,subvol=/root
[ 1130.622980] grep[2883]: 79 2 0:39 / / ro,relatime shared:1 - overlay composefs ro,seclabel,lowerdir+=/run/ostree/.private/cfsroot-lower,datadir+=/sysroot/ostree/repo/objects,redirect_dir=on,metacopy=on
[ 1130.623331] grep[2883]: 76 79 0:35 /root/ostree/deploy/fedora/deploy/fafd1f1fff71690b088440d2dc42ad4f3b89dc05c5f8de6006dd82c24ede4a13.0/etc /etc rw,relatime shared:2 - btrfs /dev/mapper/luks-38d7d9c8-67d1-42c4-b3d4-58b3e1ee94e1 ro,seclabel,discard=async,space_cache=v2,subvolid=258,subvol=/root
[ 1130.623487] grep[2883]: 52 79 0:35 /var /var rw,relatime shared:116 - btrfs /dev/mapper/luks-38d7d9c8-67d1-42c4-b3d4-58b3e1ee94e1 ro,seclabel,discard=async,space_cache=v2,subvolid=256,subvol=/var
[ 1131.656932] umount[2888]: umount: /sysroot (/dev/mapper/luks-38d7d9c8-67d1-42c4-b3d4-58b3e1ee94e1) unmounted
[  OK  ] Unmounted sysroot.mount - /sysroot.
[ 1131.662635] grep[2889]: 79 2 0:39 / / ro,relatime shared:1 - overlay composefs ro,seclabel,lowerdir+=/run/ostree/.private/cfsroot-lower,datadir+=/sysroot/ostree/repo/objects,redirect_dir=on,metacopy=on
[ 1131.662693] grep[2889]: 76 79 0:35 /root/ostree/deploy/fedora/deploy/fafd1f1fff71690b088440d2dc42ad4f3b89dc05c5f8de6006dd82c24ede4a13.0/etc /etc rw,relatime shared:2 - btrfs /dev/mapper/luks-38d7d9c8-67d1-42c4-b3d4-58b3e1ee94e1 ro,seclabel,discard=async,space_cache=v2,subvolid=258,subvol=/root
[ 1131.662721] grep[2889]: 52 79 0:35 /var /var rw,relatime shared:116 - btrfs /dev/mapper/luks-38d7d9c8-67d1-42c4-b3d4-58b3e1ee94e1 ro,seclabel,discard=async,space_cache=v2,subvolid=256,subvol=/var
[ 1132.680245] mount[2894]: mount: /dev/mapper/luks-38d7d9c8-67d1-42c4-b3d4-58b3e1ee94e1 mounted on /etc.
[ 1132.685649] grep[2895]: 79 2 0:39 / / ro,relatime shared:1 - overlay composefs ro,seclabel,lowerdir+=/run/ostree/.private/cfsroot-lower,datadir+=/sysroot/ostree/repo/objects,redirect_dir=on,metacopy=on
[ 1132.685717] grep[2895]: 76 79 0:35 /root/ostree/deploy/fedora/deploy/fafd1f1fff71690b088440d2dc42ad4f3b89dc05c5f8de6006dd82c24ede4a13.0/etc /etc ro,relatime shared:2 - btrfs /dev/mapper/luks-38d7d9c8-67d1-42c4-b3d4-58b3e1ee94e1 ro,seclabel,discard=async,space_cache=v2,subvolid=258,subvol=/root
[ 1132.685759] grep[2895]: 52 79 0:35 /var /var rw,relatime shared:116 - btrfs /dev/mapper/luks-38d7d9c8-67d1-42c4-b3d4-58b3e1ee94e1 ro,seclabel,discard=async,space_cache=v2,subvolid=256,subvol=/var
[  OK  ] Stopped ostree-remount.service - OSTree Remount OS/ Bind Mounts.
         Unmounting etc.mount...
         Unmounting var.mount - /var...
[FAILED] Failed unmounting etc.mount.
[  OK  ] Unmounted var.mount - /var.
[  OK  ] Stopped target blockdev@dev-mapper…d7d9c8-67d1-42c4-b3d4-58b3e1ee94e1.

{
// We expect this to be read-only by default on most modern systems, but
// in case it's not, make it read-only now.
do_remount ("/sysroot", false);
Collaborator

const bool currently_writable = ((stvfsbuf.f_flag & ST_RDONLY) == 0);
if (writable == currently_writable)
return;

We want to unconditionally remount read-only, because the bind mount on top might be read only,
but not the original mount point under it.
In my tests using mount -o remount,ro works, so we need a do_remount_force

@champtar
Collaborator

We cannot actually unmount /sysroot while it's in use, and it is because /etc is a bind mount into it.

There is also / (composefs) that prevents /sysroot from being unmounted

@cgwalters
Member Author

There is also / (composefs) that prevents /sysroot from being unmounted

Yes. But how is that different from a traditional mutable system on a single flat filesystem where systemd is running from / that it also doesn't unmount?

I guess you're commenting on my original commit message, which I didn't update for recent findings. I'll reword it to strengthen the argument there that this is about the journal.

@@ -21,9 +21,11 @@ ConditionKernelCommandLine=ostree
 OnFailure=emergency.target
 Conflicts=umount.target
 # Run after core mounts
-After=-.mount var.mount
+After=-.mount etc.mount var.mount sysroot.mount
Collaborator

As remounting /sysroot read-only also makes /var read-only, I think we should use a separate unit for shutdown, ordered

Before=var.mount
After=-.mount etc.mount sysroot.mount 

so everything depending explicitly on var.mount can write to it
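
To make the ordering concrete, here is a rough sketch of what such a unit could look like, written in the same heredoc style as the test units earlier in this thread. Everything specific here is an assumption: the unit name `example-shutdown.service` and the helper `write_shutdown_unit_sketch` are made up, the single `ExecStop=` line is borrowed from the test drop-in, and how the unit gets activated at boot is left out. The point is only that systemd stops units in the reverse of their start order, so a unit ordered `Before=var.mount` has its `ExecStop=` run after `var.mount` is stopped.

```shell
# Hypothetical sketch: write an illustrative unit into a directory chosen by
# the caller (e.g. /run/systemd/system). Not the unit from this PR.
write_shutdown_unit_sketch() {
    unit_dir="$1"
    mkdir -p "$unit_dir" || return 1
    cat > "$unit_dir/example-shutdown.service" <<'EOS'
[Unit]
DefaultDependencies=no
Conflicts=umount.target
# Stop order is the reverse of start order: this unit starts before
# var.mount, so on shutdown it is stopped after var.mount.
Before=var.mount
# It starts after these mounts, so on shutdown it is stopped before them.
After=-.mount etc.mount sysroot.mount

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStop=/usr/bin/mount -o remount,ro /sysroot
EOS
}
```

After writing it somewhere systemd looks (followed by `systemctl daemon-reload`), the effective ordering could be inspected with `systemctl show -p Before -p After example-shutdown.service`.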

Member Author

Making a new unit ups the complexity level a bit. I'm not opposed, but I think we need to weigh it vs the other approach of just trying to make sure sysroot.mount survives all the way into systemd-shutdown phase which I think is even more correct.

Collaborator

I agree it would be better to have systemd just skip the umount, but haven't found a way yet.
I've tried to play with x-initrd.mount in a unit drop-in but it's ignored,
tried to write a full unit with empty requires but the implicit dependencies on the luks volume still causes unmount
Tried mount -o remount,x-initrd.mount /etc but it's also ignored

Member Author

I agree it would be better to have systemd just skip the umount, but haven't found a way yet.

It works for me to do

cat >/run/systemd/generator/sysroot.mount <<EOF
[Unit]
DefaultDependencies=no
[Mount]
What=/sysroot
Where=/sysroot
Type=bind
EOF

Then there's no attempt to tear it down during the early shutdown phase.

I think that's all we need to do from ostree-system-generator.

However...I do still see
[ 245.612829] (sd-umount)[2620]: Failed to unmount /run/shutdown/mounts/9dcea02aab3b8be8: Device or resource busy

which I haven't traced through but I suspect it's systemd-shutdown trying to unmount /sysroot, but again it can't.

Collaborator

Nice, we need the same one for /etc and we are good to go

Collaborator

@champtar champtar Aug 29, 2025

FWIW I just tested rpm-ostree install --apply-live tcpdump after moving the mounts around and it works just fine, i.e. sysroot.mount unmounts fine.
The commands I ran in the initrd (rd.break=pre-pivot):

mkdir /tmp-rootfs /tmp-composefs /tmp-etc /tmp-var
mount --move /sysroot/etc /tmp-etc
mount --move /sysroot/sysroot/ostree/deploy/fedora/var /tmp-var
mount --move /sysroot/sysroot /tmp-rootfs
mount --move /sysroot /tmp-composefs

mount --move /tmp-rootfs /sysroot
mount --bind /sysroot /tmp-rootfs

mount --move /tmp-composefs /sysroot
mount --move /tmp-rootfs /sysroot/sysroot
mount --move /tmp-var /sysroot/sysroot/ostree/deploy/fedora/var
mount --move /tmp-etc /sysroot/etc

Collaborator

The original rootfs would stay at /sysroot, composefs on top of it also at /sysroot, and at /sysroot/sysroot we would have a bind mount of the rootfs, so no visible changes.

I really need a bit more detail about what you're suggesting to change.

Can you clarify: Are you proposing that this blocks this PR or not? If not, can you click "auto merge" please? And then we can take any further investigations to followups.

Pinging you on CNCF slack, Github is not the best chat :)
if you confirm you can actually write to /var after soft reboot I'm ok to merge this

Member Author

Do the tests try to do any writes after the boot ?

Yes, pretty sure a lot of things would fail if /etc and /var were read-only. I also interactively tested a soft reboot with this change just for good measure.

See the logs #3516 (comment): remounting /sysroot read-only remounts the underlying rootfs for everyone, /etc and /var included

Hmmmm...I think you're conflating "remounting readonly" with "sync" a bit...but actually there's a further very important piece to understand which is that with bind mounts over a filesystem mounted read-only, remounting them writable still works without affecting the source mount. That's why we have two mount calls here:

/* Bind-mount /etc (at deploy path), and remount as writable. */

So in the soft reboot case, even if the shutdown remounting ro propagated to other mounts, we still force it back writable then.

Collaborator

OK, remounting rw also remounts the underlying mount rw

root@pb14250:~# grep mnt /proc/1/mountinfo
841 75 7:1 / /root/mnt ro,relatime shared:1142 - xfs /dev/loop1 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
root@pb14250:~# mount -o remount,ro mnt
root@pb14250:~# grep mnt /proc/1/mountinfo
841 75 7:1 / /root/mnt ro,relatime shared:1142 - xfs /dev/loop1 ro,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
root@pb14250:~# mount -o remount,rw mnt
root@pb14250:~# grep mnt /proc/1/mountinfo
841 75 7:1 / /root/mnt rw,relatime shared:1142 - xfs /dev/loop1 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota

Member Author

OK, remounting rw also remounts the underlying mount rw

Right...the way I understand things here is that the mounts (including ro/rw) are a VFS-level concept, but are references to an underlying filesystem. When a request comes in to mount writable, that "reaches through" to the underlying fs (if it can be mounted writable).
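
One way to see this "several mounts, one filesystem" relationship is to group `mountinfo` entries by device number. A minimal sketch, assuming a POSIX shell and awk, and assuming the kernel prints `ro` or `rw` as the first token of both option lists; `mounts_sharing_device` is a hypothetical helper name. It lists every mount point on a given `major:minor`, with its own flag and the shared superblock flag.

```shell
# Hypothetical helper: list all mounts backed by one device (field 3 of
# mountinfo), showing the per-mount flag and the superblock flag.
# Input: lines in /proc/<pid>/mountinfo format on stdin.
mounts_sharing_device() {
    awk -v dev="$1" '
    $3 == dev {
        # Skip the optional fields; stop at the lone "-" separator.
        for (i = 7; i <= NF && $i != "-"; i++)
            ;
        if (i + 3 > NF)
            next
        split($6, m, ",")          # per-mount options
        split($(i + 3), s, ",")    # superblock options
        printf "%s mount=%s super=%s\n", $5, m[1], s[1]
    }'
}
```

Run as `mounts_sharing_device 0:35 < /proc/self/mountinfo` on the system from the logs above, this would list /sysroot, /etc and /var together: the `super=` column flips for all of them at once, while each `mount=` column changes independently.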

@cgwalters cgwalters marked this pull request as draft August 29, 2025 13:18
@champtar
Collaborator

But how is that different from a traditional mutable system on a single flat filesystem where systemd is running from / that it also doesn't unmount?

On a system without SELinux /etc might unmount just fine (which might be a problem), but / and /sysroot are really interlocked, you need to move /sysroot out of / and then unmount /, so you really need to pivot_root

I guess you're commenting on my original commit message, which I didn't update for recent findings. I'll reword it to strengthen the argument there that this is about the journal.

yup, I really hate github (and gitlab) for not providing a way to comment on the commit message

@cgwalters
Member Author

On a system without SELinux /etc might unmount just fine (which might be a problem), but / and /sysroot are really interlocked, you need to move /sysroot out of / and then unmount /, so you really need to pivot_root

If we wanted to do that, it's supported via https://www.freedesktop.org/software/systemd/man/latest/bootup.html#The%20exitrd

But again though...someone would need to come up with a counter-argument to my argument that what matters is just mounting read-only and flushing outstanding writes (i.e. the journal is clean), not that we actually need to tear down the in-memory VFS structures - because on reboot you can't tell the difference.

@champtar
Collaborator

On a system without SELinux /etc might unmount just fine (which might be a problem), but / and /sysroot are really interlocked, you need to move /sysroot out of / and then unmount /, so you really need to pivot_root

If we wanted to do that, it's supported via https://www.freedesktop.org/software/systemd/man/latest/bootup.html#The%20exitrd

But again though...someone would need to come up with a counter-argument to my argument that what matters is just mounting read-only and flushing outstanding writes (i.e. the journal is clean), not that we actually need to tear down the in-memory VFS structures - because on reboot you can't tell the difference.

I don't think we want to bother with the exitrd, remount ro is enough.

I just wanted you, when you reword the commit, to mention that we can't unmount /sysroot also because it's interlocked with /; with the current wording one can think /etc is the only blocker.

@cgwalters cgwalters changed the title ostree-remount: Clean up /sysroot and make /etc read-only Add ostree-shutdown.service: hide /sysroot and make /etc read-only Aug 29, 2025
@cgwalters cgwalters marked this pull request as ready for review August 29, 2025 19:46
@cgwalters
Member Author

OK now redone to add a new service

@champtar
Collaborator

OK now redone to add a new service

See my latest comment (#3516 (comment)); I think if we change the mounts slightly we can keep /etc and /sysroot mounted until systemd-shutdown

@cgwalters cgwalters enabled auto-merge August 29, 2025 21:05
@cgwalters cgwalters disabled auto-merge August 29, 2025 21:05
Collaborator

@champtar champtar left a comment

This fixes sysroot umount and hopefully doesn't break anything we could think of, LGTM

@champtar champtar enabled auto-merge August 29, 2025 21:50
@champtar champtar merged commit 7c1d412 into ostreedev:main Aug 29, 2025
26 checks passed
@jlebon
Member

jlebon commented Sep 3, 2025

Tried mount -o remount,x-initrd.mount /etc but it's also ignored

Was thinking about something similar too. I wonder even if basically any mount systemd inherited from the initrd should just be auto-tagged with that. (I.e. this would be a systemd fix.)

@champtar
Collaborator

champtar commented Sep 3, 2025

Tried mount -o remount,x-initrd.mount /etc but it's also ignored

Was thinking about something similar too. I wonder even if basically any mount systemd inherited from the initrd should just be auto-tagged with that. (I.e. this would be a systemd fix.)

See #3503 (comment)
x-initrd.attach in /etc/crypttab fixes the etc.mount failure

Development

Successfully merging this pull request may close these issues.

Fix unmount of /etc and /sysroot