-
Notifications
You must be signed in to change notification settings - Fork 326
Add ostree-shutdown.service: hide /sysroot and make /etc read-only #3516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces shutdown logic for ostree-remount.service
to properly handle /sysroot
and /etc
during system shutdown. The changes involve adding an ExecStop
command to the service file and implementing the corresponding --shutdown
logic in ostree-remount.c
. The approach of detaching /sysroot
and remounting /etc
as read-only is sound and well-documented in the code. My review includes a couple of minor suggestions to improve clarity.
An option I looked at here too that I think would work is overriding the auto-generated But in the end, this service already exists, already has code to remount read-only, already runs very early on startup (and hence late on shutdown) enough to do cleanups, whereas touching the generator felt more invasive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Do we still have a writable mount of the rootfs open beneath the composefs even after the remount ro /sysroot ?
|
Yes this is a good question to ask. There's multiple points here. First...note that composefs is a read only case - there's no open writable fds, so it's not going to keep the mount busy itself. I think the real angle we want to look at this from is pretty simple: with e.g. ext4, it will flush the journal and mark the superblock as being "cleanly unmounted" when being truly remounted read-only, so we can check if that's the case. |
Here a little experiment start @cgwalters bpftrace script #3504 (comment)
|
Heh well...that was a whole rabbit hole. It turns out that ext4 in default journal mode, when inspected via Really...the important thing here is the journal. From the filesystem perspective there's no on-disk difference between "clean unmount" and "empty journal". If the composefs overlay mount still holds a reference to a rw mount, that doesn't matter. Or to say it yet another way...in the end we're flushing the filesystem; remounting read-only (and here, detaching the (There's a whole side topic here of whether it'd make sense to change systemd-shutdown to actually freeze like we do for Anyways so...given the question then becomes "do we have outstanding writes in the journal before reboot/poweroff" - as far as my testing goes the answer is "no" - which is as you'd expect. Things would have to pretty badly wrong for us to have a process which is actively writing past where systemd-shutdown's two different invocations of
Yes, but your test scenario isn't accounting for the above - that systemd-shutdown explicitly runs Bottom line then, I plan to merge this patch tomorrow unless there's more followup. |
Tangential: It took some digging but to find out "was an xfs filesystem unmounted cleanly" is apparently best done via |
Actually my small test was not doing the Test units:
Logs
|
src/switchroot/ostree-remount.c
Outdated
{ | ||
// We expect this to be read-only by default on most modern systems, but | ||
// in case it's not, make it read-only now. | ||
do_remount ("/sysroot", false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ostree/src/switchroot/ostree-remount.c
Lines 61 to 63 in a5a52e0
const bool currently_writable = ((stvfsbuf.f_flag & ST_RDONLY) == 0); | |
if (writable == currently_writable) | |
return; |
We want to unconditionally remount read-only, because the bind mount on top might be read only,
but not the original mount point under it.
In my tests using
mount -o remount,ro
works, so we need a do_remount_force
There is also / (composefs) that prevents /sysroot from being unmounted |
Yes. But how is that different from a traditional mutable system on a single flat filesystem where systemd is running from I guess you're commenting on my original commit message, which I didn't update for recent findings. I'll reword it to strengthen the argument there that this is about the journal. |
src/boot/ostree-remount.service
Outdated
@@ -21,9 +21,11 @@ ConditionKernelCommandLine=ostree | |||
OnFailure=emergency.target | |||
Conflicts=umount.target | |||
# Run after core mounts | |||
After=-.mount var.mount | |||
After=-.mount etc.mount var.mount sysroot.mount |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As remouting read-only /sysroot
also makes /var
read-only, I think we should use a separate unit for shutdown, ordered
Before=var.mount
After=-.mount etc.mount sysroot.mount
so everything depending explicitly on var.mount
can write to it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making a new unit ups the complexity level a bit. I'm not opposed, but I think we need to weigh it vs the other approach of just trying to make sure sysroot.mount
survives all the way into systemd-shutdown phase which I think is even more correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it would be better to have systemd just skip the umount, but haven't found a way yet.
I've tried to play with x-initrd.mount
in a unit drop-in but it's ignored,
tried to write a full unit with empty requires but the implicit dependencies on the luks volume still causes unmount
Tried mount -o remount,x-initrd.mount /etc
but it's also ignored
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it would be better to have systemd just skip the umount, but haven't found a way yet.
It works for me to do
cat >/run/systemd/generator/sysroot.mount <<EOF
[Unit]
DefaultDependencies=no
[Mount]
What=/sysroot
Where=/sysroot
Type=bind
EOF
Then there's no attempt to tear it down during the early shutdown phase.
I think that'd all we need do from ostree-system-generator.
However...I do still see
[ 245.612829] (sd-umount)[2620]: Failed to unmount /run/shutdown/mounts/9dcea02aab3b8be8: Device or resource busy
which I haven't traced through but I suspect it's systemd-shutdown trying to unmount /sysroot
, but again it can't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, we need the same one for /etc
and we are good to go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW just tested rpm-ostree install --apply-live tcpdump
after moving the mounts around and it works just fine, ie sysroot.mount
unmount fine
The commands I run in initrd (rd.break=pre-pivot
)
mkdir /tmp-rootfs /tmp-composefs /tmp-etc /tmp-var
mount --move /sysroot/etc /tmp-etc
mount --move /sysroot/sysroot/ostree/deploy/fedora/var /tmp-var
mount --move /sysroot/sysroot /tmp-rootfs
mount --move /sysroot /tmp-composefs
mount --move /tmp-rootfs /sysroot
mount --bind /sysroot /tmp-rootfs
mount --move /tmp-composefs /sysroot
mount --move /tmp-rootfs /sysroot/sysroot
mount --move /tmp-var /sysroot/sysroot/ostree/deploy/fedora/var
mount --move /tmp-etc /sysroot/etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original rootfs would stay at /sysroot, composefs on top of it also at /sysroot, and at /sysroot/sysroot we would have a bind mount of the rootfs, so no visible changes.
I really need a bit more detail about what you're suggesting to change.
Can you clarify: Are you proposing that this blocks this PR or not? If not, can you click "auto merge" please? And then we can take any further investigations to followups.
Pinging you on CNCF slack, Github is not the best chat :)
if you confirm you can actually write to /var
after soft reboot I'm ok to merge this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the tests try to do any writes after the boot ?
Yes, pretty sure a lot of things would fail if /etc
and /var
were read-only. I also interactively tested a soft reboot with this change just for good measure.
See the logs #3516 (comment), remounting /sysroot readonly remount the underlying rootfs for everyone, /etc and /var included
Hmmmm...I think you're conflating "remounting readonly" with "sync" a bit...but actually there's a further very important piece to understand which is that with bind mounts over a filesystem mounted read-only, remounting them writable still works without affecting the source mount. That's why we have two mount
calls here:
ostree/src/libotcore/otcore-prepare-root.c
Line 466 in da93600
/* Bind-mount /etc (at deploy path), and remount as writable. */ |
So in the soft reboot case, even if the shutdown remounting ro propagated to other mounts, we still force it back writable then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok remounting rw also remount rw the underlying mount
root@pb14250:~# grep mnt /proc/1/mountinfo
841 75 7:1 / /root/mnt ro,relatime shared:1142 - xfs /dev/loop1 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
root@pb14250:~# mount -o remount,ro mnt
root@pb14250:~# grep mnt /proc/1/mountinfo
841 75 7:1 / /root/mnt ro,relatime shared:1142 - xfs /dev/loop1 ro,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
root@pb14250:~# mount -o remount,rw mnt
root@pb14250:~# grep mnt /proc/1/mountinfo
841 75 7:1 / /root/mnt rw,relatime shared:1142 - xfs /dev/loop1 rw,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok remounting rw also remount rw the underlying mount
Right...the way I understand things here is that the mounts (including ro/rw) are VFS level concept, but are references to an underlying filesystem. When a request comes in to mount writable, that "reaches through" to to the underlying fs (if it can be mounted writable).
On a system without SELinux
yup, I really hate github (and gitlab) for not providing a way to comment on the commit message |
If we wanted to do that, it's supported via https://www.freedesktop.org/software/systemd/man/latest/bootup.html#The%20exitrd But again though...someone would need to come up with a counter-argument to my argument that what matters is just mounting read-only and flushing outstanding writes (i.e. the journal is clean), not that we actually need to tear down the in-memory VFS structures - because on reboot you can't tell the difference. |
I don't think we want to bother with the exitrd, remount ro is enough. I just wanted you when you reword the commit to mention that we can't unmount /sysroot also because it's interlocked with /, with the current wording one can think |
b87ca47
to
8be6546
Compare
OK now redone to add a new service |
See my latest comment (#3516 (comment)), I think if we change slightly the mount we can keep /etc and /sysroot mounted till systemd-shutdown |
We have a lot of bind mounts; these are usually set up in the initramfs. So far during shutdown we've let systemd just try to sort things out via auto-generated mount units i.e. `sysroot.mount` and `etc.mount` and so on. systemd has some special casing for `-.mount` (i.e. `/`) and `etc.mount` https://github.com/systemd/systemd/blob/e91bfad241799b449df73efc30d833b9c5937001/src/shared/fstab-util.c#L72 However it doesn't special case `/sysroot` - which is currently an ostree-specific invention (when used in the real root). We cannot actually unmount `/sysroot` while it's in use, and it is because `/etc` is a bind mount into it. And we can't tear down `/etc` because it's just expected that e.g. pid 1 and other things hold open references to it - until things finally transition into systemd-shutdown. What we can do though is explicitly detach it during the shutdown phase; this ensures that systemd won't try to clean it up then, suppressing errors about its inability to do so. While we're here, let's also remount `/etc` read-only; while systemd itself will try to do so during systemd-shutdown. Per comments if this service fails, it's a bug in something else to be fixed. Closes: ostreedev#3513 Signed-off-by: Colin Walters <walters@verbum.org>
8be6546
to
d0c454c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes sysroot umount and hopefully doesn't break anything we could think of, LGTM
Was thinking about something similar too. I wonder even if basically any mount systemd inherited from the initrd should just be auto-tagged with that. (I.e. this would be a systemd fix.) |
See #3503 (comment) |
We have a lot of bind mounts; these are usually set up in the initramfs.
So far during shutdown we've let systemd just try to sort things out
via auto-generated mount units i.e.
sysroot.mount
andetc.mount
and so on.
systemd has some special casing for
-.mount
(i.e./
) andetc.mount
https://github.com/systemd/systemd/blob/e91bfad241799b449df73efc30d833b9c5937001/src/shared/fstab-util.c#L72
However it doesn't special case
/sysroot
- which is currentlyan ostree-specific invention (when used in the real root).
We cannot actually unmount
/sysroot
while it's in use, and itis because
/etc
is a bind mount into it. And we can't teardown
/etc
because it's just expected that e.g. pid 1 and otherthings hold open references to it - until things finally
transition into systemd-shutdown.
What we can do though is explicitly detach it during the shutdown
phase; this ensures that systemd won't try to clean it up then,
suppressing errors about its inability to do so.
While we're here, let's also remount
/etc
read-only; whilesystemd itself will try to do so during systemd-shutdown.
Per comments if this service fails, it's a bug in something
else to be fixed.
Closes: #3513
Signed-off-by: Colin Walters walters@verbum.org