Skip to content

Conversation

ricardosalveti
Copy link
Contributor

Bootloader entry should only have aboot and abootcfg configuration entries when aboot.img is found on the system (e.g. /usr/lib/modules/$kver).

Otherwise it will be always set, won't be used during boot and systemd-boot will complain about unknown lines.

Copy link

openshift-ci bot commented Apr 15, 2025

Hi @ricardosalveti. Thanks for your PR.

I'm waiting for a ostreedev member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ricardosalveti
Copy link
Contributor Author

@ericcurtin at https://github.com/ostreedev/ostree/blob/main/src/libostree/ostree-sysroot-deploy.c#L1201 we are checking for availability of aboot.img and aboot.cfg under /usr/lib/modules/$kver, but when creating the boot entry it sets to the correct path once found and to /usr/lib/ostree-boot/aboot.img / cfg when not found, causing it to be added for all ostree users.

Since I don't have the aboot logic background, should it really be pointing to /usr/lib/ostree-boot even when not available?

Noticed this when booting ostree with systemd-boot, as it complained about the 'aboot' and 'abootcfg' entries:

root@jetson-agx-xavier-devkit:/boot# bootctl
systemd-boot not installed in ESP.
/boot/loader/entries/ostree-2.conf:8: Unknown line 'abootcfg', ignoring.
/boot/loader/entries/ostree-2.conf:9: Unknown line 'aboot', ignoring.
/boot/loader/entries/ostree-1+3.conf:8: Unknown line 'abootcfg', ignoring.
/boot/loader/entries/ostree-1+3.conf:9: Unknown line 'aboot', ignoring.

@cgwalters
Copy link
Member

/ok-to-test

@cgwalters
Copy link
Member

cc @ericcurtin who added aboot

@ericcurtin
Copy link
Collaborator

This would require a test to be sure... There's some not obvious paths in this code as at osbuild-time OSTree when built for aboot behaves a little differently to say when actually booted...

@ericcurtin
Copy link
Collaborator

It's a pity we don't have Packit integrated with OSTree, would be neat to grab an rpm built from this PR and run a quick test.

@ericcurtin
Copy link
Collaborator

@ricardosalveti mind if we take a little while to test this change?

@ericcurtin
Copy link
Collaborator

Might even be worth adding CI for this platform now. There's more variety of ARM runners in GitHub now than there was when this code was added.

@ricardosalveti
Copy link
Contributor Author

@ricardosalveti mind if we take a little while to test this change?

not at all, I guess one thing to confirm is if /usr/lib/ostree-boot/ was indeed being used in the aboot scenario.

@masneyb
Copy link

masneyb commented Apr 23, 2025

I tested this change on a Qualcomm RideSX4 (sa8775p) with AutoSD 9, and everything looks good to me. @cgwalters @ericcurtin : I am fine with merging this.

Tested-by: Brian Masney bmasney@redhat.com

@ricardosalveti
Copy link
Contributor Author

I tested this change on a Qualcomm RideSX4 (sa8775p) with AutoSD 9, and everything looks good to me. @cgwalters @ericcurtin : I am fine with merging this.

Thanks for testing this, tested-by tag added and commit updated.

@ericcurtin
Copy link
Collaborator

ericcurtin commented Apr 23, 2025

@masneyb curious what tested means here? Build an image with modified ostree rpm using osbuild? AutoSD booted with modified rpm? AutoSD and OSTree upgraded and booted into the new image version successfully with modified rpm?

Just want to be sure, so if this comes back to bite us we know what was tested :)

Thanks for helping out.

@ericcurtin
Copy link
Collaborator

I would not block on this, but if we break, we have to push a new OSTree rpm to CentOS Stream, not fun.

@ericcurtin
Copy link
Collaborator

ericcurtin commented Apr 23, 2025

Happy to merge... I'm wary of these changes because some of these code paths behave differently at osbuild-time, first-boot time and upgrade time...

@ericcurtin ericcurtin enabled auto-merge April 23, 2025 23:22
@ericcurtin
Copy link
Collaborator

I guess continuous-integration/jenkins/pr-merge is a flake maybe?

@masneyb
Copy link

masneyb commented Apr 24, 2025

@ericcurtin : I tested this with the following procedure:

  • I checked out this branch and built new ostree RPMs from dist-git with a newer release number.
  • I built a new image using a-i-b minimal image mode, and I pointed it to the local directory where I built these RPMs. I checked the build output (and eventually the system image) to be sure that my new RPM version was installed as expected.
  • I boot tested this on the RideSX4 board.

So I guess the only scenario that I didn't test was the upgrade functionality. Suggestions welcome here for how I can test that.

Is there anything else that I should have tested? Tomorrow I also want to test this on a Renesas board that we have.

@masneyb
Copy link

masneyb commented Apr 24, 2025

Regarding the CI failure:

[2025-04-23T22:03:56.434Z] Apr 23 22:03:55 qemu0 kola-runext-itest-bare-root.sh[2227]: Unexpected nonzero exit status 1 while running: ostree checkout -H ${host_refspec} co
[2025-04-23T22:03:56.434Z] Apr 23 22:03:55 qemu0 systemd[1]: kola-runext.service: Main process exited, code=exited, status=1/FAILURE
[2025-04-23T22:03:56.434Z] Apr 23 22:03:55 qemu0 systemd[1]: kola-runext.service: Failed with result 'exit-code'.
[2025-04-23T22:03:59.683Z] --- FAIL: ext.ostree.destructive.itest-bare-root.sh (27.07s)
[2025-04-23T22:03:59.683Z]         cluster.go:151: Error: Unit kola-runext.service exited with code 1
[2025-04-23T22:03:59.683Z]         cluster.go:151: 2025-04-23T22:03:56Z cli: Unit kola-runext.service exited with code 1
[2025-04-23T22:03:59.683Z]         harness.go:1260: kolet failed: : kolet run-test-unit failed: Process exited with status 1

and

[2025-04-23T22:08:48.610Z] --- FAIL: ext.ostree.destructive.itest-label-selinux.sh (36.99s)
[2025-04-23T22:08:48.610Z]         cluster.go:151: Error: Unit kola-runext.service exited with code 1
[2025-04-23T22:08:48.610Z]         cluster.go:151: 2025-04-23T22:08:44Z cli: Unit kola-runext.service exited with code 1
[2025-04-23T22:08:48.610Z]         harness.go:1260: kolet failed: : kolet run-test-unit failed: Process exited with status 1

Along with various other errors related to kola-runext.service .

@ericcurtin
Copy link
Collaborator

Maybe @jlebon knows how to kick off the tests again?

@ericcurtin
Copy link
Collaborator

@ericcurtin : I tested this with the following procedure:

  • I checked out this branch and built new ostree RPMs from dist-git with a newer release number.
  • I built a new image using a-i-b minimal image mode, and I pointed it to the local directory where I built these RPMs. I checked the build output (and eventually the system image) to be sure that my new RPM version was installed as expected.
  • I boot tested this on the RideSX4 board.

So I guess the only scenario that I didn't test was the upgrade functionality. Suggestions welcome here for how I can test that.

Is there anything else that I should have tested? Tomorrow I also want to test this on a Renesas board that we have.

It's documented here:

https://sigs.centos.org/automotive/building/unattended_updates/

It's possible to kinda do a fake upgrade, if all the various composefs/fsverity verifications are turned off and we do a "rpm-ostree install some rpm". An rpm-ostree install is kinda like an atomic upgrade.

We have done the two most important tests though maybe it's enough, thanks for writing that down.

@ericcurtin
Copy link
Collaborator

Will be neat to see this on the Renasas board, we are missing the userspace AB switching tool for the RideSX4 board. We would fake the AB switching sometimes to test this on RideSX4 or do AA switching (all on the same slot 😅)

@masneyb
Copy link

masneyb commented Apr 29, 2025

@ericcurtin : I was able to test rpm-ostree upgrade functionality on the Qualcomm board with this change. It requires aboot-deploy 0.7 to work. This change looks good to me and is safe to merge.

@cgwalters
Copy link
Member

Almost all the CI failures here are not related to the PR, they're fallout from Fedora CoreOS 42 switching to a container image for transport by default.

One of the failures is related to the bin/sbin merge.

I will look at fixing the tests.

@ericcurtin ericcurtin requested a review from Copilot April 30, 2025 12:17
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refines the bootloader configuration setup in the deployment process by ensuring that the "aboot" and "abootcfg" entries are only set when aboot.img is detected.

  • Removed the fallback branch that previously set "aboot" from the deployment_dirpath.
  • Updated the "abootcfg" path to use bootcsumdir consistently with "aboot".
Comments suppressed due to low confidence (1)

src/libostree/ostree-sysroot-deploy.c:2114

  • Add tests to verify that no bootloader configuration entries are set when aboot.img is absent, ensuring that the removal of the fallback behavior does not introduce regressions.
g_autofree char *aboot_relpath = g_strconcat ("/", bootcsumdir, "/", aboot_fn, NULL);

Comment on lines 2115 to +2119
g_autofree char *aboot_relpath = g_strconcat ("/", bootcsumdir, "/", aboot_fn, NULL);
ostree_bootconfig_parser_set (bootconfig, "aboot", aboot_relpath);
}
else
{
g_autofree char *aboot_relpath
= g_strconcat ("/", deployment_dirpath, "/usr/lib/ostree-boot/aboot.img", NULL);
ostree_bootconfig_parser_set (bootconfig, "aboot", aboot_relpath);
}

g_autofree char *abootcfg_relpath
= g_strconcat ("/", deployment_dirpath, "/usr/lib/ostree-boot/aboot.cfg", NULL);
ostree_bootconfig_parser_set (bootconfig, "abootcfg", abootcfg_relpath);
g_autofree char *abootcfg_relpath = g_strconcat ("/", bootcsumdir, "/aboot.cfg", NULL);
ostree_bootconfig_parser_set (bootconfig, "abootcfg", abootcfg_relpath);
Copy link
Preview

Copilot AI Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change consolidates the aboot and abootcfg paths to use bootcsumdir exclusively. Please update the function's documentation and related comments to clearly state that these boot entries will only be set when aboot.img is found.

Copilot uses AI. Check for mistakes.

@champtar
Copy link
Collaborator

@ricardosalveti you can rebase on top of @cgwalters fixes

Bootloader entry should only have aboot and abootcfg configuration
entries when aboot.img is found on the system (e.g.
/usr/lib/modules/$kver).

Otherwise it will be always set, won't be used during boot and
systemd-boot will complain about unknown lines.

Tested-by: Brian Masney <bmasney@redhat.com>
Signed-off-by: Ricardo Salveti <ricardo@foundries.io>
auto-merge was automatically disabled April 30, 2025 16:27

Head branch was pushed to by a user without write access

@ricardosalveti
Copy link
Contributor Author

@ricardosalveti you can rebase on top of @cgwalters fixes

done, thanks!

@ericcurtin ericcurtin enabled auto-merge April 30, 2025 16:34
@ericcurtin
Copy link
Collaborator

Fingers crossed auto-merge works

@ostreedev ostreedev deleted a comment from ericcurtin Apr 30, 2025
@ericcurtin ericcurtin merged commit fc7f18e into ostreedev:main Apr 30, 2025
26 checks passed
@alexlarsson
Copy link
Member

This change seems to have broken automotive updates for me. We used to look for aboot.cfg in /usr/lib/ostree-boot, but then change makes it look in bootcsumdir (i.e. /boot/ostree/$osname-$csum), where the file isn't.

This means that the BLS file will point to the wrong location for the aboot.cfg file and we never actually update the aboot partition during an A/B switch in the deploy.

alexlarsson added a commit to alexlarsson/ostree that referenced this pull request Aug 6, 2025
The change in ostreedev#3413 was meant
to change when the abootcfg option is set in the BLS file.  However,
it also changed the value of this key, using the wrong directory
(bootcsumdir instead of /usr/lib/ostree-boot).

This means that during update, aboot-update gets the wrong path to the
config and cannot correctly write the aboot partition.
alexlarsson added a commit to alexlarsson/ostree that referenced this pull request Aug 6, 2025
The change in ostreedev#3413 was meant
to change when the abootcfg option is set in the BLS file.  However,
it also changed the value of this key, using the wrong directory
(bootcsumdir instead of /usr/lib/ostree-boot).

This means that during update, aboot-update gets the wrong path to the
config and cannot correctly write the aboot partition.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
@masneyb
Copy link

masneyb commented Aug 6, 2025

@alexlarsson : Can you elaborate specifically what's broken with automotive updates? I outlined what all I tested above in two different comments. This includes the rpm-ostree upgrade functionality. I'm mostly curious what I missed, and don't want to miss that again in the future.

I also listed aboot-deploy 0.7 to work as a requirement.

@alexlarsson
Copy link
Member

@masneyb Hmm, actually I'm not sure if you would have noticed it in your test.

Here is what happens when I use ukiboot:

  1. rpm-ostree upgrade seems to work and it stages the update
  2. However, during reboot when the finalize-staged part runs it calls out to aboot-deploy with the incorrect path to the aboot.cfg file
  3. aboot-deploy fails, because it needs the BOOT_TYPE=ukiboot in the config file to find the right partitions, thus we never swap partitions, and aboot-deploy fails.
  4. On next boot, we're still in slot a, and the logs say:
    ostree[583]: error: ostree-finalize-staged.service failed on previous boot: Final bootloader swap: Child process exited with code 5

However, looking at aboot-deploy it seems like for aboot, maybe things will work "correctly" (falling back to a single slot boot) with a non-existing config file. However, if any other options from the config file was necessary it would still fail.

@masneyb
Copy link

masneyb commented Aug 6, 2025

When I tested this in automotive back in Spring, we didn't have a utility bundled by default to do the slot switching, so this was a single slot deployment. I ensured that when I did the upgrade that the new package versions appeared on reboot. I did not go through the system log messages to look for error messages.

@alexlarsson
Copy link
Member

@masneyb I think I only got the error message because i needed the aboot.cfg file to tell it to look for ukiboot_a partition names instead of the default aboot_a name. So, it failed to find the partition for me, but not for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants