Description
stacker version
v1.0.0-rc4-8e267fc
Describe the bug
This issue was first described in #431. We made a valid fix there, but it did not fix the issue here.
When build_only: true is used for under-layers, stacker can fail to set up a valid container. The fact that the original docker layer was a 'tar' layer is also likely related.
The following comment at the beginning of lxcRootfsString in pkg/overlay/metadata.go is not correct for all use cases:
// find any manifest to mount: we don't care if this is tar or
// squashfs, we just need to mount something. the code that generates
// the output needs to care about this, not this code.
//
// if there are no manifests (this came from a tar layer or whatever),
// that's fine too; we just end up with two workaround directories as
// below
lxcRootfsString iterates over the ovl.Manifests dictionary and picks the first manifest it finds. When stacker is building only squashfs layers, a stacker file like the one below will fail if the dictionary traversal does not select 'squashfs+true' first.
minbase:
  build_only: true
  from:
    type: docker
    url: docker://busybox:latest
  run: |
    echo hello > /minbase.txt
rootfs:
  from:
    type: built
    tag: minbase
  run: |
    [ -e /minbase.txt ]
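The intermittent nature of the failure is consistent with Go's map semantics: iteration order over a map is unspecified and can change from one traversal to the next, so "pick the first manifest" may return either entry. The following is a minimal sketch of that behavior, not stacker's actual code. The names `manifest`, `pickFirst` and `pickPreferred` are hypothetical, and the layer counts simply mirror the metadata in this report.

```go
package main

import (
	"fmt"
	"sort"
)

// manifest is a hypothetical stand-in for one entry of ovl.Manifests;
// only the number of layers matters for this illustration.
type manifest struct {
	Layers int
}

// pickFirst mimics "take whatever the map yields first". Go does not
// specify map iteration order, so the result can differ between calls.
func pickFirst(m map[string]manifest) string {
	for k := range m {
		return k
	}
	return ""
}

// pickPreferred is one possible deterministic alternative (an assumption,
// not the project's fix): use the preferred key if it exists, otherwise
// fall back to the lexicographically smallest key.
func pickPreferred(m map[string]manifest, preferred string) string {
	if _, ok := m[preferred]; ok {
		return preferred
	}
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	if len(keys) == 0 {
		return ""
	}
	sort.Strings(keys)
	return keys[0]
}

func main() {
	manifests := map[string]manifest{
		"squashfs+true": {Layers: 2},
		"tar+false":     {Layers: 1},
	}

	// Count how often each key comes out first over many traversals.
	seen := map[string]int{}
	for i := 0; i < 1000; i++ {
		seen[pickFirst(manifests)]++
	}
	fmt.Println("pickFirst results:", seen)
	fmt.Println("pickPreferred:", pickPreferred(manifests, "squashfs+true"))
}
```

With only two keys in the map, either one can come out first on any given traversal, which would match a failure that shows up on some runs and not others.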
The problem can be seen in the serialized roots/minbase/overlay_metadata.json: the 'tar+false' entry is missing a layer (it has only 1, where the 'squashfs+true' entry has 2). The file below is trimmed.
{
  "Manifests": {
    "squashfs+true": {
      "schemaVersion": 2,
      "config": {
        "mediaType": "application/vnd.oci.image.config.v1+json",
        "digest": "sha256:6f915f...3c821cd1688dc",
        "size": 576
      },
      "layers": [
        {
          "mediaType": "application/vnd.stacker.image.layer.squashfs+zstd+verity",
          "digest": "sha256:243c9d7...f482880",
          "size": 2301952
        },
        {
          "mediaType": "application/vnd.stacker.image.layer.squashfs+zstd+verity",
          "digest": "sha256:ad18d87c6...1a58280252",
          "size": 8192
        }
      ]
    },
    "tar+false": {
      "schemaVersion": 2,
      "config": {
        "mediaType": "application/vnd.oci.image.config.v1+json",
        "digest": "sha256:3488e6e2e...0edb4b6cc7",
        "size": 575
      },
      "layers": [
        {
          "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
          "digest": "sha256:1487bff95...bc5621",
          "size": 2592227
        }
      ]
    }
  },
  ...
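To spot this mismatch without reading the JSON by eye, the per-manifest layer counts can be compared programmatically. The sketch below assumes only the field names visible in the trimmed file above ("Manifests", "layers", "mediaType"). The `overlayMetadata` type and `layerCounts` function are hypothetical helpers, not part of stacker, and the sample input is a reduced stand-in for the real file.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// overlayMetadata models just the fields needed to count layers.
// It is a hypothetical, partial view of overlay_metadata.json.
type overlayMetadata struct {
	Manifests map[string]struct {
		Layers []struct {
			MediaType string `json:"mediaType"`
		} `json:"layers"`
	} `json:"Manifests"`
}

// layerCounts returns the number of layers recorded for each manifest key.
func layerCounts(data []byte) (map[string]int, error) {
	var md overlayMetadata
	if err := json.Unmarshal(data, &md); err != nil {
		return nil, err
	}
	counts := make(map[string]int, len(md.Manifests))
	for name, m := range md.Manifests {
		counts[name] = len(m.Layers)
	}
	return counts, nil
}

func main() {
	// Reduced sample with the same shape as the trimmed file above.
	sample := []byte(`{
	  "Manifests": {
	    "squashfs+true": {"layers": [
	      {"mediaType": "application/vnd.stacker.image.layer.squashfs+zstd+verity"},
	      {"mediaType": "application/vnd.stacker.image.layer.squashfs+zstd+verity"}
	    ]},
	    "tar+false": {"layers": [
	      {"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip"}
	    ]}
	  }
	}`)

	counts, err := layerCounts(sample)
	if err != nil {
		log.Fatal(err)
	}
	for name, n := range counts {
		fmt.Printf("%s: %d layer(s)\n", name, n)
	}
}
```

To run it against a real build, the embedded sample could be replaced with the contents of roots/minbase/overlay_metadata.json, read with `os.ReadFile`.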
To reproduce
The attached recreate.sh will reproduce the bug.
It reads the following environment variables:
- RUNS: default=50 number of runs
- BUILD_ONLY: default=true - allowed values ('true', 'false')
- LAYER_TYPES: default=squashfs - allowed values ('squashfs', 'tar', 'squashfs,tar', 'tar,squashfs')
Changing the value of BUILD_ONLY to 'false' or LAYER_TYPES to 'squashfs,tar' (or 'tar,squashfs') will cause the issue to not reproduce.
The problem only occurs with stacker files that have 'build_only: true' and are built with '--layer-type=squashfs'.
Additional context
My bootkit project builds artifacts using stacker. It organizes these artifacts into a few layers that are to be published. It heavily uses 'build_only: true' and uses 'stacker publish' to publish the layers.
Due to this bug, the bootkit CI build sees transient failures.
My options to avoid the bug are:
- build both tar and squashfs layers, but only publish the squashfs layers (stacker publish --layer-type=squashfs).
- remove 'build_only: true' and only publish specific layers (stacker publish --layer=x --layer=y...)
Both of these options incur a lot of extra CPU and I/O, and the second one requires maintaining a list of what to publish somewhere other than stacker.yaml.