Bug: cache use problem with build_only layers single --layer-type #442

@smoser

Description

stacker version

v1.0.0-rc4-8e267fc

Describe the bug

This issue was first described in #431. We made a valid fix there, but it did not fix the issue here.

When build_only: true is used for under-layers, stacker can fail to set up a valid container. The fact that the original docker layer was a 'tar' layer is also likely related.

The following comment at the beginning of lxcRootfsString in pkg/overlay/metadata.go is not correct for all use cases:

// find any manifest to mount: we don't care if this is tar or
// squashfs, we just need to mount something. the code that generates
// the output needs to care about this, not this code.
//
// if there are no manifests (this came from a tar layer or whatever),
// that's fine too; we just end up with two workaround directories as
// below

lxcRootfsString will iterate over the ovl.Manifests dictionary and pick the first manifest it finds. When stacker is building only squashfs layers, a stacker file like the one below will fail if the dictionary traversal does not select 'squashfs+true' first.

minbase:
  build_only: true
  from:
    type: docker
    url: docker://busybox:latest
  run: |
    echo hello > /minbase.txt

rootfs:
  from:
    type: built
    tag: minbase
  run: |
    [ -e /minbase.txt ]
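
The "first manifest found" hazard described above can be illustrated with a small Go sketch. This is not stacker's code: pickFirst and pickPreferred are hypothetical names, and the map values are plain integers standing in for manifests. It only shows why taking the first key of a Go map is not stable, and one possible way to make the choice deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// pickFirst mimics the hazard: it returns whichever key the map
// iteration yields first. Go does not specify map iteration order and
// the runtime randomizes it, so the result can differ between runs.
func pickFirst(manifests map[string]int) string {
	for key := range manifests {
		return key
	}
	return ""
}

// pickPreferred is a hypothetical deterministic alternative: it returns
// the preferred key when present, otherwise the smallest key in sorted
// order, and "" for an empty map.
func pickPreferred(manifests map[string]int, preferred string) string {
	if _, ok := manifests[preferred]; ok {
		return preferred
	}
	keys := make([]string, 0, len(manifests))
	for key := range manifests {
		keys = append(keys, key)
	}
	if len(keys) == 0 {
		return ""
	}
	sort.Strings(keys)
	return keys[0]
}

func main() {
	// The integer values are placeholders for the manifests themselves.
	manifests := map[string]int{"squashfs+true": 2, "tar+false": 1}
	fmt.Println("first found:", pickFirst(manifests))
	fmt.Println("preferred:  ", pickPreferred(manifests, "squashfs+true"))
}
```

Running this repeatedly may print either key for "first found", while "preferred" stays the same. That matches the transient nature of the failure reported here.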

The problem can be seen in the serialized roots/minbase/overlay_metadata.json: the 'tar+false' entry is missing a layer (it has only 1, where the 'squashfs+true' entry has 2). The file below is trimmed.

{
    "Manifests": {
        "squashfs+true": {
            "schemaVersion": 2,  
            "config": {
                "mediaType": "application/vnd.oci.image.config.v1+json",
                "digest": "sha256:6f915f...3c821cd1688dc",
                "size": 576
            },
            "layers": [
                {
                    "mediaType": "application/vnd.stacker.image.layer.squashfs+zstd+verity",
                    "digest": "sha256:243c9d7...f482880",
                    "size": 2301952
                },
                {
                    "mediaType": "application/vnd.stacker.image.layer.squashfs+zstd+verity",
                    "digest": "sha256:ad18d87c6...1a58280252",
                    "size": 8192
                }
            ]
        },
        "tar+false": {
            "schemaVersion": 2,
            "config": {
                "mediaType": "application/vnd.oci.image.config.v1+json",
                "digest": "sha256:3488e6e2e...0edb4b6cc7",
                "size": 575
            },
            "layers": [
                {   
                    "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
                    "digest": "sha256:1487bff95...bc5621",                
                    "size": 2592227
                }
            ]
        }
    },
...
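
A quick way to spot the mismatch is to count the layers under each manifest key. The sketch below is an assumption-laden helper, not part of stacker: the struct types and layerCounts are hypothetical, and they model only the fields visible in the trimmed file above. The digests in the sample are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Only the fields visible in the trimmed overlay_metadata.json are
// modeled here; the real file contains more.
type layer struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

type manifest struct {
	Layers []layer `json:"layers"`
}

type overlayMetadata struct {
	Manifests map[string]manifest `json:"Manifests"`
}

// layerCounts (hypothetical helper) returns the number of layers
// recorded under each manifest key such as "squashfs+true".
func layerCounts(data []byte) (map[string]int, error) {
	var md overlayMetadata
	if err := json.Unmarshal(data, &md); err != nil {
		return nil, err
	}
	counts := make(map[string]int, len(md.Manifests))
	for key, m := range md.Manifests {
		counts[key] = len(m.Layers)
	}
	return counts, nil
}

func main() {
	// Reduced stand-in for roots/minbase/overlay_metadata.json.
	sample := []byte(`{
	  "Manifests": {
	    "squashfs+true": {"layers": [{"digest": "sha256:placeholder-a"}, {"digest": "sha256:placeholder-b"}]},
	    "tar+false":     {"layers": [{"digest": "sha256:placeholder-c"}]}
	  }
	}`)
	counts, err := layerCounts(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, key := range []string{"squashfs+true", "tar+false"} {
		fmt.Printf("%s: %d layer(s)\n", key, counts[key])
	}
}
```

To check a real build, the sample bytes could be replaced with the contents of the file read via os.ReadFile.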

To reproduce

The attached recreate.sh will reproduce the bug.

It reads the following environment variables:

  • RUNS: default=50 - number of runs
  • BUILD_ONLY: default=true - allowed values ('true', 'false')
  • LAYER_TYPES: default=squashfs - allowed values ('squashfs', 'tar', 'squashfs,tar', 'tar,squashfs')

Changing the value of BUILD_ONLY to 'false' or LAYER_TYPES to 'squashfs,tar' (or 'tar,squashfs') will cause the issue to not reproduce.

The problem only occurs with stacker files that have 'build_only: true' and are built with '--layer-type=squashfs'.

Additional context

My bootkit project builds artifacts using stacker. It organizes these artifacts into a few layers that are to be published. It heavily uses 'build_only: true' and uses 'stacker publish' to publish the layers.

Due to this bug, the bootkit CI build sees transient failures.

My options to avoid the bug are:

  • build both tar and squashfs layers, but only publish the squashfs layers (stacker publish --layer-type=squashfs).
  • remove 'build_only: true' and only publish specific layers (stacker publish --layer=x --layer=y...)

Both of these options incur a lot of extra CPU and I/O, and the second one requires maintaining a list of what to publish somewhere other than stacker.yaml.
