Conversation

charludo (Contributor)

We have been experiencing sporadic corruption of the layer cache file. Applying genpolicy --use-cached-files to the following deployment.yaml, with the layer cache layers-cache.json starting in the initial state shown below it:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: whatever
spec:
  template:
    spec:
      runtimeClassName: contrast-cc
      containers:
        - name: secret-service
          image: 'ghcr.io/edgelesssys/privatemode/secret-service:v1.17.0@sha256:e36c9f13932f46e2fd460c5469dd4ee22bcdbaefd4b576d985f9e9a435413d3b'
[
  {
    "diff_id": "sha256:fa6a52ec54152fea01d1e53fa5b2447fcc2004d56414f23830fe4536b8b3f823",
    "verity_hash": "e25c3d85b5ef4fb029b338b0eef3c4d5f29ffabb7d0aa27a2a8130e06fdd8190",
    "passwd": "root:x:0:0:root:/root:/bin/ash\nbin:x:1:1:bin:/bin:/sbin/nologin\ndaemon:x:2:2:daemon:/sbin:/sbin/nologin\nadm:x:3:4:adm:/var/adm:/sbin/nologin\nlp:x:4:7:lp:/var/spool/lpd:/sbin/nologin\nsync:x:5:0:sync:/sbin:/bin/sync\nshutdown:x:6:0:shutdown:/sbin:/sbin/shutdown\nhalt:x:7:0:halt:/sbin:/sbin/halt\nmail:x:8:12:mail:/var/mail:/sbin/nologin\nnews:x:9:13:news:/usr/lib/news:/sbin/nologin\nuucp:x:10:14:uucp:/var/spool/uucppublic:/sbin/nologin\noperator:x:11:0:operator:/root:/sbin/nologin\nman:x:13:15:man:/usr/man:/sbin/nologin\npostmaster:x:14:12:postmaster:/var/mail:/sbin/nologin\ncron:x:16:16:cron:/var/spool/cron:/sbin/nologin\nftp:x:21:21::/var/lib/ftp:/sbin/nologin\nsshd:x:22:22:sshd:/dev/null:/sbin/nologin\nnobody:x:65534:65534:nobody:/:/sbin/nologin\nnonroot:x:65532:65532:Account created by apko:/home/nonroot:/bin/sh\nnotebook-user:x:1000:1000::/home/notebook-user:/bin/ash\n",
    "group": "root:x:0:root\nbin:x:1:root,bin,daemon\ndaemon:x:2:root,bin,daemon\nsys:x:3:root,bin,adm\nadm:x:4:root,adm,daemon\ntty:x:5:\ndisk:x:6:root,adm\nlp:x:7:lp\nmem:x:8:\nkmem:x:9:\nwheel:x:10:root\nfloppy:x:11:root\nmail:x:12:mail\nnews:x:13:news\nuucp:x:14:uucp\nman:x:15:man\ncron:x:16:cron\nconsole:x:17:\naudio:x:18:\ncdrom:x:19:\ndialout:x:20:root\nftp:x:21:\nsshd:x:22:\ninput:x:23:\nat:x:25:at\ntape:x:26:root\nvideo:x:27:root\nnetdev:x:28:\nreadproc:x:30:\nkvm:x:34:kvm\ngames:x:35:\nshadow:x:42:\ncdrw:x:80:\nusb:x:85:\nntp:x:123:\nnofiles:x:200:\nsmmsp:x:209:smmsp\nlocate:x:245:\nutmp:x:406:\nnogroup:x:65533:\nnobody:x:65534:\nnonroot:x:65532:\nnotebook-user:x:1000:notebook-user\n"
}
]

reproduces the error by generating this corrupted layers-cache.json with kata-containers version 3.16.0:

[
  {
    "diff_id": "sha256:8f061a335e1d6c822399a78168ba847edc75ec8703a064418b21cbb2a8fbf867",
    "verity_hash": "18a84da98e284198dea7b85dd48d5622cd276bdcf4471cf5d4e37f582bfb2c17",
    "passwd": ""
  }
]
:0:root:/root:/bin/ash\nbin:x:1:1:bin:/bin:/sbin/nologin\ndaemon:x:2:2:daemon:/sbin:/sbin/nologin\nadm:x:3:4:adm:/var/adm:/sbin/nologin\nlp:x:4:7:lp:/var/spool/lpd:/sbin/nologin\nsync:x:5:0:sync:/sbin:/bin/sync\nshutdown:x:6:0:shutdown:/sbin:/sbin/shutdown\nhalt:x:7:0:halt:/sbin:/sbin/halt\nmail:x:8:12:mail:/var/mail:/sbin/nologin\nnews:x:9:13:news:/usr/lib/news:/sbin/nologin\nuucp:x:10:14:uucp:/var/spool/uucppublic:/sbin/nologin\noperator:x:11:0:operator:/root:/sbin/nologin\nman:x:13:15:man:/usr/man:/sbin/nologin\npostmaster:x:14:12:postmaster:/var/mail:/sbin/nologin\ncron:x:16:16:cron:/var/spool/cron:/sbin/nologin\nftp:x:21:21::/var/lib/ftp:/sbin/nologin\nsshd:x:22:22:sshd:/dev/null:/sbin/nologin\nnobody:x:65534:65534:nobody:/:/sbin/nologin\nnonroot:x:65532:65532:Account created by apko:/home/nonroot:/bin/sh\nnotebook-user:x:1000:1000::/home/notebook-user:/bin/ash\n"
  },
  {
    "diff_id": "sha256:e67c36428ee0de777e8ec5f7d73483f58854bd361026979b6937b631e2725f7a",
    "verity_hash": "e1fa97319b42f0f53fc07cbc2308e7a3d790aab2d35a2aa3820ec00b706e5efb",
    "passwd": ""
  }
]
loppy:x:11:root\nmail:x:12:mail\nnews:x:13:news\nuucp:x:14:uucp\nman:x:15:man\ncron:x:16:cron\nconsole:x:17:\naudio:x:18:\ncdrom:x:19:\ndialout:x:20:root\nftp:x:21:\nsshd:x:22:\ninput:x:23:\nat:x:25:at\ntape:x:26:root\nvideo:x:27:root\nnetdev:x:28:\nreadproc:x:30:\nkvm:x:34:kvm\ngames:x:35:\nshadow:x:42:\ncdrw:x:80:\nusb:x:85:\nntp:x:123:\nnofiles:x:200:\nsmmsp:x:209:smmsp\nlocate:x:245:\nutmp:x:406:\nnogroup:x:65533:\nnobody:x:65534:\nnonroot:x:65532:\nnotebook-user:x:1000:notebook-user\n"
}

Following 349ce8c, this example no longer exhibits the behavior: in the linked commit, the user and group parsing was refactored and improved.

However, for very long strings (in otherwise valid cache files), the same behavior can still be reproduced on the tip of main.

An (ugly) script to generate such a file:

USER_COUNT=7000  # 7k worked for me - depending on your machine, fewer might suffice or more might be necessary
TMP_USERS=""

for i in $(seq 1 "$USER_COUNT"); do
    username="user-$i"
    uid=$((1000 + i))
    gid=$((1000 + i))

    passwd_entry="$username:x:$uid:$gid::/home/$username:/bin/ash\\n"
    group_entry="$username:x:$gid:$username\\n"

    user_json=$(jq -n \
        --arg user "$username" \
        --arg passwd "$passwd_entry" \
        --arg group "$group_entry" \
        '{username: $user, passwd: $passwd, group: $group}')

    TMP_USERS="${TMP_USERS}${user_json},"
done

final_json="[${TMP_USERS%,}]"
echo "$final_json" > test_users.json

The changes in this PR fix the issue. The root cause is that fs2 only obtains a file lock during writes. Since it's an exclusive lock, it should prevent concurrent writes; however, at other points in the code, no lock is acquired before reading (or deleting) the file. Because fs2 implements advisory locks (via flock on UNIX systems), other threads and processes are still able to read or delete the file while the lock is supposedly held elsewhere.
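
The advisory nature of these locks is easy to demonstrate outside of genpolicy. In this sketch (the file name is illustrative), a background process holds an exclusive flock on a file, yet a process that simply never calls flock can still read and overwrite it:

```shell
# Demonstrate that flock(2) locks are advisory: they only constrain
# other processes that also call flock on the same file.
echo '[]' > /tmp/cache-demo.json
flock /tmp/cache-demo.json -c 'sleep 2' &   # hold an exclusive lock for 2s
sleep 0.5
# Neither reading nor overwriting is prevented while the lock is held:
cat /tmp/cache-demo.json                    # prints [] despite the lock
echo 'overwritten while locked' > /tmp/cache-demo.json
wait
```

This mirrors the situation in the cache code: the writer's exclusive lock is meaningless to readers that never attempt to take a shared lock.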

A large number of users and groups seems to take sufficient time to parse to reproduce this error reliably.

To be honest, I am not entirely certain how exactly the corruption of the file occurs, since only one concurrent write should be possible in the current implementation. Maybe someone here has deeper insight into this.

@katexochen (Contributor)

cc @danmihai1

@danmihai1 (Member) left a comment

@charludo , thanks for pointing out these issues!

To be honest, I am not entirely certain how exactly the corruption of the file occurs, since only one concurrent write should be possible in the current implementation. Maybe someone here has deeper insight into this.

My understanding is that our OpenOptions::new().write(true).open + writer.flush sequence was not truncating the output file when the new JSON output was smaller than the file as previously grown by a concurrently running genpolicy instance.
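
That failure mode can be reproduced with any tool that rewrites a file in place without truncating it. A sketch using dd (the file name and contents are made up, not genpolicy's actual data):

```shell
# Cache file as grown by another instance (37 bytes):
printf '[{"diff_id":"aaa"},{"diff_id":"bbb"}]' > /tmp/layers-demo.json
# Overwrite from offset 0 WITHOUT truncating, like write(true) without
# truncate(true) in Rust; the shorter JSON leaves the old tail behind:
printf '[{"diff_id":"ccc"}]' | dd of=/tmp/layers-demo.json conv=notrunc status=none
cat /tmp/layers-demo.json
# prints: [{"diff_id":"ccc"}]{"diff_id":"bbb"}]
```

The result is a valid JSON document followed by stale trailing bytes, which is exactly the shape of the corrupted layers-cache.json shown at the top of this thread.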

Given that:

  • Synchronizing multiple readers and writers is tricky, as discussed above, and depends on the behavior of buffering at various levels, etc.
  • Reading and writing this file over and over again is relatively expensive.
  • Executing multiple copies of genpolicy concurrently is mostly a test use case.

I think we should simplify the design:

  1. try_lock_shared() at the beginning of genpolicy execution. If locking fails: disable layer caching for the current genpolicy instance and cancel all the steps below.
  2. Read the file a single time into memory. If reading fails, start with an empty set of cached layers.
  3. unlock().
  4. Add any layers into the memory of the current genpolicy instance - but don't write these changes into the file yet.
  5. Before successful exit: if there were any changes to the in-memory cached layers:
    a. try_lock_exclusive(). If locking failed: discard any layer changes from current instance's memory, and cancel the steps below.
    b. Truncate file to size 0.
    c. Flush the cached contents from current instance's memory into the file.
    d. unlock().
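
The proposed flow can be sketched with flock(1); the file name and the simulated in-memory update are illustrative, not genpolicy's actual (Rust) code:

```shell
f=/tmp/layers-cache-demo.json
exec 9<>"$f"                          # open (and create) the cache file on fd 9
if flock -s -n 9; then                # 1. shared lock, non-blocking
    cache=$(cat "$f")                 # 2. read the file once into memory
    flock -u 9                        # 3. unlock
else
    cache='[]'                        # lock failed: run without the cache
fi
cache='[{"diff_id":"new"}]'           # 4. (simulated) layers added in memory only
if flock -x -n 9; then                # 5a. exclusive lock, non-blocking
    printf '%s' "$cache" > "$f"       # 5b+c. truncate and write new contents
    flock -u 9                        # 5d. unlock
fi
```

The key point is that the file is touched exactly twice, and the final `>` redirection truncates before writing, so a shrinking cache can never leave stale bytes behind.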

Thoughts?

I would rather avoid the complexity of NamedTempFile - hopefully the simplified algorithm I mentioned above is good enough.
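
For contrast, the tempfile approach being set aside here works roughly like the following shell sketch (paths are illustrative): the full cache is written to a temporary file and then atomically renamed over the target, so readers never observe a partially written file.

```shell
# Write-then-rename pattern: rename(2) atomically replaces the target,
# at the cost of extra temp-file handling and cleanup on error paths.
tmp=$(mktemp /tmp/layers-cache.XXXXXX)
printf '[{"diff_id":"ccc"}]' > "$tmp"     # write the complete cache first
mv "$tmp" /tmp/layers-cache-final.json    # atomic replace on POSIX filesystems
```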

You are more than welcome to make these changes if you want to. Otherwise, either I or one of my colleagues will make the changes.

@danmihai1 danmihai1 requested a review from Redent0r June 17, 2025 20:54
@burgerdev burgerdev self-requested a review June 18, 2025 06:13
charludo (Contributor, Author) commented Jun 18, 2025

My understanding is that our OpenOptions::new().write(true).open + writer.flush were not truncating the output file, when the json output was smaller than the file previously grown by a concurrently running genpolicy instance.

That does make a lot of sense!

Edit: took a shot at implementing the suggestions.

@charludo charludo force-pushed the fix/genpolicy-corruption-of-layer-cache-file branch from c2a3fa0 to 432ff1d Compare June 18, 2025 13:41
@danmihai1 (Member) left a comment

This looks great to me - thanks @charludo! I will start the CI tests.

@danmihai1 (Member)

@charludo when you get a chance, please try to address the test failures marked "Required" in the list below. Let me know if you need help with these failures.

@charludo charludo force-pushed the fix/genpolicy-corruption-of-layer-cache-file branch from 432ff1d to fb41824 Compare June 20, 2025 06:23
The locking mechanism around the layers cache file was insufficient to
prevent corruption of the file. This commit moves the layers cache's
management in-memory, only reading the cache file once at the beginning
of `genpolicy`, and only writing to it once, at the end of `genpolicy`.

In the case that obtaining a lock on the cache file fails,
reading/writing to it is skipped, and the cache is not used/persisted.

Signed-off-by: charludo <git@charlotteharludo.com>
@charludo charludo force-pushed the fix/genpolicy-corruption-of-layer-cache-file branch from 6af26b1 to 4e57cc0 Compare June 23, 2025 14:25
@burgerdev (Contributor) left a comment

lgtm, thanks!

@danmihai1 danmihai1 merged commit 0a57e09 into kata-containers:main Jun 23, 2025
503 of 529 checks passed
7 participants