Skip to content

Conversation

avagin
Copy link
Contributor

@avagin avagin commented Mar 25, 2025

CRIU always restores processes into a time namespace to prevent backward jumps of monotonic and boottime clocks. This change updates the container config to ensure that runc exec launches new processes within the container's time namespace.

Fixes checkpoint-restore/criu#2610

@kolyshkin
Copy link
Contributor

CRIU always restores processes into a time namespace

Maybe makes sense to add "Since v3.14, CRIU always restores ..."

@kolyshkin kolyshkin added area/checkpoint-restore backport/1.3-todo A PR in main branch which needs to be backported to release-1.3 labels Mar 26, 2025
Since v3.14, CRIU always restores processes into a time namespace to
prevent backward jumps of monotonic and boottime clocks. This change
updates the container configuration to ensure that `runc exec` launches
new processes within the container's time namespace.

Fixes opencontainers#2610

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kolyshkin
Copy link
Contributor

@adrianreber @lifubang @rata PTAL

@kolyshkin kolyshkin added this to the 1.3.0-rc.2 milestone Mar 26, 2025
@lifubang
Copy link
Member

CRIU always restores processes into a time namespace to prevent backward jumps of monotonic and boottime clocks.

Do you know whether this is a long term solution or not in CRIU. It seems that criu set a wrong time offset config for the restored process.

diff --git a/tests/integration/checkpoint.bats b/tests/integration/checkpoint.bats
index 3db34061..85368a8a 100644
--- a/tests/integration/checkpoint.bats
+++ b/tests/integration/checkpoint.bats
@@ -474,5 +474,9 @@ function simple_cr() {
                runc exec test_busybox sh -c 'sleep 1000 < /dev/null &> /dev/null & echo $!'
                [ "$status" -eq 0 ]
                execed_pid=$output
+               runc exec test_busybox cat /proc/self/timens_offsets
+               [ "$status" -eq 0 ]
+               grep -E '^monotonic\s+0\s+0$' <<<"$output"
+               grep -E '^boottime\s+0\s+0$' <<<"$output"
        done
 }

Test error msg:

...
   runc exec test_busybox sh -c sleep 1000 < /dev/null &> /dev/null & echo $! (status=0):
   29
   runc exec test_busybox cat /proc/self/timens_offsets (status=0):
   monotonic          -1 955929992
   boottime           -1 955925201
   --- teardown ---

12 tests, 1 failure, 2 skipped

I don't know whether this is the expected result or not.

@avagin
Copy link
Contributor Author

avagin commented Mar 27, 2025

CRIU always restores processes into a time namespace to prevent backward jumps of monotonic and boottime clocks.

Do you know whether this is a long term solution or not in CRIU.

It is a long-term solution. The time namespace was designed for the C/R purpose, and there is no way to restore processes without using time namespaces.

It seems that criu set a wrong time offset config for the restored process.

CRIU sets offsets so that the monotonic and boottime clocks continue ticking from the moment when a container was dumped. The monotonic and boottime clocks cannot jump backward. When a container is restored on another machine or on the same machine after a reboot, the "native" clocks can have any values, and we need to adjust them according to their values when the processes were dumped.

Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM but I'd like for @lifubang to nail down the issue he sees.

Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Left some comments

fi

# exec a new background process.
runc exec test_busybox sh -c 'sleep 1000 < /dev/null &> /dev/null & echo $!'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have the dev/null redirections here? I guess there is a criu thing that is better with that? Can you please elaborate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as in simple_cr from where the code is copied. The issue here is we need sleep process to be checkpointable, and if some of its file descriptors point to files/pipes on the host, it is not checkpointable (as the process is not contained in a container).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't see this null redirections in checkpoints.bat. Also, runc does a reopen of /dev/null, so I guess that shouldn't be the issue?

But yeah, removing those redirections locally makes the test get stalled. I guess this is fine if criu devs think it is needed. Although some explanation on why that is the case would be great, IMHO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sleep process is daemonized, and runsc exec exits right after forking it. If we don't redirect its file descriptors, they will be closed when runsc exec exits, and the sleep process can be killed by SIGHUP.

Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Although some more explanations in the tests would be great, as we might need to debug them later.

@lifubang PTAL to see if your concern is addressed

@lifubang
Copy link
Member

lifubang commented Apr 1, 2025

PTAL to see if your concern is addressed

I think it's reasonable for CRIU to set the restored processes's time offset.
But after the container restored, maybe the new exec processes to this container shouldn't join the init's process's time ns? Because I think these new processes didn't need to know whether this container has been check pointed and restored for sometimes or not. But maybe I'm wrong.

@rata
Copy link
Member

rata commented Apr 1, 2025

@lifubang what do you mean exactly? That the container process and the exec'ed process should not use the same time namespace?

@lifubang
Copy link
Member

lifubang commented Apr 1, 2025

@lifubang what do you mean exactly? That the container process and the exec'ed process should not use the same time namespace?

Yes, I think because the original config.json may not contain time namespace when creating a container. So the new exec'ed process after restored should not have a different time offset too.

@rata
Copy link
Member

rata commented Apr 1, 2025

@lifubang but having a different view in exec than the container seems very confusing. And debugging anything might be very hard (if it's timens related)

@lifubang
Copy link
Member

lifubang commented Apr 1, 2025

@lifubang but having a different view in exec than the container seems very confusing. And debugging anything might be very hard (if it's timens related)

Yes, so I have no strong opinion to change. Feel free to merge this PR.

@rata
Copy link
Member

rata commented Apr 1, 2025

@lifubang merging then. We can revert or fix if you find any weird issue, as you usually do. I hope not in this case, though :)

@rata rata merged commit c3a41d7 into opencontainers:main Apr 1, 2025
34 checks passed
@kolyshkin kolyshkin removed this from the 1.3.0-rc.2 milestone Apr 2, 2025
@kolyshkin kolyshkin removed the backport/1.3-todo A PR in main branch which needs to be backported to release-1.3 label Apr 2, 2025
@kolyshkin kolyshkin added the backport/1.3-done A PR in main branch which has been backported to release-1.3 label Apr 2, 2025
@kolyshkin
Copy link
Contributor

1.3 backport: #4705

@rata
Copy link
Member

rata commented Apr 7, 2025

Looking at the upstream issue, it was reported with 1.2. So we should backport it there too.

@kolyshkin
Copy link
Contributor

Looking at the upstream issue, it was reported with 1.2. So we should backport it there too.

I'm not sure if we'll ever do 1.2.7 but why not.

1.2 backport: #4714

@kolyshkin kolyshkin added the backport/1.2-done A PR in main branch which has been backported to release-1.2 label Apr 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/checkpoint-restore backport/1.2-done A PR in main branch which has been backported to release-1.2 backport/1.3-done A PR in main branch which has been backported to release-1.3
Projects
None yet
Development

Successfully merging this pull request may close these issues.

with runc, PID 1 gets put in the wrong time NS on restore
4 participants