thaJeztah
Member

relates to:

The TestNatNetworkICC and TestFlakyPortMappedHairpinWindows (TestPortMappedHairpinWindows) tests were frequently failing on Windows with a context timeout;

=== FAIL: github.com/docker/docker/integration/networking TestNatNetworkICC/User_defined_nat_network (9.67s)
    nat_windows_test.go:62: assertion failed: error is not nil: Post "http://%2F%2F.%2Fpipe%2Fdocker_engine/v1.51/containers/4357bd24c9b77b955ee961530d1f552ce099b3dcbeb396db599971b2396d8b08/start": context deadline exceeded
    panic.go:636: assertion failed: error is not nil: Error response from daemon: error while removing network: network mynat has active endpoints (name:"ctr2" id:"dc8d597dafef")

=== FAIL: github.com/docker/docker/integration/networking TestNatNetworkICC (18.34s)

=== FAIL: github.com/docker/docker/integration/networking TestFlakyPortMappedHairpinWindows (13.02s)
    nat_windows_test.go:110: assertion failed: error is not nil: Post "http://%2F%2F.%2Fpipe%2Fdocker_engine/v1.51/containers/65207ae3d6953d85cd2123feac45af60b059842d570d4f897ea53c813cba3cb4/start": context deadline exceeded
    panic.go:636: assertion failed: error is not nil: Error response from daemon: error while removing network: network clientnet has active endpoints (name:"amazing_visvesvaraya" id:"18add58d415e")

These timeouts were set in c1ab6ed and 2df4391, and were shared between Linux and Windows; Windows is likely slower to start, so these timeouts are to be expected.

Let's increase the context timeout to give it a bit more time.


Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah
Member Author

Ugh ... hitting this one again;

=== FAIL: client TestIntegration (396.44s)
    run.go:324: copied mcr.microsoft.com/windows/nanoserver:ltsc2022 to local mirror localhost:62088/library/nanoserver:latest
time="2025-07-08T12:34:19Z" level=info msg="fetch failed after status: 404 Not Found" host="localhost:62088"
    run.go:324: copied docker.io/wintools/nanoserver:ltsc2022 to local mirror localhost:62088/library/nanoserver:plus
    run.go:253: 
        	Error Trace:	D:/a/moby/moby/buildkit/util/testutil/integration/run.go:253
        	            				D:/a/moby/moby/buildkit/util/testutil/integration/run.go:254
        	            				D:/a/moby/moby/buildkit/client/client_test.go:256
        	            				D:/a/moby/moby/buildkit/client/client_test.go:242
        	Error:      	Should be true
        	Test:       	TestIntegration

DONE 22 tests, 16 skipped, 2 failures in 396.501s
Error: Process completed with exit code 1.

And ... of course .. LOL

=== Failed
=== FAIL: amd64.integration-cli TestDockerCLIPluginsSuite/TestPluginUpgrade (6.00s)
    docker_cli_plugins_test.go:435: assertion failed: 
        Command:  /usr/local/cli-integration/docker plugin install --grant-all-permissions cpuguy83/docker-volume-driver-plugin-local:latest
        ExitCode: 1
        Error:    exit status 1
        Stdout:   latest: Pulling from cpuguy83/docker-volume-driver-plugin-local
        Digest: sha256:aac039baa37b77390b00ebab9205759ae239e860aaf4b75165c830aff9b92894
        
        Stderr:   failed to copy: httpReadSeeker: failed open: unexpected status from GET request to https://registry-1.docker.io/v2/cpuguy83/docker-volume-driver-plugin-local/manifests/sha256:aac039baa37b77390b00ebab9205759ae239e860aaf4b75165c830aff9b92894: 429 Too Many Requests
        toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit
        
        
        Failures:
        ExitCode was 1 expected 0
        Expected no error
    --- FAIL: TestDockerCLIPluginsSuite/TestPluginUpgrade (6.00s)

Contributor

@robmry robmry left a comment

LGTM - thank you ... I'm not sure why the cleanup fails (after the timeout) in TestFlakyPortMappedHairpinWindows, the network is deleted eventually. But, it's a separate issue.

@thaJeztah
Member Author

I'm not sure why the cleanup fails (after the timeout) in TestFlakyPortMappedHairpinWindows, the network is deleted eventually. But, it's a separate issue.

Wondering as well; possibly it could be a race because of this? (The container eventually starts, but is not (yet) removed, or something silly.) We should probably still dig in, but hopefully this reduces some of the issues.

@robmry
Contributor

robmry commented Jul 8, 2025

Wondering as well; possibly it could be a race because of this? (The container eventually starts, but is not (yet) removed, or something silly.) We should probably still dig in, but hopefully this reduces some of the issues.

Yes, it must be something like that ... by the time the test tear-down runs the container must have exited, so its extra network-delete call succeeds.

@thaJeztah thaJeztah merged commit 836bd72 into moby:master Jul 8, 2025
269 of 272 checks passed
@thaJeztah thaJeztah deleted the windows_networking_deflake branch July 8, 2025 14:50