Response payload is not completed in kubernetes worker logger #16210

@pwl

Description

Bug summary

Hi 👋
My Kubernetes jobs crash with:

Error during task execution: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>
NoneType: None

This happens exactly 600 seconds after the job posts its last log line. The job logs very infrequently by design: it just runs a CLI tool, and we have to wait for its output without any intermediate logs being generated.
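
The workload is essentially this (a sketch; the real flow shells out to a long-running CLI tool instead of sleeping):

```python
import time

from prefect import flow


@flow
def long_quiet_flow():
    # Emits nothing for well over 600 seconds, like our CLI tool;
    # the worker's log stream dies somewhere during this window.
    time.sleep(700)
    print("done")  # by this point the stream has already crashed
```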

Here's the detailed stack trace:

12:09:15 PM
prefect.flow_runs.worker

Error occurred while streaming logs - Job will continue to run but logs will no longer be streamed to stdout.
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/aiohttp/client_proto.py", line 92, in connection_lost
    uncompleted = self._parser.feed_eof()
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "aiohttp/_http_parser.pyx", line 508, in aiohttp._http_parser.HttpParser.feed_eof
aiohttp.http_exceptions.TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/prefect_kubernetes/worker.py", line 881, in _stream_job_logs
    async for line in logs.content:
  File "/usr/local/lib/python3.12/site-packages/aiohttp/streams.py", line 52, in __anext__
    rv = await self.read_func()
         ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/streams.py", line 349, in readline
    return await self.readuntil()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/streams.py", line 383, in readuntil
    await self._wait("readuntil")
  File "/usr/local/lib/python3.12/site-packages/aiohttp/streams.py", line 344, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>
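
For what it's worth, the failure happens inside the async for over the chunked log response. As a workaround I sketched the loop below outside of Prefect (stream_pod_logs and the since_seconds resume logic are my own, not the worker's code); it re-opens the stream when the payload error hits:

```python
import aiohttp
from kubernetes_asyncio import client


async def stream_pod_logs(
    core_v1: client.CoreV1Api, name: str, namespace: str
) -> None:
    # Sketch only: re-open the log stream whenever aiohttp reports a
    # truncated chunked response, resuming from the last few seconds.
    since_seconds = None
    while True:
        resp = await core_v1.read_namespaced_pod_log(
            name,
            namespace,
            follow=True,
            since_seconds=since_seconds,
            _preload_content=False,  # raw aiohttp response, as in the traceback
        )
        try:
            async for line in resp.content:
                print(line.decode(errors="replace"), end="")
            return  # clean EOF: the container finished
        except aiohttp.ClientPayloadError:
            # Connection dropped mid-chunk (the error above); retry and
            # ask only for recent lines to limit duplicates.
            since_seconds = 10
```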

Version info

Version:             3.1.5
API version:         0.8.4
Python version:      3.12.7
Git commit:          3c06654e
Built:               Mon, Dec 2, 2024 6:57 PM
OS/Arch:             linux/x86_64
Profile:             ephemeral
Server type:         server
Pydantic version:    2.10.2
Integrations:
  prefect-kubernetes: 0.5.3

Additional context

I thought this had something to do with kubernetes_asyncio or asyncio default timeouts, so I also tried running the same job with a worker built against the most recent version of kubernetes_asyncio (via a custom Docker image):

kubernetes_asyncio @ git+https://github.com/tomplus/kubernetes_asyncio.git@5e5de93c08d5a07e33dd66ec8d9aae741582a301

It must be caused by a default timeout of 600 seconds somewhere, but I can't figure out where, or whether it's possible to change this behavior. In particular, I can't tell where _request_timeout ends up being passed, or whether it's interpreted correctly.
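
For reference, here is how I tried to force the timeout by hand, outside of Prefect, under the assumption that _request_timeout is forwarded to aiohttp's request timeout (probe_request_timeout is just my test harness):

```python
from kubernetes_asyncio import client, config


async def probe_request_timeout(name: str, namespace: str) -> None:
    await config.load_kube_config()
    async with client.ApiClient() as api:
        core_v1 = client.CoreV1Api(api)
        # Assumption: _request_timeout maps onto aiohttp's request timeout,
        # so a large value here should outlive the 600s cutoff. I could not
        # verify this against the generated client code.
        resp = await core_v1.read_namespaced_pod_log(
            name,
            namespace,
            follow=True,
            _preload_content=False,
            _request_timeout=3600,
        )
        async for line in resp.content:
            print(line.decode(errors="replace"), end="")
```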

Labels

bug (Something isn't working), integrations (Related to integrations with other services)
