
[deadlock] scheduler: if querier OOM restart . #7722

@liguozhong


Describe the bug
I deployed three components (frontend, scheduler, and querier) in k8s to serve LogQL API queries.

From what I have observed, when a querier is killed by k8s due to OOM, LogQL queries become unavailable and the log keeps reporting errors.

frontend log

level=error ts=2022-11-18T10:52:00.862957605Z caller=retry.go:77 org_id=123_qaawmopdln msg="error processing request" try=0 err="context canceled"

The behavior looks somewhat like a deadlock, but I am still investigating the root cause.

To Reproduce
I deploy Loki without Helm, so it is difficult to reproduce.

Expected behavior
If a querier is killed, LogQL queries should remain available.

Environment:

  • Infrastructure: [Kubernetes]
  • Deployment tool: [deploy yaml one by one]
  • Loki code branch: GitHub master

Screenshots, Promtail config, or terminal output
scheduler log
image

image

prometheus metrics
image

querier cpu and memory
image

frontend querylog
image
