Skip to content

Conversation

tksm
Copy link
Contributor

@tksm tksm commented Feb 2, 2023

Fixes #217

This PR improves the handling of container termination. As discussed in #217, stern currently does not handle container termination well. The steps in the comment can reproduce this issue.

This PR makes the following changes to improve. (sorry for the big change)

  • Manage the last shown Container IDs
    • Start tailing logs only when Container IDs are changed
    • It prevents the same log lines are unnecessarily shown
    • The logic to choose an ID is based on Kubelet's validateContainerLogStatus.
  • The container states filter is applied only to containers that have existed before stern starts
    • It allows stern to show a pod that immediately completes that skips a "running" state.
  • Do not cancel tailing when containers are terminated or removed
    • It prevents the last logs from being omitted by canceling.
    • Tailing will eventually stop by EOF or other errors
  • Retry tailing if it gets an error
    • Before this change, retrying would happen when pod events fire
    • After this change, we need explicit retrying because we ignore pod events when Container IDs are not changed

This change is significant and hard to debug, so I would like to add debug logs. It also helps us to understand container state transitions. Here is an example of debug logs for a restarted pod.

$ dist/stern --verbosity 7 restart 2> >(grep -E 'target.go|›')
I0202 21:21:21.051218   26997 target.go:139] "Container ID is empty" state="waiting" target="default-restart-busybox"
I0202 21:21:21.676584   26997 target.go:158] "Container ID was changed" state="terminated" target="default-restart-busybox" container="containerd://2a2a0574f0fdfbce8dddc647a1e3469e5b2ccd4e5fb8c95225e778c8f020ab04" last=""
+ restart › busybox
restart busybox 21567
- restart › busybox
I0202 21:21:22.679503   26997 target.go:158] "Container ID was changed" state="terminated" target="default-restart-busybox" container="containerd://fac6a6ad2e8029b97b9eb4a65a9168c2bea9fb063abe823749520952216a9bf0" last="containerd://2a2a0574f0fdfbce8dddc647a1e3469e5b2ccd4e5fb8c95225e778c8f020ab04"
+ restart › busybox
restart busybox 601
- restart › busybox
I0202 21:21:23.686007   26997 target.go:153] "Container ID is the same" state="waiting" target="default-restart-busybox" container="containerd://fac6a6ad2e8029b97b9eb4a65a9168c2bea9fb063abe823749520952216a9bf0"
I0202 21:21:38.713079   26997 target.go:158] "Container ID was changed" state="terminated" target="default-restart-busybox" container="containerd://d7e9e5fbc788cb67081b81697b9817ac2128a2d45e09c8e55c5095d2bf0f1128" last="containerd://fac6a6ad2e8029b97b9eb4a65a9168c2bea9fb063abe823749520952216a9bf0"
+ restart › busybox
restart busybox 19817
- restart › busybox

@tksm tksm force-pushed the improve-termination-handling branch from 41f3540 to 448a4c0 Compare February 2, 2023 12:22
@tksm
Copy link
Contributor Author

tksm commented Feb 3, 2023

📝 This change will also allow users to use stern to debug CrashLoopBackoff containers.

@superbrothers
Copy link
Member

Thanks. I will check this out this weekend!

@tksm
Copy link
Contributor Author

tksm commented Feb 7, 2023

I'll rebase and squash after the review.

@tksm tksm force-pushed the improve-termination-handling branch from 01dedca to a98c0b4 Compare February 7, 2023 01:53
@tksm
Copy link
Contributor Author

tksm commented Feb 7, 2023

Thank you for the review. I've rebased on master in a98c0b4.

@superbrothers superbrothers merged commit 8312782 into stern:master Feb 7, 2023
@tksm tksm deleted the improve-termination-handling branch February 7, 2023 02:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stern shows the same logs multiple times when a container is terminated
2 participants