Skip to content

Conversation

felipecrs
Copy link
Contributor

@felipecrs felipecrs commented Aug 25, 2023

Example:

stern . --condition=ready=false --tail=0

TODOs:

  • Only starts tailing logs for pods matching --condition
  • Stop tailing logs for pods that no longer match --condition
  • Add tests

Fixes #213

@felipecrs
Copy link
Contributor Author

felipecrs commented Aug 25, 2023

I'm giving #213 a go because I really need it. But I'm struggling with the second TODO. Is there a chance someone can help me?

PS: code is not yet ready for final review, but it's working. Also, I have almost zero experience with Golang.

@felipecrs felipecrs changed the title Add --condition condition-name[=condition-value] Add support for --condition Aug 25, 2023
@superbrothers
Copy link
Member

I will look into it this weekend.

@felipecrs
Copy link
Contributor Author

Thanks a lot @superbrothers!

@superbrothers
Copy link
Member

It is on the way, but I have a working patch. I won't have much time this weekend but will be able to work on it after that. 816c896

You can try it out as soon as you build it.

@felipecrs
Copy link
Contributor Author

felipecrs commented Sep 8, 2023

@superbrothers, thanks a lot, you are the best!

However, it's not working quite as I expected. Let me help build a testing scenario too:

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  serviceName: test
  template:
    metadata:
      labels:
        app: test
    spec:
      terminationGracePeriodSeconds: 1
      containers:
      - name: test
        image: bash
        command:
          - bash
          - -c
          - |
            count=0
            while true; do
              echo "Hello World: $count"
              count=$((count+1))
              if [[ $count -eq 10 ]]; then
                touch /tmp/healthy
              fi
              sleep 1
            done
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 10
          periodSeconds: 1
EOF

This pod becomes ready in about 11 seconds.

Start the stern process with:

./dist/stern '.*' --condition ready=false --tail 0

Stern correctly filters out the pod if the pod is already ready, but Stern does not stop tailing the logs when the pod becomes ready:

WindowsTerminal_lMhjxZZGe5.mp4

@felipecrs
Copy link
Contributor Author

Ok, I think I know what's going on: K8s does not seem to emit an event for when the pod becomes ready. I'm investigating what can be done about it.

@felipecrs
Copy link
Contributor Author

felipecrs commented Sep 8, 2023

Checking kubectl wait's source code, it looks like they invoke an additional watcher for each resource until the condition is met:

https://github.com/kubernetes/kubectl/blob/0266cec8bce880387a33c4d948673749defeb0bc/pkg/cmd/wait/wait.go#L530

I wonder if and how we could do something similar:

If the pod initially matches condition and gets added, we invoke this additional watcher for such pod until the opposite condition is met. When the opposite condition is met, we add the pod to deleted.

(I already tried to tinker with it myself without much success)

@tksm
Copy link
Contributor

tksm commented Sep 9, 2023

Hi,

It appears that Stern has detected the events, but they could be filtered out by targetFilter.shouldAdd().

The following patch may resolve the issue, but I've noticed the behavior of --condition ready=false can be somewhat confusing. When we delete pods, the pod's Ready status reverts to False, causing the logs to restart from the beginning. Additionally, the Ready status can change due to other reasons.

diff --git a/stern/stern.go b/stern/stern.go
index c492519..dcece50 100644
--- a/stern/stern.go
+++ b/stern/stern.go
@@ -222,7 +222,7 @@ func Run(ctx context.Context, config *Config) error {
 					if !ok {
 						continue
 					}
-					cancel.(func())()
+					cancel.(context.CancelFunc)()
 				case <-nctx.Done():
 					return nil
 				}
diff --git a/stern/target.go b/stern/target.go
index 553e8c2..cbc6788 100644
--- a/stern/target.go
+++ b/stern/target.go
@@ -136,8 +136,14 @@ OUTER:
 			Container: c.Name,
 		}
 
+		if !conditionFound {
+			visitor(t, false)
+			f.forget(string(pod.UID))
+			continue
+		}
+
 		if f.shouldAdd(t, string(pod.UID), c) {
-			visitor(t, conditionFound)
+			visitor(t, true)
 		}
 	}
 }

The demo below illustrates that the logs restart when the pod is deleted (at 00:18).

asciicast

@felipecrs
Copy link
Contributor Author

Wow, it looks to be working indeed as per your demo. Can't wait to try your patch out.

BTW the behavior you described is indeed misleading.

That would not happen though when using --tail=0, which is my use case (only capture live logs, not past logs).

I will think about what can be done to circumvent this.

Maybe Stern has to remember up until which point it tailed logs for a given pod, and when it starts tailing it again, it should tail from that point on.

@tksm
Copy link
Contributor

tksm commented Sep 9, 2023

@felipecrs Oh, sorry. I missed the --tail=0 option. Thank you for pointing it out.

As you mentioned, Stern behaves as expected when using --tail=0.

I believe it would be better to automatically set --tail=0 when the --condition option is specified in order to avoid confusion.

asciicast

@felipecrs
Copy link
Contributor Author

@tksm really amazing, indeed it works like a charm. I added both @superbrothers' commit and your patch to this PR now.

Something I believe I need to do before merging is to fix the condition syntax for the config file. I left it as a string there, which makes it exactly as the CLI, but perhaps I should convert it to an object like:

- condition: ready=false
+ condition:
+   name: ready
+   value: false

What do you think? Personally, I don't mind. Even the first one looks good to me.

Another thing to think about is the concern raised by @tksm above when not using --tail=0. I proposed one solution above but I'm not sure if it's worth implementing it. Another option is to simply have this caveat acknowledged somewhere, maybe in the README.

Co-authored-by: Takashi Kusumi <takashi.kusumi@gmail.com>
@tksm
Copy link
Contributor

tksm commented Sep 10, 2023

I'm glad to hear it worked as expected. 🎉

Config file

I believe that the first one (condition: ready=false) is better because it maintains consistency with other options. Additionally, we do not need to implement anything in that case.

--tail=0 option

For the initial implementation, I think that the caveat in README.md and the error message when --tail=0 is not set are sufficient. Maybe we should avoid automatically setting --tail=0 for future compatibility.

# Raise an error when `--tail=0` is not set
$ ./dist/stern '.*' --condition ready=false
Error: Currently, --condition and --no-follow=false only work with --tail=0
Usage:
  stern pod-query [flags]
diff --git a/cmd/cmd.go b/cmd/cmd.go
index 35e3060..2c70027 100644
--- a/cmd/cmd.go
+++ b/cmd/cmd.go
@@ -130,6 +130,9 @@ func (o *options) Validate() error {
 	if o.selector != "" && o.resource != "" {
 		return errors.New("--selector and the <resource>/<name> query can not be set at the same time")
 	}
+	if o.condition != "" && !o.noFollow && o.tail != 0 {
+		return errors.New("Currently, --condition and --no-follow=false only work with --tail=0")
+	}
 
 	return nil
 }

@felipecrs
Copy link
Contributor Author

I'm sorry @tksm, but why enforce --tail=0 to --no-follow=false? Maybe you meant --no-follow=true, where it makes more sense: if --no-follow=true, no live logs would be shown, only past logs. Thus, filtering only live logs with --tail=0 makes no sense.

I changed it in the commit.

@felipecrs
Copy link
Contributor Author

felipecrs commented Sep 10, 2023

I also just realized that --condition=ready=false will filter out pods which does not have a health check, like Jobs.

For example, with --condition=ready=false, logs for a pod like below will not be shown:

apiVersion: batch/v1
kind: Job
metadata:
  name: test-two
spec:
  completions: 1
  template:
    metadata:
      labels:
        app: test-two
    spec:
      terminationGracePeriodSeconds: 1
      restartPolicy: OnFailure
      containers:
      - name: test-two
        image: bash
        command:
          - bash
          - -c
          - |
            count=0
            while true; do
              echo "World Hello: $count"
              count=$((count+1))
              if [[ $count -eq 10 ]]; then
                exit 1
              fi
              sleep 1
            done

Similarly, for a pod which does not have a readinessProbe:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test-two
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-two
  serviceName: test-two
  template:
    metadata:
      labels:
        app: test-two
    spec:
      terminationGracePeriodSeconds: 1
      containers:
      - name: test-two
        image: bash
        command:
          - bash
          - -c
          - |
            count=0
            while true; do
              echo "World Hello: $count"
              count=$((count+1))
              if [[ $count -eq 10 ]]; then
                exit 0
              fi
              sleep 1
            done

My use case (#213) was to filter out only healthy pods, and simple conditions like that cannot fulfil my needs.

The only solution I thought to circumvent this is: --only-condition-pods-with-readiness. But:

  1. The flag name is ugly as hell (don't you think? or do you have better suggestions?)
  2. I could just simplify this whole thing to --only-unhealthy-pods and drop --condition entirely.

Feedback is appreciated.

felipecrs and others added 2 commits September 10, 2023 13:17
Co-authored-by: Takashi Kusumi <takashi.kusumi@gmail.com>
@felipecrs
Copy link
Contributor Author

felipecrs commented Sep 10, 2023

Another thing, @tksm, I think allowing people to specify --tail more than 0 can be useful. Imagine a situation like:

I want to only display logs for pods which are not ready, but displaying their last 3 lines before they became unhealthy.

@felipecrs felipecrs changed the title Add support for --condition Add --condition Sep 10, 2023
@felipecrs
Copy link
Contributor Author

felipecrs commented Sep 10, 2023

Ok, now it fulfils my use case:

stern . --condition=ready=false --tail=0 --only-condition-pods-with-readiness

Please let me know what you think about it. And, as always, thanks for the help.

@felipecrs
Copy link
Contributor Author

Ok, I have been soaking this internally during the day and the results are really good for my use case.

I would like to ask the maintainers to give me the go ahead on the current solution (--only-condition-pods-with-readiness) before I start cleaning up the PR and working on fixing the tests.

@felipecrs felipecrs marked this pull request as ready for review January 6, 2025 18:03
@felipecrs
Copy link
Contributor Author

@tksm, I have simplified and completed this PR. I believe it's good to be merged. Can you please check it again?

@felipecrs
Copy link
Contributor Author

@superbrothers feel free to take a look at this one too. 😅

@guettli
Copy link
Contributor

guettli commented Jan 7, 2025

@guettli, do you mean stern could use your tool as a library to handle these conditions?

You can use my code as a library, if you want. I am unsure whether the code is actually usable as a library. Up to now this was not on my mind while coding it. Feel free to create an issue or contact my directly, if you have a question!

@felipecrs
Copy link
Contributor Author

Got it. Thank you. For now the current implementation of this PR should be good enough, but I'll keep an eye on it.

Copy link
Contributor

@tksm tksm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for the implementation! 🎉

I've confirmed that it works as intended with --tail=0. The manifest I used for testing is attached below.

However, I noticed that using --condition without --tail=0 or --no-follow can lead to somewhat confusing behavior when the condition changes repeatedly. Would you consider adding a note to the documentation and help about this? Alternatively, setting --tail=0 as the default when --condition is specified might help clarify things.

Another option is to automatically set TailLines=0 when the pod's log has already been displayed once, ensuring that logs start from the next event.

Confirmation

stern-condition

These are three outputs I confirmed with different options.

  1. without --condition: all logs from the pod are shown
  2. with --condition=ready: only logs are shown when the pod's condition is ready
  3. with --condition=ready=false: only logs are shown when the pod's condition is not ready

Manifest

apiVersion: v1
kind: Pod
metadata:
  name: ready-flipper
spec:
  containers:
  - name: ready-flipper
    image: bash
    command:
      - bash
      - -c
      - |
        for _ in $(seq 2); do
          rm -f /tmp/ready; sleep 3
          for _ in $(seq 3); do echo not ready; sleep 1; done
          touch /tmp/ready; sleep 3
          for _ in $(seq 3); do echo ready; sleep 1; done
        done
        sleep 1000000
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/ready
      initialDelaySeconds: 1
      periodSeconds: 1
      failureThreshold: 1

@felipecrs
Copy link
Contributor Author

Another option is to automatically set TailLines=0 when the pod's log has already been displayed once, ensuring that logs start from the next event.

@tksm, this would be the best. I'll try to implement it.

@felipecrs
Copy link
Contributor Author

felipecrs commented Jan 10, 2025

Another option is to automatically set TailLines=0 when the pod's log has already been displayed once, ensuring that logs start from the next event.

@tksm, this would be the best. I'll try to implement it.

Or do you have any hint on implementing it? (Or even if you'd like to implement it yourself, please go ahead).

It's like I was brain washed, I'm shocked.

@tksm
Copy link
Contributor

tksm commented Jan 11, 2025

Perhaps this can be achieved by skipping the f.forget() call and adding a new field to targetFilter.targetStates to track whether the pod's log has already been displayed. However, it might be complex, so I'm not sure if it's worth implementing.
(Sorry, I don't have time to implement this myself.)

@felipecrs
Copy link
Contributor Author

felipecrs commented Jan 12, 2025

@tksm thank you very much for the suggestion, but I was still not able to implement anything useful (spent a few hours on it).

So, I just added a flag check for now. Perhaps we can improve this in the future, but I don't think it's worth hold this feature because of it.

@felipecrs felipecrs requested a review from tksm January 12, 2025 19:44
@felipecrs felipecrs requested a review from tksm January 13, 2025 02:58
@felipecrs felipecrs requested a review from tksm January 13, 2025 13:18
Copy link
Contributor

@tksm tksm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

Thank you very much for the contribution!

@tksm tksm merged commit 2576972 into stern:master Jan 13, 2025
2 checks passed
@felipecrs felipecrs deleted the condition branch January 13, 2025 13:51
@felipecrs
Copy link
Contributor Author

Thank you for patiently helping me complete this PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

--exclude-healthy-pods
4 participants