Skip to content

Conversation

joamaki
Copy link
Contributor

@joamaki joamaki commented Oct 1, 2024

Cilium shell is a new simpler way for implementing debugging abilities for the cilium-agent based on a simple command language. The same command language can also be used to implement complex integration tests. By sharing the same language for debugging and testing it becomes easier to write complex integration tests (as the commands become familiar from debugging) and it makes sure our debugging commands are also tested. The simple way of adding commands (compared to editing openapi.yaml and modifying cilium-dbg) also significantly lowers the barrier for adding debug tooling and also as an added bonus reduces the maintenance overhead of Cilium derivatives.

See the "asciinema" links at the bottom for demos!

Commands can be added by any cell by calling cell.Provide with hive.NewScriptCmd:

  var Cell = cell.Module(
    ...
    cell.Provide(newScriptCommand),
  )

  func newScriptCommand(t *Thing) hive.ScriptCmdOut {
    return hive.NewScriptCmd(
      "thing",
      script.Command(
        script.CmdUsage{Summary: "Inspect the thing"},
          func(s *script.State, args ...string) (script.WaitFunc, error) {
            return func(*script.State) (stdout, stderr string, err error) {
              stdout = fmt.Sprintf("Thing status: %s\n", t.Status())
                return
              }, nil
	  },
       ),
    )
  }

This can be now used in "cilium-dbg shell":

  $ kubectl exec -it -n kube-system ds/cilium -- cilium-dbg shell
  ...
  cilium> thing
  [stdout]
  Thing status: ...

  $ kubectl exec -it -n kube-system ds/cilium -- cilium-dbg shell thing
  Thing status: ...
  $ 

The same command can be used in an integration test:

  func TestThing(t *testing.T) {
    ctx := context.Background()
    log := hivetest.Logger(t)
    h := hive.New(thing.Cell, <deps>...)
      engine := &script.Engine{
        Cmds: h.ScriptCommands(log),
      }
      require.NoError(t, h.Start(log, ctx), "Start")
      t.Cleanup(func() {
        assert.NoError(t, h.Stop(log, ctx), "Stop")
      })
      scripttest.Test(
        t,
        ctx,
        func() *script.Engine { return engine },
        []string{},
	"testdata/*.txtar",
    )
  }

An example test script "testdata/test.txtar" could now look something like:

  thing
  grep ^Thing status: OK$

More information about this can be found from:

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 1, 2024
@joamaki joamaki force-pushed the pr/joamaki/hivescript branch 3 times, most recently from cbc0982 to b361d0f Compare October 2, 2024 12:35
@joamaki joamaki force-pushed the pr/joamaki/hivescript branch 4 times, most recently from 7adef17 to 6bedb3f Compare October 11, 2024 09:49
@joamaki joamaki added the release-note/misc This PR makes changes that have no direct user impact. label Oct 11, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 11, 2024
@joamaki joamaki changed the title DRAFT: cilium shell Hive scripts and the cilium shell Oct 11, 2024
@joamaki joamaki force-pushed the pr/joamaki/hivescript branch from 6bedb3f to df77ad1 Compare October 11, 2024 10:05
@joamaki
Copy link
Contributor Author

joamaki commented Oct 11, 2024

/test

@joamaki joamaki marked this pull request as ready for review October 11, 2024 10:08
@joamaki joamaki requested review from a team as code owners October 11, 2024 10:08
@joamaki
Copy link
Contributor Author

joamaki commented Oct 11, 2024

Here's some of the follow-ups on this that I have in mind:

  • Reimplement the "cilium-dbg statedb" command via the "db" command. This removes the need to explicitly mention each table and to import the package with the object type (waiting on script: Add watch command statedb#60).
  • Move some of the openapi.yaml endpoints into commands to see how much this would allow us to simplify the codebase.
  • Rewrite devices_controller_test.go as a script test as an example of a low-level datapath integration test in this style. "netlink" command for link management.
  • Document this facility in the cilium development docs. Document how to test StateDB-based components using test scripts.
  • Add commands for the common facilities like metrics so tests can be written that assert metrics (old impl. in https://github.com/joamaki/cilium/blob/pr/joamaki/experimental-ciliumenvoyconfig/pkg/metrics/testscript.go)

Copy link
Contributor

@learnitall learnitall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea! I'm excited about this. I read through the blog post and the rest of the links that you added and I'm on board with the vision. I have a couple of questions for you, but they may be more appropriate for discussion in slack:

  • One of the core concepts that was mentioned in the blog post was the idea that these tests are able to be sandboxed and can be executed in parallel. Is this a goal for use with Cilium? If so, how will tests that rely on datapath state be implemented?
  • What is the migration plan for existing tests, if any? Will this framework be expected as the primary method for testing hive cells, while other integration test frameworks that Cilium currently uses will be utilized for non-modularized code?
  • IIUC the framework is not reliant on a kubernetes cluster, therefore coverage for integration with kubernetes or other external APIs is missing. Is this a future use case that the framework will handle? Or will this type of testing continue to be handled by existing integration testing code?

Copy link
Contributor

@ovidiutirla ovidiutirla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking forward for features to integrate this.

@joamaki
Copy link
Contributor Author

joamaki commented Oct 21, 2024

Great idea! I'm excited about this. I read through the blog post and the rest of the links that you added and I'm on board with the vision. I have a couple of questions for you, but they may be more appropriate for discussion in slack:

  • One of the core concepts that was mentioned in the blog post was the idea that these tests are able to be sandboxed and can be executed in parallel. Is this a goal for use with Cilium? If so, how will tests that rely on datapath state be implemented?

Yep, scripttest by default runs each test file in parallel. If we use this for "privileged" tests we would need to make sure each test gets a clean environment. For example I'm going to use these to test the experimental load-balancing which populates bunch of BPF maps and there I'll use unpinned maps, and I might rewrite devices_controller_test.go in which I'd just create new temporary network namespace for each test.

  • What is the migration plan for existing tests, if any? Will this framework be expected as the primary method for testing hive cells, while other integration test frameworks that Cilium currently uses will be utilized for non-modularized code?

None at the moment. This isn't (yet) trying to be the primary way of testing hive cells, but just one of the options for more complicated tests. I'd get some more experience first with this and then see where it makes sense to apply it or migrate to it.

  • IIUC the framework is not reliant on a kubernetes cluster, therefore coverage for integration with kubernetes or other external APIs is missing. Is this a future use case that the framework will handle? Or will this type of testing continue to be handled by existing integration testing code?

My main reason for looking into test scripts was an integration test that inserted many k8s objects, so this is actually the primary use-case I had in mind. The code for this is coming right after this PR, e.g. commands for messing with the client-go's fake client to populate the object trackers.

@joamaki joamaki force-pushed the pr/joamaki/hivescript branch from df77ad1 to c8a3c51 Compare October 21, 2024 12:18
@joamaki
Copy link
Contributor Author

joamaki commented Oct 21, 2024

/test

@joamaki joamaki force-pushed the pr/joamaki/hivescript branch 2 times, most recently from e6ab120 to 8a338b9 Compare October 21, 2024 16:27
@joamaki
Copy link
Contributor Author

joamaki commented Oct 21, 2024

/test

@joamaki joamaki enabled auto-merge October 21, 2024 16:28
@joamaki joamaki force-pushed the pr/joamaki/hivescript branch from 8a338b9 to 30d0d68 Compare October 23, 2024 06:55
@joamaki joamaki requested a review from tommyp1ckles October 23, 2024 06:55
@joamaki
Copy link
Contributor Author

joamaki commented Oct 23, 2024

/test

Cilium shell is a new simpler way for implementing debugging abilities
for the cilium-agent based on a simple script language. The same scripting
language can also be used to implement complex integration tests. By sharing
the same language for debugging and testing it becomes easier to write complex
integration tests (as the commands become familiar from debugging) and it makes
sure our debugging commands are actually tested.

Commands can be added by any cell by calling cell.Provide with hive.NewScriptCmd:

  var Cell = cell.Module(
    ...
    cell.Provide(newScriptCommand),
  )

  func newScriptCommand(t *Thing) hive.ScriptCmdOut {
    return hive.NewScriptCmd(
		"thing",
		script.Command(
			script.CmdUsage{Summary: "Inspect the thing"},
			func(s *script.State, args ...string) (script.WaitFunc, error) {
			  return func(*script.State) (stdout, stderr string, err error) {
			    stdout = fmt.Sprintf("Thing status: %s\n", t.Status())
				return
 			  }, nil
			},
		),
	)
  }

This can be now used in "cilium-dbg shell":

  $ kubectl exec -it -n kube-system ds/cilium -- cilium-dbg shell
  ...
  cilium> thing
  [stdout]
  Thing status: ...

The same command can be used in an integration test:

  func TestThing(t *testing.T) {
    ctx := context.Background()
    log := hivetest.Logger(t)
    h := hive.New(thing.Cell, <deps>...)
	engine := &script.Engine{
      Cmds: h.ScriptCommands(log),
	}
	require.NoError(t, h.Start(log, ctx), "Start")
	t.Cleanup(func() {
		assert.NoError(t, h.Stop(log, ctx), "Stop")
	})
    scripttest.Test(
	  t,
	  context.Background(),
	  func() *script.Engine { return engine },
	  []string{},
	  "testdata/*.txtar",
	)
  }

An example test script "testdata/test.txtar" count now look something like:

  thing
  grep '^Thing status: OK$'

Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki force-pushed the pr/joamaki/hivescript branch from 30d0d68 to 19ccb0a Compare October 23, 2024 07:39
@joamaki
Copy link
Contributor Author

joamaki commented Oct 23, 2024

/test

@joamaki joamaki requested review from tommyp1ckles and removed request for pippolo84 and tommyp1ckles October 23, 2024 10:26
Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🚀

@joamaki joamaki requested review from tommyp1ckles and removed request for tommyp1ckles October 23, 2024 15:03
@joamaki joamaki added this pull request to the merge queue Oct 23, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 23, 2024
Merged via the queue into cilium:main with commit 897fcb7 Oct 23, 2024
67 checks passed
@joamaki joamaki deleted the pr/joamaki/hivescript branch October 23, 2024 17:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants