Skip to content

Conversation

kvaps
Copy link
Member

@kvaps kvaps commented May 23, 2025

Signed-off-by: Andrei Kvapil kvapss@gmail.com

Summary by CodeRabbit

  • New Features

    • Introduced separate end-to-end test scripts for cluster and application provisioning, improving test clarity and modularity.
    • Added a new test runner script with enhanced output formatting and live tracing for easier debugging of test runs.
  • Bug Fixes

    • None.
  • Chores

    • Removed the legacy end-to-end test automation script.
    • Updated testing workflow to use new modular test scripts and runner for improved maintainability.

@kvaps kvaps requested review from lllamnyp and klinch0 as code owners May 23, 2025 18:59
Copy link
Contributor

coderabbitai bot commented May 23, 2025

Walkthrough

The end-to-end test process for CozyStack has been refactored from a single shell script to a modular system using Bats test scripts and a custom test runner. The Makefile now runs separate cluster and application tests, with new scripts introduced for each, and the original monolithic script removed.

Changes

File(s) Change Summary
hack/e2e.sh Deleted the comprehensive end-to-end test automation shell script.
hack/e2e-cluster.bats, hack/e2e-apps.bats Added new Bats test scripts for cluster and application end-to-end testing.
hack/cozytest.sh Added a custom shell-based test runner for Bats scripts with enhanced output and tracing.
packages/core/testing/Makefile Refactored test targets: split into test-cluster and test-apps, updated apply env vars.

Sequence Diagram(s)

sequenceDiagram
    participant Dev as Developer
    participant Makefile
    participant CozyTest as hack/cozytest.sh
    participant BatsCluster as e2e-cluster.bats
    participant BatsApps as e2e-apps.bats

    Dev->>Makefile: make test
    Makefile->>CozyTest: Run test-cluster (e2e-cluster.bats)
    CozyTest->>BatsCluster: Execute all cluster E2E tests
    Makefile->>CozyTest: Run test-apps (e2e-apps.bats)
    CozyTest->>BatsApps: Execute all application E2E tests
Loading

Suggested labels

ok-to-test

Suggested reviewers

  • lllamnyp
  • klinch0

Poem

A rabbit hops through lines anew,
The old script’s gone, split into two.
With Bats and cozytest in tow,
The cluster and apps now each have their show.
Modular tests, clear and bright—
CozyStack’s future is looking right!
🐇✨

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@kvaps kvaps marked this pull request as draft May 23, 2025 18:59
@kvaps kvaps force-pushed the test-framework branch from 1559aa5 to 0bb55aa Compare May 23, 2025 19:00
@kvaps kvaps force-pushed the test-framework branch from 0bb55aa to 8514799 Compare May 24, 2025 13:35
@kvaps kvaps force-pushed the test-framework branch from 8514799 to 1f498ad Compare May 24, 2025 13:45
@kvaps kvaps marked this pull request as ready for review May 24, 2025 13:46
@kvaps kvaps force-pushed the test-framework branch from 1f498ad to d02f6bf Compare May 24, 2025 13:47
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (3)
hack/run_test.sh (2)

3-3: Replace non-ASCII characters in comments.

The comment contains Russian text which might cause encoding issues in some environments.

-#  run_test.sh — POSIX-sh test-runner с live-trace и корректным RC-выходом    #
+#  run_test.sh - POSIX-sh test-runner with live-trace and correct RC-exit     #

26-26: Translate Russian comments to English for better compatibility.

Several inline comments contain Russian text which might cause encoding issues.

-  skip_next="+ $fn"      # первую строку трассировки с именем функции пропустим
+  skip_next="+ $fn"      # skip first trace line with function name

           '__RC__'*) : ;;                        # маркер не выводим
+          '__RC__'*) : ;;                        # don't output marker

Also applies to: 37-42

hack/e2e.bats (1)

27-30: Consider checking file existence before reading.

While the current approach works with || true, it would be cleaner to check if PID files exist first.

 @test "Clean previous VMs" {
- kill $(cat srv1/qemu.pid srv2/qemu.pid srv3/qemu.pid 2>/dev/null) 2>/dev/null || true
+ for i in 1 2 3; do
+   [ -f "srv${i}/qemu.pid" ] && kill "$(cat "srv${i}/qemu.pid")" 2>/dev/null || true
+ done
  rm -rf srv1 srv2 srv3
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a8ba6b1 and d02f6bf.

📒 Files selected for processing (4)
  • hack/e2e.bats (1 hunks)
  • hack/e2e.sh (0 hunks)
  • hack/run_test.sh (1 hunks)
  • packages/core/testing/Makefile (1 hunks)
💤 Files with no reviewable changes (1)
  • hack/e2e.sh
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test
🔇 Additional comments (2)
packages/core/testing/Makefile (1)

34-34: LGTM! Proper integration of the new test framework.

The Makefile correctly invokes the new test runner with the Bats test file, maintaining the same environment setup.

hack/e2e.bats (1)

234-385: Excellent comprehensive test coverage!

The test suite provides thorough end-to-end validation of the Cozystack provisioning process, including:

  • Proper timeouts for async operations
  • Appropriate error handling with reconciliation retries
  • Verification of all major components (storage, networking, monitoring, OIDC)

@kvaps kvaps changed the title Introduce new test framework [tests] Introduce new test framework May 24, 2025
@kvaps kvaps changed the title [tests] Introduce new test framework [tests] Introduce new testing framework May 24, 2025
@kvaps kvaps force-pushed the test-framework branch 6 times, most recently from 61a4536 to b107d80 Compare May 24, 2025 21:40
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (6)
hack/e2e-apps.bats (2)

3-3: Replace non-ASCII hyphen with standard ASCII hyphen.

The comment contains a non-ASCII hyphen character (‑) which could cause issues in some environments.

-# Cozystack end‑to‑end provisioning test (Bats)
+# Cozystack end-to-end provisioning test (Bats)

87-87: Remove redundant namespace status check.

The namespace tenant-test was already verified to be Active in the previous test. This check is redundant unless tests can be run independently.

If tests are meant to be run independently, consider adding a comment explaining this. Otherwise, remove this line:

-  kubectl wait namespace tenant-test --timeout=20s --for=jsonpath='{.status.phase}'=Active
hack/e2e-cluster.bats (4)

3-3: Replace non-ASCII hyphen with standard ASCII hyphen.

The comment contains a non-ASCII hyphen character (‑) which could cause issues in some environments.

-# Cozystack end‑to‑end provisioning test (Bats)
+# Cozystack end-to-end provisioning test (Bats)

59-59: Consider using consistent subnet mask.

The VMs are configured with /26 subnet mask while the bridge uses /24. This inconsistency might cause confusion.

For a 3-node test setup, using /24 throughout would be simpler and more consistent:

-      - "192.168.123.1${i}/26"
+      - "192.168.123.1${i}/24"

110-112: Consider more robust VM startup verification.

The fixed 5-second sleep might not be sufficient in all environments (slow systems, high load, etc.).

Since the next test already waits for the Talos API port, consider removing the sleep or replacing it with a network readiness check:

-  # Give qemu a few seconds to start up networking
-  sleep 5
+  # Network readiness will be verified in the next test

266-267: Improve readability by splitting the long kubectl wait command.

The command line is very long and hard to read.

Consider using a loop or splitting into multiple wait commands:

# Wait for all deployments
deployments=(
  "capi-controller-manager"
  "capi-kamaji-controller-manager"
  "capi-kubeadm-bootstrap-controller-manager"
  "capi-operator-cluster-api-operator"
  "capk-controller-manager"
)

for deploy in "${deployments[@]}"; do
  kubectl wait "deployment/${deploy}" -n cozy-cluster-api --timeout=1m --for=condition=available
done
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 326870a and b107d80.

📒 Files selected for processing (5)
  • hack/e2e-apps.bats (1 hunks)
  • hack/e2e-cluster.bats (1 hunks)
  • hack/e2e.sh (0 hunks)
  • hack/run_test.sh (1 hunks)
  • packages/core/testing/Makefile (2 hunks)
💤 Files with no reviewable changes (1)
  • hack/e2e.sh
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/core/testing/Makefile
  • hack/run_test.sh
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test
🔇 Additional comments (2)
hack/e2e-apps.bats (1)

6-24: Test implementation looks good!

The tenant creation test is well-structured with appropriate resource definition and reasonable timeout values for waiting on HelmRelease and namespace readiness.

hack/e2e-cluster.bats (1)

1-384: Excellent comprehensive test implementation!

The test suite is well-structured with:

  • Logical progression from setup to deployment
  • Good error handling and informative messages
  • Appropriate use of timeouts and wait conditions
  • Idempotent operations where needed
  • End-to-end verification including HTTP checks

This provides a solid foundation for automated testing of the Cozystack cluster provisioning process.

@kvaps kvaps force-pushed the test-framework branch from b107d80 to 8ac7143 Compare May 24, 2025 22:02
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
hack/e2e-cluster.bats (3)

75-76: Progress output redirection may cause issues.

The combination of --output-file /dev/stdout and 2>/dev/null in the wget command could suppress important error messages while trying to redirect progress output.

Consider simplifying the progress handling:

-wget https://github.com/cozystack/cozystack/releases/latest/download/nocloud-amd64.raw.xz \
-  -O nocloud-amd64.raw.xz --show-progress --output-file /dev/stdout --progress=dot:giga 2>/dev/null
+wget https://github.com/cozystack/cozystack/releases/latest/download/nocloud-amd64.raw.xz \
+  -O nocloud-amd64.raw.xz --progress=dot:giga

101-107: Consider making VM resource allocation configurable.

The VM resources (8 CPUs, 16GB RAM) are hardcoded, which may not be suitable for all testing environments.

Consider using environment variables for resource configuration:

+# Set default values if not provided
+: ${QEMU_CPUS:=8}
+: ${QEMU_MEMORY:=16384}
+
qemu-system-x86_64 -machine type=pc,accel=kvm -cpu host -smp ${QEMU_CPUS} -m ${QEMU_MEMORY} \

115-115: Enhance timeout error message for better debugging.

The current timeout command doesn't provide clear information about which specific node failed to respond.

Consider adding more descriptive error handling:

-timeout 60 sh -ec 'until nc -nz 192.168.123.11 50000 && nc -nz 192.168.123.12 50000 && nc -nz 192.168.123.13 50000; do sleep 1; done'
+timeout 60 sh -ec '
+  until nc -nz 192.168.123.11 50000 && nc -nz 192.168.123.12 50000 && nc -nz 192.168.123.13 50000; do 
+    echo "Waiting for Talos API on nodes 11, 12, 13..." >&2
+    sleep 1
+  done'
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b107d80 and 8ac7143.

📒 Files selected for processing (3)
  • hack/e2e-apps.bats (1 hunks)
  • hack/e2e-cluster.bats (1 hunks)
  • packages/core/testing/Makefile (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • hack/e2e-apps.bats
  • packages/core/testing/Makefile
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: pre-commit
  • GitHub Check: Build and Test
🔇 Additional comments (2)
hack/e2e-cluster.bats (2)

258-261: Excellent error handling pattern for HelmRelease validation.

This is a good example of failing fast when critical components don't reconcile properly, with clear error messaging.


1-384: Well-structured comprehensive testing framework.

This Bats script provides excellent coverage of the Cozystack cluster provisioning lifecycle. The modular approach with individual test cases for each component makes it easy to debug failures and understand the setup process.

Key strengths:

  • Comprehensive coverage from VM setup to application verification
  • Good use of timeouts and retries for async operations
  • Proper cleanup and preparation steps
  • Clear test case organization

Comment on lines +350 to +353
if ! kubectl wait hr/monitoring -n tenant-root --timeout=2m --for=condition=ready; then
flux reconcile hr monitoring -n tenant-root --force
kubectl wait hr/monitoring -n tenant-root --timeout=2m --for=condition=ready
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Force reconciliation suggests potential instability.

The fallback to flux reconcile hr monitoring -n tenant-root --force suggests the monitoring HelmRelease may have intermittent issues that require manual intervention.

This pattern might indicate an underlying issue with the monitoring component. Consider investigating why force reconciliation is needed:


🏁 Script executed:

#!/bin/bash
# Search for similar force reconciliation patterns in the codebase
rg -A 5 -B 5 "flux reconcile.*--force"

Length of output: 944


Investigate the need for forced reconciliation

The flux reconcile hr monitoring … --force call in hack/e2e-cluster.bats is the only occurrence of a forced reconciliation in our test suite. Relying on --force can mask intermittent failures of the monitoring HelmRelease. Rather than suppressing these flakes, let’s:

  • Identify why kubectl wait hr/monitoring occasionally times out (e.g., resource contention, misconfiguration, insufficient timeout).
  • Consider increasing the --timeout or adding readiness checks to make the test more robust.
  • If --force remains necessary, document the root cause and rationale in a comment so future maintainers understand why this exception exists.
🤖 Prompt for AI Agents
In hack/e2e-cluster.bats around lines 350 to 353, the script uses `flux
reconcile hr monitoring --force` as a fallback when `kubectl wait hr/monitoring`
times out, which can hide intermittent issues. To fix this, first investigate
why the wait command times out by checking for resource contention,
misconfiguration, or insufficient timeout duration. Increase the `--timeout`
value or add additional readiness checks to improve stability. If the forced
reconciliation is still required, add a detailed comment explaining the root
cause and why this exception is necessary for future maintainers.

}

@test "Clean previous VMs" {
kill $(cat srv1/qemu.pid srv2/qemu.pid srv3/qemu.pid 2>/dev/null) 2>/dev/null || true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Unsafe process termination could affect unrelated processes.

The kill $(cat srv1/qemu.pid srv2/qemu.pid srv3/qemu.pid 2>/dev/null) command could potentially kill unrelated processes if the PID files contain stale PIDs that have been reused by the system.

Consider adding process validation before termination:

-kill $(cat srv1/qemu.pid srv2/qemu.pid srv3/qemu.pid 2>/dev/null) 2>/dev/null || true
+for pidfile in srv1/qemu.pid srv2/qemu.pid srv3/qemu.pid; do
+  if [ -f "$pidfile" ]; then
+    pid=$(cat "$pidfile" 2>/dev/null || echo "")
+    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
+      kill "$pid" 2>/dev/null || true
+    fi
+  fi
+done
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
kill $(cat srv1/qemu.pid srv2/qemu.pid srv3/qemu.pid 2>/dev/null) 2>/dev/null || true
for pidfile in srv1/qemu.pid srv2/qemu.pid srv3/qemu.pid; do
if [ -f "$pidfile" ]; then
pid=$(cat "$pidfile" 2>/dev/null || echo "")
if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
kill "$pid" 2>/dev/null || true
fi
fi
done
🤖 Prompt for AI Agents
In hack/e2e-cluster.bats at line 27, the current kill command uses PIDs from
files without validation, risking termination of unrelated processes if PIDs are
stale. To fix this, read each PID from the files, verify that the process
corresponds to the expected QEMU instance (e.g., by checking the command line or
process name), and only then issue the kill command. This ensures only the
intended processes are terminated safely.

@kvaps kvaps force-pushed the test-framework branch from 8ac7143 to 4c36498 Compare May 24, 2025 23:35
kvaps added 2 commits May 25, 2025 09:29
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@kvaps kvaps changed the title [tests] Introduce new testing framework [tests] Introduce cozytest - a new testing framework May 25, 2025
@kvaps kvaps force-pushed the test-framework branch from 4c36498 to 74e7e5c Compare May 25, 2025 07:31
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (2)
hack/e2e-cluster.bats (2)

350-353: Force reconciliation still present from previous review

The forced reconciliation pattern was flagged in a previous review as potentially masking underlying issues.

As noted in the previous review, investigate why the monitoring HelmRelease occasionally requires force reconciliation instead of relying on this workaround.


27-27: ⚠️ Potential issue

Address unsafe process termination from previous review

This issue was flagged in a previous review but remains unresolved. The kill command could terminate unrelated processes if PIDs are stale.

Apply the suggested fix from the previous review:

-kill $(cat srv1/qemu.pid srv2/qemu.pid srv3/qemu.pid 2>/dev/null) 2>/dev/null || true
+for pidfile in srv1/qemu.pid srv2/qemu.pid srv3/qemu.pid; do
+  if [ -f "$pidfile" ]; then
+    pid=$(cat "$pidfile" 2>/dev/null || echo "")
+    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
+      kill "$pid" 2>/dev/null || true
+    fi
+  fi
+done
🧹 Nitpick comments (5)
hack/cozytest.sh (4)

22-23: Improve error handling for temporary directory creation

The error handling is good, but the cleanup should be more robust.

 run_one() {
   fn=$1 title=$2
-  tmp=$(mktemp -d) || { echo "Failed to create temp directory" >&2; exit 1; }
+  tmp=$(mktemp -d) || { echo "Failed to create temp directory" >&2; exit 1; }
+  trap 'rm -rf "$tmp"' EXIT
   log="$tmp/log"

27-27: Remove non-English comment

The comment contains non-English text which should be translated for consistency.

-  skip_next="+ $fn"      # первую строку трассировки с именем функции пропустим
+  skip_next="+ $fn"      # skip the first trace line with function name

86-86: Remove non-English comment

Another non-English comment that should be translated.

-    print "  set -e"           # ошибка → падение теста
+    print "  set -e"           # error causes test failure

90-90: Remove non-English comment

Another non-English comment that should be translated.

-    print "  return 0"         # если автор не сделал exit 1 — тест ОК
+    print "  return 0"         # if author didn't exit 1, test is OK
hack/e2e-cluster.bats (1)

233-262: Consider making timeouts configurable

The hard-coded timeouts throughout the script (60s, 15m, 2m, etc.) might not be suitable for all environments. Consider making these configurable via environment variables.

Example for the HelmRelease wait:

+ HELM_TIMEOUT=${HELM_TIMEOUT:-15m}
- kubectl get hr -A | awk 'NR>1 {print "kubectl wait --timeout=15m --for=condition=ready -n "$1" hr/"$2" &"} END {print "wait"}' | sh -ex
+ kubectl get hr -A | awk -v timeout="$HELM_TIMEOUT" 'NR>1 {print "kubectl wait --timeout="timeout" --for=condition=ready -n "$1" hr/"$2" &"} END {print "wait"}' | sh -ex
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c36498 and 74e7e5c.

📒 Files selected for processing (5)
  • hack/cozytest.sh (1 hunks)
  • hack/e2e-apps.bats (1 hunks)
  • hack/e2e-cluster.bats (1 hunks)
  • hack/e2e.sh (0 hunks)
  • packages/core/testing/Makefile (2 hunks)
💤 Files with no reviewable changes (1)
  • hack/e2e.sh
🚧 Files skipped from review as they are similar to previous changes (2)
  • hack/e2e-apps.bats
  • packages/core/testing/Makefile
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test
🔇 Additional comments (5)
hack/cozytest.sh (3)

1-6: LGTM: Clean header and strict error handling

The script header follows good practices with proper shebang and strict error handling via set -eu.


36-50: Consider security implications of command execution tracing

The live tracing output could potentially expose sensitive information (passwords, tokens) that might be used in test commands. Consider adding a mechanism to sanitize or filter sensitive data from the trace output.

Are there any sensitive credentials or tokens that might be exposed in the test trace output? If so, consider implementing output sanitization.


75-95: Robust AWK parsing for Bats conversion

The AWK script correctly handles the conversion from Bats syntax to shell functions, with proper escaping and sanitization of function names.

hack/e2e-cluster.bats (2)

6-14: LGTM: Clear environment validation

Good practice to validate required environment variables with helpful error messages.


376-379: Good improvement: Enhanced curl error handling

This appears to address the curl error handling issue from the previous review with proper error checking and informative messages.

Comment on lines +8 to +9
TEST_FILE=${1:?Usage: ./cozytest.sh <file.bats> [pattern]}
PATTERN=${2:-*}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add validation for TEST_FILE existence

The script should verify that the provided Bats file exists before proceeding.

 TEST_FILE=${1:?Usage: ./cozytest.sh <file.bats> [pattern]}
+[ -f "$TEST_FILE" ] || { echo "Error: Test file '$TEST_FILE' not found" >&2; exit 1; }
 PATTERN=${2:-*}
🤖 Prompt for AI Agents
In hack/cozytest.sh around lines 8 to 9, add a check after assigning TEST_FILE
to verify that the file actually exists on the filesystem. If the file does not
exist, print an error message indicating the missing file and exit the script
with a non-zero status to prevent further execution.

Comment on lines +38 to +39
iptables -t nat -D POSTROUTING -s 192.168.123.0/24 ! -d 192.168.123.0/24 -j MASQUERADE 2>/dev/null || true
iptables -t nat -A POSTROUTING -s 192.168.123.0/24 ! -d 192.168.123.0/24 -j MASQUERADE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add validation for iptables rule management

The iptables commands could fail on systems where iptables is not available or when permissions are insufficient.

+  # Check if iptables is available
+  if ! command -v iptables >/dev/null 2>&1; then
+    echo "iptables command not found. Please install iptables." >&2
+    exit 1
+  fi
+
   # Masquerading rule – idempotent (delete first, then add)
   iptables -t nat -D POSTROUTING -s 192.168.123.0/24 ! -d 192.168.123.0/24 -j MASQUERADE 2>/dev/null || true
-  iptables -t nat -A POSTROUTING -s 192.168.123.0/24 ! -d 192.168.123.0/24 -j MASQUERADE
+  if ! iptables -t nat -A POSTROUTING -s 192.168.123.0/24 ! -d 192.168.123.0/24 -j MASQUERADE; then
+    echo "Failed to add iptables masquerading rule. Check permissions." >&2
+    exit 1
+  fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
iptables -t nat -D POSTROUTING -s 192.168.123.0/24 ! -d 192.168.123.0/24 -j MASQUERADE 2>/dev/null || true
iptables -t nat -A POSTROUTING -s 192.168.123.0/24 ! -d 192.168.123.0/24 -j MASQUERADE
# Check if iptables is available
if ! command -v iptables >/dev/null 2>&1; then
echo "iptables command not found. Please install iptables." >&2
exit 1
fi
# Masquerading rule – idempotent (delete first, then add)
iptables -t nat -D POSTROUTING -s 192.168.123.0/24 ! -d 192.168.123.0/24 -j MASQUERADE 2>/dev/null || true
if ! iptables -t nat -A POSTROUTING -s 192.168.123.0/24 ! -d 192.168.123.0/24 -j MASQUERADE; then
echo "Failed to add iptables masquerading rule. Check permissions." >&2
exit 1
fi
🤖 Prompt for AI Agents
In hack/e2e-cluster.bats around lines 38 to 39, the iptables commands lack
validation and may fail silently if iptables is not installed or if permissions
are insufficient. Add checks to verify that iptables is available on the system
before running these commands, and handle errors gracefully by checking command
exit statuses and providing informative error messages or fallback behavior to
avoid script failures.

Comment on lines +101 to +108
qemu-system-x86_64 -machine type=pc,accel=kvm -cpu host -smp 8 -m 16384 \
-device virtio-net,netdev=net0,mac=52:54:00:12:34:5${i} \
-netdev tap,id=net0,ifname=cozy-srv${i},script=no,downscript=no \
-drive file=srv${i}/system.img,if=virtio,format=raw \
-drive file=srv${i}/seed.img,if=virtio,format=raw \
-drive file=srv${i}/data.img,if=virtio,format=raw \
-display none -daemonize -pidfile srv${i}/qemu.pid
done
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add validation for QEMU command success

The QEMU commands should be validated to ensure VMs start successfully.

   for i in 1 2 3; do
-    qemu-system-x86_64 -machine type=pc,accel=kvm -cpu host -smp 8 -m 16384 \
+    if ! qemu-system-x86_64 -machine type=pc,accel=kvm -cpu host -smp 8 -m 16384 \
       -device virtio-net,netdev=net0,mac=52:54:00:12:34:5${i} \
       -netdev tap,id=net0,ifname=cozy-srv${i},script=no,downscript=no \
       -drive file=srv${i}/system.img,if=virtio,format=raw \
       -drive file=srv${i}/seed.img,if=virtio,format=raw \
       -drive file=srv${i}/data.img,if=virtio,format=raw \
-      -display none -daemonize -pidfile srv${i}/qemu.pid
+      -display none -daemonize -pidfile srv${i}/qemu.pid; then
+      echo "Failed to start QEMU VM srv${i}" >&2
+      exit 1
+    fi
   done
🤖 Prompt for AI Agents
In hack/e2e-cluster.bats around lines 101 to 108, the QEMU commands launching
VMs lack validation for success. Modify the script to check the exit status of
each qemu-system-x86_64 command after execution. If the command fails, output an
error message indicating which VM failed to start and exit the script or handle
the failure appropriately to prevent silent errors.

Comment on lines +75 to +79
wget https://github.com/cozystack/cozystack/releases/latest/download/nocloud-amd64.raw.xz \
-O nocloud-amd64.raw.xz --show-progress --output-file /dev/stdout --progress=dot:giga 2>/dev/null
rm -f nocloud-amd64.raw
xz --decompress nocloud-amd64.raw.xz
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add integrity verification for downloaded Talos image

The downloaded image should be verified for integrity to ensure it hasn't been tampered with.

   if [ ! -f nocloud-amd64.raw ]; then
     wget https://github.com/cozystack/cozystack/releases/latest/download/nocloud-amd64.raw.xz \
       -O nocloud-amd64.raw.xz --show-progress --output-file /dev/stdout --progress=dot:giga 2>/dev/null
+    # Verify checksum if available
+    if wget -q --spider https://github.com/cozystack/cozystack/releases/latest/download/nocloud-amd64.raw.xz.sha256; then
+      wget -q https://github.com/cozystack/cozystack/releases/latest/download/nocloud-amd64.raw.xz.sha256
+      if ! sha256sum -c nocloud-amd64.raw.xz.sha256; then
+        echo "Checksum verification failed for downloaded image" >&2
+        exit 1
+      fi
+    fi
     rm -f nocloud-amd64.raw
     xz --decompress nocloud-amd64.raw.xz
   fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
wget https://github.com/cozystack/cozystack/releases/latest/download/nocloud-amd64.raw.xz \
-O nocloud-amd64.raw.xz --show-progress --output-file /dev/stdout --progress=dot:giga 2>/dev/null
rm -f nocloud-amd64.raw
xz --decompress nocloud-amd64.raw.xz
fi
if [ ! -f nocloud-amd64.raw ]; then
wget https://github.com/cozystack/cozystack/releases/latest/download/nocloud-amd64.raw.xz \
-O nocloud-amd64.raw.xz --show-progress --output-file /dev/stdout --progress=dot:giga 2>/dev/null
# Verify checksum if available
if wget -q --spider https://github.com/cozystack/cozystack/releases/latest/download/nocloud-amd64.raw.xz.sha256; then
wget -q https://github.com/cozystack/cozystack/releases/latest/download/nocloud-amd64.raw.xz.sha256
if ! sha256sum -c nocloud-amd64.raw.xz.sha256; then
echo "Checksum verification failed for downloaded image" >&2
exit 1
fi
fi
rm -f nocloud-amd64.raw
xz --decompress nocloud-amd64.raw.xz
fi
🤖 Prompt for AI Agents
In hack/e2e-cluster.bats around lines 75 to 79, the downloaded Talos image is
not verified for integrity, which risks using a tampered file. After downloading
the nocloud-amd64.raw.xz file, add a step to verify its checksum or signature
against a trusted source before decompressing. This can be done by downloading a
checksum file or signature from the same release and using tools like sha256sum
or gpg to validate the image's authenticity and integrity.

@kvaps kvaps merged commit 50932ba into main May 28, 2025
6 of 8 checks passed
@kvaps kvaps deleted the test-framework branch May 28, 2025 07:19
This was referenced Jun 8, 2025
@coderabbitai coderabbitai bot mentioned this pull request Aug 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants