Execution and Partition-Level Environment Variables #4801

wdbaruni · 2025-01-08T16:33:34Z

This PR introduces two key features:

1. Enhanced Execution Environment Variables

Added support for passing rich job metadata to execution engines via environment variables, including:

BACALHAU_PARTITION_INDEX: Current partition index (0 to N-1)
BACALHAU_PARTITION_COUNT: Total number of partitions
BACALHAU_JOB_ID: Unique job identifier
BACALHAU_JOB_NAME: User-provided job name
BACALHAU_JOB_NAMESPACE: Job namespace
BACALHAU_JOB_TYPE: Job type (Batch/Service)
BACALHAU_EXECUTION_ID: Unique execution identifier
BACALHAU_NODE_ID: ID of executing compute node

This allows jobs to:

Be partition-aware and handle their specific partition's work
Access their execution context
Track node assignment
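As a rough illustration of the partition-aware use case above, a job could read the two partition variables and keep only its share of the work. This is a minimal sketch, not code from the PR: `partitionFromEnv`, `itemsForPartition`, and the file names are hypothetical, and the modulo split is just one possible way to divide the items.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// partitionFromEnv reads the partition index and count through the supplied
// lookup function (os.Getenv in a real job, a stub in tests).
// Hypothetical helper, not part of Bacalhau.
func partitionFromEnv(getenv func(string) string) (index, count int, err error) {
	index, err = strconv.Atoi(getenv("BACALHAU_PARTITION_INDEX"))
	if err != nil {
		return 0, 0, fmt.Errorf("parsing BACALHAU_PARTITION_INDEX: %w", err)
	}
	count, err = strconv.Atoi(getenv("BACALHAU_PARTITION_COUNT"))
	if err != nil {
		return 0, 0, fmt.Errorf("parsing BACALHAU_PARTITION_COUNT: %w", err)
	}
	if count <= 0 || index < 0 || index >= count {
		return 0, 0, fmt.Errorf("partition index %d out of range for count %d", index, count)
	}
	return index, count, nil
}

// itemsForPartition keeps the items whose position modulo count equals index.
func itemsForPartition(items []string, index, count int) []string {
	var mine []string
	for i, item := range items {
		if i%count == index {
			mine = append(mine, item)
		}
	}
	return mine
}

func main() {
	index, count, err := partitionFromEnv(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Hypothetical work list; a real job would discover its inputs.
	files := []string{"a.csv", "b.csv", "c.csv", "d.csv", "e.csv"}
	fmt.Printf("partition %d of %d handles %v\n", index, count, itemsForPartition(files, index, count))
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, keeps the parsing logic testable without mutating the process environment.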

2. Test Suite for Partition Scheduling

Added a comprehensive test suite that validates:

Environment variable propagation to executors
Partition scheduling behavior:
- Unique partition indices
- Node distribution
- Retry behavior
- Service job continuous execution
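The "unique partition indices" property could be checked along the following lines. This is only a sketch of the idea, assuming the test has already collected the index reported by each execution; `checkPartitionIndices` is a hypothetical helper and not a function from the PR's test suite.

```go
package main

import "fmt"

// checkPartitionIndices returns an error unless indices contains every value
// in [0, count) exactly once. Hypothetical helper for illustration.
func checkPartitionIndices(indices []int, count int) error {
	if len(indices) != count {
		return fmt.Errorf("expected %d executions, got %d", count, len(indices))
	}
	seen := make(map[int]bool, count)
	for _, idx := range indices {
		if idx < 0 || idx >= count {
			return fmt.Errorf("partition index %d out of range [0, %d)", idx, count)
		}
		if seen[idx] {
			return fmt.Errorf("partition index %d assigned more than once", idx)
		}
		seen[idx] = true
	}
	return nil
}

func main() {
	// Indices may arrive in any order; only uniqueness and range matter.
	fmt.Println(checkPartitionIndices([]int{2, 0, 1}, 3))
	fmt.Println(checkPartitionIndices([]int{0, 0, 1}, 3))
}
```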

Summary by CodeRabbit

Here are the release notes for this pull request:

New Features

Added environment variable management for job executions
Enhanced support for system and task-level environment variables
Improved job partitioning and execution context handling

Bug Fixes

Fixed potential nil slice access in job task retrieval
Added validation for environment variable naming conventions

Improvements

Streamlined executor and job handling interfaces
Added utility functions for environment variable manipulation
Enhanced test coverage for job execution scenarios

Technical Enhancements

Refactored execution context management
Improved error handling in task and job validation
Added robust environment variable sanitization and merging capabilities

linear · 2025-01-08T16:33:39Z

ENG-519 Populate engines with partition env variables

coderabbitai · 2025-01-08T16:33:46Z

Walkthrough

This pull request introduces a comprehensive enhancement to environment variable management across the Bacalhau system. The changes span multiple packages and introduce new utilities for handling environment variables, including validation, sanitization, and merging. A new function GetExecutionEnvVars is added to generate system and task-level environment variables, with careful consideration of precedence and naming conventions. The modifications touch on execution contexts, Docker container setup, and test scenarios to ensure robust implementation.

Changes

| File | Change Summary |
| --- | --- |
| `pkg/compute/envvars.go` | New function `GetExecutionEnvVars` to generate execution environment variables |
| `pkg/executor/types.go` | Added `Env` field to `RunCommandRequest` struct |
| `pkg/lib/envvar/utils.go` | New utility functions for environment variable manipulation |
| `pkg/models/constants.go` | Added `EnvVarPrefix` constant for Bacalhau environment variables |
| `pkg/models/task.go` | Enhanced task validation to check environment variable names |
| `pkg/executor/docker/*` | Updated Docker executor to handle environment variables |

Sequence Diagram

sequenceDiagram
    participant Job
    participant Executor
    participant EnvVarGenerator
    participant DockerContainer

    Job->>EnvVarGenerator: Request environment variables
    EnvVarGenerator-->>Job: Generate system and task env vars
    Job->>Executor: Create RunCommandRequest with Env
    Executor->>DockerContainer: Merge and set environment variables
    DockerContainer->>DockerContainer: Execute task with env vars

Possibly related PRs

Add Partition Support in Job Orchestrator #4800: Introduces partitioned execution, which may require similar environment variable handling for partition indices

Poem

🐰 Env Vars Dancing Free

In Bacalhau's garden of code so bright,
Environment variables take their flight!
Sanitized, merged with utmost care,
From task to system, a variable's flair!

🌈✨ Hop, hop, hooray! 🌈✨


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (11)

pkg/lib/envvar/utils.go (1)
36-46: Expand sanitization to cover more invalid characters

The Sanitize function currently replaces whitespace and = characters with underscores. Consider extending the sanitization to include other invalid characters in environment variable values, such as control characters or shell metacharacters, to enhance security and robustness.

Apply this diff to sanitize additional characters:
 func Sanitize(value string) string {
 	return strings.Map(func(r rune) rune {
-		// replace whitespace and '=' since they can cause problems
-		if unicode.IsSpace(r) || r == '=' {
+		// Replace invalid characters with '_'
+		if !unicode.IsPrint(r) || r == '=' || r == '$' || r == '`' || r == '"' || r == '\'' || unicode.IsSpace(r) {
 			return '_'
 		}
 		return r
 	}, value)
 }
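
To make the trade-off in this suggestion concrete, the two variants can be compared side by side. This is a sketch reconstructed from the diff above: `sanitizeCurrent` mirrors the removed lines and `sanitizeProposed` mirrors the added lines; both names are hypothetical and the sample inputs are made up.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// sanitizeCurrent replaces whitespace and '=' with '_', as in the lines the
// diff removes.
func sanitizeCurrent(value string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsSpace(r) || r == '=' {
			return '_'
		}
		return r
	}, value)
}

// sanitizeProposed additionally replaces non-printable runes and a few shell
// metacharacters, as in the lines the diff adds.
func sanitizeProposed(value string) string {
	return strings.Map(func(r rune) rune {
		if !unicode.IsPrint(r) || r == '=' || r == '$' || r == '`' || r == '"' || r == '\'' || unicode.IsSpace(r) {
			return '_'
		}
		return r
	}, value)
}

func main() {
	for _, in := range []string{"my job=1", `$HOME "x"`} {
		fmt.Printf("%q: current=%q proposed=%q\n", in, sanitizeCurrent(in), sanitizeProposed(in))
	}
}
```

The stricter variant also rewrites characters such as `$` and quotes, which changes any job name or namespace that legitimately contains them, so the extra strictness is a judgment call rather than a pure improvement.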
pkg/compute/envvars.go (1)
19-23: Consider additional defensive checks for task-level environment variables.

While the current implementation is correct, consider adding validation for the task environment variables to prevent potential issues with invalid values.
 	if execution.Job != nil && execution.Job.Task() != nil && execution.Job.Task().Env != nil {
+		// Validate and sanitize task environment variables
+		for key, value := range execution.Job.Task().Env {
+			if value != "" {
+				taskEnv[key] = envvar.Sanitize(value)
+			}
+		}
-		taskEnv = execution.Job.Task().Env
 	}
pkg/test/devstack/timeout_test.go (1)

Line range hint 71-79: LGTM! Consider adding documentation.

The refactoring to use execContext improves code organization by consolidating job-related context. This change aligns well with the PR's objective of enhancing job metadata handling.

Consider adding a comment explaining the fields available in execContext for better maintainability.
pkg/executor/docker/models/types.go (1)
42-51: Enhance error message for environment variable validation.

The validation logic correctly prevents naming conflicts with system variables. However, the error message could be more informative by explaining why these prefixes are reserved.

Consider updating the error message to be more descriptive:
-				return fmt.Errorf("invalid docker engine param: environment variable '%s' cannot start with %s",
-					parts[0], models.EnvVarPrefix)
+				return fmt.Errorf("invalid docker engine param: environment variable '%s' cannot start with %s (reserved for system variables)",
+					parts[0], models.EnvVarPrefix)
pkg/executor/docker/models/types_test.go (1)
158-181: Add test case for empty environment variable key.

The test cases comprehensively cover prefix validation. However, consider adding a test case for an empty environment variable key to ensure robust validation.

Add this test case:
+		{
+			name: "Empty Environment Variable Key",
+			engineSpec: EngineSpec{
+				Image:                "valid-image",
+				EnvironmentVariables: []string{"=value"},
+			},
+			expectedErrorMsg: "invalid docker engine param: environment variable key cannot be empty",
+		},
pkg/lib/envvar/utils_test.go (2)
11-61: Add test case for special characters in environment variable keys.

The test cases for ToSlice are comprehensive but could benefit from testing environment variable keys containing special characters to ensure robust handling.

Add this test case:
+		{
+			name: "keys with special characters",
+			env: map[string]string{
+				"SPECIAL.KEY": "value1",
+				"SPECIAL-KEY": "value2",
+				"SPECIAL_KEY": "value3",
+			},
+			want: []string{
+				"SPECIAL.KEY=value1",
+				"SPECIAL-KEY=value2",
+				"SPECIAL_KEY=value3",
+			},
+		},
201-241: Consider testing priority order with multiple overrides.

The MergeSlices tests cover basic scenarios well. Consider adding a test case with multiple priority overrides to ensure correct precedence.

Add this test case:
+		{
+			name:     "multiple priority overrides",
+			base:     []string{"KEY1=BASE1", "KEY2=BASE2", "KEY3=BASE3"},
+			priority: []string{"KEY1=PRIORITY1", "KEY3=PRIORITY3"},
+			want:     []string{"KEY1=PRIORITY1", "KEY2=BASE2", "KEY3=PRIORITY3"},
+		},
pkg/test/devstack/partitioning_test.go (1)

141-191: Consider adding more assertions for retry behavior.

While the test effectively simulates failures and verifies recovery, consider adding assertions to verify:

The node assignment changes between retries

The error messages in the output match the simulated failures

pkg/executor/docker/executor_test.go (1)

623-652: Consider adding edge cases to the test suite.

While the current test cases cover the main scenarios, consider adding tests for:

Empty environment variables

Special characters in variable values

Very long environment variable values

Duplicate variable names with different cases
pkg/executor/types.go (1)
69-69: Enhance documentation for the Env field.

While the documentation is clear, it would be helpful to list the system-defined environment variables that will be available (e.g., BACALHAU_JOB_ID, BACALHAU_PARTITION_INDEX, etc.) to provide better guidance for users implementing executors.
-	Env          map[string]string         // System defined and task level environment variables.
+	Env          map[string]string         // System defined (BACALHAU_*) and task level environment variables.
+	                                       // System variables include: BACALHAU_JOB_ID, BACALHAU_JOB_NAME,
+	                                       // BACALHAU_JOB_NAMESPACE, BACALHAU_JOB_TYPE, BACALHAU_EXECUTION_ID,
+	                                       // BACALHAU_NODE_ID, BACALHAU_PARTITION_INDEX, BACALHAU_PARTITION_COUNT
pkg/test/compute/resourcelimits_test.go (1)
293-293: Utilize the execution context consistently.

While the signature has been updated to use noop_executor.ExecutionContext, the parameter is unused. Consider tracking job IDs consistently across test cases.
-	jobHandler := func(_ context.Context, _ noop_executor.ExecutionContext) (*models.RunCommandResult, error) {
+	jobHandler := func(_ context.Context, execContext noop_executor.ExecutionContext) (*models.RunCommandResult, error) {
 		time.Sleep(time.Millisecond * 1000)
+		// Track job IDs for verification
+		jobIdsMutex.Lock()
+		jobIds = append(jobIds, execContext.JobID)
+		jobIdsMutex.Unlock()
 		seenJobs++
 		return &models.RunCommandResult{}, nil
 	}

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 559c036 and f2ed4bd.

📒 Files selected for processing (21)

pkg/compute/envvars.go (1 hunks)
pkg/compute/envvars_test.go (1 hunks)
pkg/compute/executor.go (1 hunks)
pkg/executor/docker/executor.go (4 hunks)
pkg/executor/docker/executor_test.go (2 hunks)
pkg/executor/docker/models/types.go (1 hunks)
pkg/executor/docker/models/types_test.go (1 hunks)
pkg/executor/noop/executor.go (7 hunks)
pkg/executor/types.go (1 hunks)
pkg/lib/envvar/utils.go (1 hunks)
pkg/lib/envvar/utils_test.go (1 hunks)
pkg/models/constants.go (1 hunks)
pkg/models/job.go (1 hunks)
pkg/models/task.go (3 hunks)
pkg/models/task_test.go (1 hunks)
pkg/test/compute/resourcelimits_test.go (2 hunks)
pkg/test/devstack/errorlogs_test.go (1 hunks)
pkg/test/devstack/partitioning_test.go (1 hunks)
pkg/test/devstack/target_all_test.go (1 hunks)
pkg/test/devstack/timeout_test.go (1 hunks)
pkg/test/scenario/example_noop_test.go (1 hunks)

🔇 Additional comments (25)

pkg/executor/noop/executor.go (1)

22-28: Introduction of ExecutionContext enhances code clarity

The introduction of the ExecutionContext struct consolidates the necessary execution parameters into a single entity. This refactoring simplifies function signatures and improves code readability and maintainability.

pkg/test/scenario/example_noop_test.go (1)

24-25: JobHandler signature update aligns with ExecutionContext refactoring

The JobHandler function now accepts execContext noop.ExecutionContext, and the use of execContext.ResultsDir ensures the correct results directory is utilized. This change maintains consistency with the refactored executor code.

pkg/lib/envvar/utils.go (1)

48-60: Ensure deterministic order when merging environment variables

Maps in Go are unordered, which might lead to non-deterministic ordering when converting back to slices. If the order of environment variables is significant for downstream processes, consider sorting the keys to maintain a consistent order.
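
One way to get a stable order is to sort the keys before building the slice. The following is a minimal sketch of that idea, not the implementation in `pkg/lib/envvar/utils.go`; `toSortedSlice` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// toSortedSlice converts an environment map into KEY=VALUE entries ordered by
// key, so repeated calls always yield the same slice.
func toSortedSlice(env map[string]string) []string {
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	out := make([]string, 0, len(keys))
	for _, k := range keys {
		out = append(out, k+"="+env[k])
	}
	return out
}

func main() {
	env := map[string]string{"B": "2", "A": "1", "C": "3"}
	fmt.Println(toSortedSlice(env))
}
```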
pkg/compute/envvars.go (3)

14-18: LGTM! Proper nil check handling.

The function correctly handles the nil case by returning an empty map, preventing potential nil pointer dereferences.

25-40: LGTM! Comprehensive system variables setup.

The code properly sets up all required system environment variables with appropriate sanitization where needed. The use of the EnvVarPrefix constant ensures consistency.

42-44: Verify the merge behavior for environment variables.

The code correctly uses envvar.Merge to combine task and system variables, with system variables taking precedence. Let's verify the implementation of the merge function.

✅ Verification successful

Environment variable merge behavior verified correctly

The envvar.Merge implementation in pkg/lib/envvar/utils.go correctly ensures that system variables take precedence over task variables by using maps.Clone and maps.Copy operations, matching the intended behavior.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the implementation of envvar.Merge function
# Expected: Find the implementation to confirm precedence handling

ast-grep --pattern 'func Merge($_, $_) map[string]string {
  $$$
}'
Length of output: 69

Script:
#!/bin/bash
# Search for files containing the Merge function definition
rg -l "func Merge" 

# Also search with package context to narrow down results
rg "package.*\n.*func Merge" -A 5
Length of output: 296

Script:
#!/bin/bash
# View the implementation of Merge function
cat pkg/lib/envvar/utils.go
Length of output: 1599
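
The precedence behaviour described in this verification can be sketched with the standard `maps` package (Go 1.21 or later). This is an illustration of the clone-then-copy pattern the comment mentions, not the actual `envvar.Merge` source; the function name `merge` and the sample values are assumptions.

```go
package main

import (
	"fmt"
	"maps"
)

// merge returns a new map holding every entry of base, overwritten by the
// entries of priority. Neither input is modified.
func merge(base, priority map[string]string) map[string]string {
	merged := maps.Clone(base)
	if merged == nil {
		// maps.Clone preserves nil, and writing into a nil map would panic.
		merged = make(map[string]string, len(priority))
	}
	maps.Copy(merged, priority)
	return merged
}

func main() {
	task := map[string]string{"BACALHAU_NODE_ID": "user-value", "FOO": "bar"}
	system := map[string]string{"BACALHAU_NODE_ID": "node-1"}
	// System variables are passed as the priority map so they win conflicts.
	fmt.Println(merge(task, system))
}
```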
pkg/models/constants.go (1)

13-15: LGTM! Well-placed constant definition.

The EnvVarPrefix constant is appropriately defined and positioned within the constants block. The prefix "BACALHAU_" follows the common practice of using uppercase letters and underscores for environment variable prefixes.

pkg/compute/envvars_test.go (2)

13-132: LGTM! Comprehensive test coverage.

The test suite effectively covers various scenarios including:

Edge cases (nil execution)

Basic functionality

Environment variable precedence

Special character handling

System vs. task-level variable conflicts

79-83: Good test for environment variable precedence.

The test explicitly verifies that system environment variables take precedence over task-level variables by attempting to override BACALHAU_NODE_ID.

pkg/test/devstack/target_all_test.go (1)

129-133: LGTM! Clean refactoring to use ExecutionContext.

The change properly updates the JobHandler to use the structured ExecutionContext instead of individual parameters, improving code organization.

pkg/lib/envvar/utils_test.go (1)

108-148: LGTM! Comprehensive sanitization tests.

The test cases for Sanitize thoroughly cover various scenarios including spaces, equals signs, and edge cases.

pkg/models/task.go (3)

128-133: LGTM! Environment variable validation is well-implemented.

The validation correctly prevents users from setting environment variables that could conflict with system-provided variables, using case-insensitive comparison.
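
A case-insensitive prefix check of the kind described here might look like the sketch below. It is not the code in `pkg/models/task.go`: `validateTaskEnv` and the lowercase `envVarPrefix` constant are hypothetical stand-ins, with the prefix value taken from the `pkg/models/constants.go` comment in this review.

```go
package main

import (
	"fmt"
	"strings"
)

// envVarPrefix stands in for models.EnvVarPrefix.
const envVarPrefix = "BACALHAU_"

// validateTaskEnv rejects any user-supplied variable whose name starts with
// the reserved prefix, ignoring case.
func validateTaskEnv(env map[string]string) error {
	for name := range env {
		if strings.HasPrefix(strings.ToUpper(name), envVarPrefix) {
			return fmt.Errorf("environment variable %q cannot start with %s", name, envVarPrefix)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateTaskEnv(map[string]string{"MY_VAR": "1"}))
	fmt.Println(validateTaskEnv(map[string]string{"bacalhau_custom": "x"}))
}
```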

165-183: LGTM! Input source validation is thorough.

The validation ensures uniqueness of both aliases and targets, preventing potential conflicts in the input sources.

185-202: LGTM! Result path validation is comprehensive.

The validation ensures uniqueness of both names and paths, preventing potential conflicts in the result paths.

pkg/test/devstack/partitioning_test.go (2)

37-80: LGTM! Single partition test is well-structured.

The test thoroughly validates the presence and correctness of all environment variables, including job metadata and partition information.

82-139: LGTM! Multiple partitions test is comprehensive.

The test effectively verifies:

Unique partition indices across executions

Distribution of partitions across nodes

Correct partition count reporting

pkg/models/task_test.go (1)

192-227: LGTM! Environment variable validation tests are thorough.

The test cases effectively cover:

Uppercase prefix validation

Lowercase prefix validation (case-insensitive)

Valid environment variables, including edge cases

pkg/models/job.go (1)

334-336: LGTM! Added safety check for empty Tasks slice.

The change prevents potential panic by checking for empty Tasks slice before accessing the first element.
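
The guard can be illustrated with simplified types. The `Job` and `Task` structs below are hypothetical, stripped-down stand-ins for the real models, kept only to show why the length check matters.

```go
package main

import "fmt"

// Task and Job are simplified stand-ins for the real model types.
type Task struct{ Name string }

type Job struct{ Tasks []*Task }

// Task returns the first task, or nil when the job has none. Without the
// length check, j.Tasks[0] would panic on an empty or nil slice.
func (j *Job) Task() *Task {
	if len(j.Tasks) == 0 {
		return nil
	}
	return j.Tasks[0]
}

func main() {
	empty := &Job{}
	fmt.Println(empty.Task() == nil)

	job := &Job{Tasks: []*Task{{Name: "main"}}}
	fmt.Println(job.Task().Name)
}
```

Callers still need to handle the nil result, as the `execution.Job.Task() != nil` check quoted earlier in this review does.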
pkg/compute/executor.go (1)

186-186: Verify environment variables precedence and security implications.

The addition of execution environment variables is a significant change that affects the execution context. Ensure that:

System-level environment variables take precedence over user-provided ones

Sensitive information is not leaked through environment variables

Environment variable names follow the established naming conventions

Run the following script to verify environment variable handling:

✅ Verification successful

Environment variable handling implementation verified

The implementation in GetExecutionEnvVars correctly handles environment variables with proper security measures:

System-level variables take precedence over user-provided ones

Environment variables are sanitized and validated

The BACALHAU_ prefix is protected to prevent injection of system variables
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Check for potential security issues in environment variable handling
# Test: Look for direct exposure of sensitive data in environment variables

# Check for potential sensitive data in environment variable names/patterns
rg -i '(?:password|secret|key|token|credential).*=.*[^\s]' 

# Check for environment variable validation
ast-grep --pattern 'GetExecutionEnvVars($_)'
Length of output: 75844

Script:
#!/bin/bash
# Find GetExecutionEnvVars implementation and related code
ast-grep --pattern 'func GetExecutionEnvVars'

# Search for the function call and surrounding context
rg -A 10 -B 10 'GetExecutionEnvVars'

# Look for environment variable validation patterns
rg -i 'env.*var.*valid|valid.*env.*var'

# Check for environment variable security patterns
rg -i '(sanitize|clean|secure|validate).*env'
Length of output: 6367
pkg/executor/docker/executor.go (3)

309-315: LGTM! Clean method signature update.

The change simplifies the method signature by using executor.RunCommandRequest directly, improving code maintainability.

325-325: LGTM! Correct container environment configuration.

The merged environment variables are properly assigned to the container configuration.

316-321: Verify the order of environment variable merging.

The merging of environment variables appears correct, but we should ensure that the order of precedence is maintained (engine-level variables should override task-level variables).

Run the following script to verify the environment variable merging logic:

✅ Verification successful

Environment variable merging order is correct

The implementation ensures engine-level variables correctly override task-level variables, as verified by both unit tests and integration tests. The current implementation maintains the desired precedence order.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify environment variable merging implementation
# Test: Look for other merge operations to ensure consistency

# Check for other environment variable merging patterns
ast-grep --pattern 'envvar.MergeSlices($_, $_)'

# Look for potential conflicting merge operations
rg 'MergeSlices|merge.*env'
Length of output: 1393

Script:
#!/bin/bash
# Check the implementation of MergeSlices and its tests
cat pkg/lib/envvar/utils.go
cat pkg/lib/envvar/utils_test.go
cat pkg/executor/docker/executor_test.go
Length of output: 27436
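
For readers who want to see the precedence rule in code, a slice-based merge could be written as below. This is a sketch under assumptions, not the `envvar.MergeSlices` source: the name `mergeSlices`, the choice to keep first-seen key order, and the choice to skip entries without a key or `=` are all illustrative. It requires Go 1.18 or later for `strings.Cut`.

```go
package main

import (
	"fmt"
	"strings"
)

// mergeSlices combines KEY=VALUE entries from base and priority. When a key
// appears in both, the value from priority wins. Keys keep the order in which
// they were first seen; entries without '=' or with an empty key are skipped.
func mergeSlices(base, priority []string) []string {
	values := make(map[string]string, len(base)+len(priority))
	var order []string
	for _, entries := range [][]string{base, priority} {
		for _, entry := range entries {
			key, value, ok := strings.Cut(entry, "=")
			if !ok || key == "" {
				continue
			}
			if _, seen := values[key]; !seen {
				order = append(order, key)
			}
			values[key] = value
		}
	}
	merged := make([]string, 0, len(order))
	for _, key := range order {
		merged = append(merged, key+"="+values[key])
	}
	return merged
}

func main() {
	taskEnv := []string{"KEY1=BASE1", "KEY2=BASE2", "KEY3=BASE3"}
	engineEnv := []string{"KEY1=PRIORITY1", "KEY3=PRIORITY3"}
	// Engine-level entries are passed as priority so they override task-level ones.
	fmt.Println(mergeSlices(taskEnv, engineEnv))
}
```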
pkg/executor/docker/executor_test.go (1)

582-621: Well-structured test cases for environment variables.

The test cases comprehensively cover:

Task-level environment variables

Engine-level environment variables

Merged environment variables with correct precedence

pkg/test/devstack/errorlogs_test.go (1)

72-73: LGTM! Improved function signature with ExecutionContext.

The change consolidates execution-related parameters into a single context object, improving code organization and maintainability.

pkg/test/compute/resourcelimits_test.go (1)

101-107: LGTM! Good use of execution context.

The change correctly utilizes the execution context to access job metadata, which aligns with the PR objectives.

pkg/executor/noop/executor.go

Execution and Partition-Level Environment Variables

f2ed4bd

wdbaruni self-assigned this Jan 8, 2025

coderabbitai bot reviewed Jan 8, 2025


wdbaruni merged commit 3acac95 into main Jan 8, 2025
15 checks passed

wdbaruni deleted the eng-519-populate-engines-with-partition-env-variables branch January 8, 2025 17:01

This was referenced Jan 9, 2025

Expose execution to storage providers #4803

Merged

Remove kubectl template dependency with simplified implementation #4807

Merged

Support Local License Inspection Command #4812

Merged

This was referenced Jan 21, 2025

introducing new metrics recorder #4815

Merged

add scheduler metrics #4819

Merged

rate limit number of new executions per evaluation #4831

Merged

coderabbitai bot mentioned this pull request Feb 19, 2025

Enable port mapping and network configuration for jobs #4855

Merged

wdbaruni mentioned this pull request Mar 3, 2025

Give information about the node it's being run on to the running job #4796

Closed

coderabbitai bot mentioned this pull request Apr 7, 2025

[Docker executor] Stream execution logs from local file #4916

Merged
