support plain encoding in s3 publisher #4840
Conversation
Walkthrough
The changes refactor the S3 publisher configuration and publishing process. The extraction of S3 options is streamlined by replacing individual variable handling with a unified parameters map. Test cases are updated to use lowercase key names and include an encoding field. In the publisher, publishing now branches on the configured encoding: plain encoding uploads result files individually, while gzip encoding (the default) uploads a single compressed archive.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Publisher
    participant S3Service
    Client->>Publisher: Call PublishResult(spec)
    alt Encoding is "plain"
        Publisher->>Publisher: Invoke publishDirectory()
        Publisher->>S3Service: Upload files individually
    else Encoding is "gzip"
        Publisher->>Publisher: Invoke publishArchive()
        Publisher->>S3Service: Upload GZIP archive
    end
    S3Service-->>Publisher: Return upload status
    Publisher-->>Client: Return SpecConfig with results
```
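In Go terms, the branch in the diagram looks roughly like this minimal sketch; the types below are simplified stand-ins, not the actual pkg/publisher/s3 API:

```go
package s3sketch

import "context"

// Stand-ins for the real types in pkg/s3 and pkg/publisher/s3.
type Encoding string

const (
	EncodingGzip  Encoding = "gzip"
	EncodingPlain Encoding = "plain"
)

type PublisherSpec struct {
	Bucket   string
	Key      string
	Encoding Encoding
}

type Publisher struct{}

func (p *Publisher) publishDirectory(ctx context.Context, spec PublisherSpec, resultPath string) error {
	return nil // would upload each file under resultPath as its own object
}

func (p *Publisher) publishArchive(ctx context.Context, spec PublisherSpec, resultPath string) error {
	return nil // would compress resultPath and upload a single archive object
}

// publish mirrors the alt/else branch in the sequence diagram above.
func (p *Publisher) publish(ctx context.Context, spec PublisherSpec, resultPath string) error {
	if spec.Encoding == EncodingPlain {
		return p.publishDirectory(ctx, spec, resultPath)
	}
	return p.publishArchive(ctx, spec, resultPath)
}
```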
Actionable comments posted: 0
🧹 Nitpick comments (3)
pkg/publisher/s3/publisher.go (2)
65-67: Defaulting to archive encoding is reasonable.
Consider logging at debug or trace level if no encoding is specified, to help with troubleshooting.
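A minimal sketch of that suggestion, assuming a zerolog-style logger (Bacalhau uses rs/zerolog; the field names are illustrative):

```go
package s3sketch

import (
	"context"

	"github.com/rs/zerolog/log"
)

type Encoding string

const EncodingGzip Encoding = "gzip"

// resolveEncoding returns the configured encoding, falling back to gzip and
// emitting a debug log so the default is visible when troubleshooting.
func resolveEncoding(ctx context.Context, configured Encoding, bucket, key string) Encoding {
	if configured == "" {
		log.Ctx(ctx).Debug().
			Str("bucket", bucket).
			Str("key", key).
			Msg("no encoding specified for S3 publisher, defaulting to gzip")
		return EncodingGzip
	}
	return configured
}
```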
129-192: Directory publishing is implemented well.
To enhance performance on large directories, consider supporting concurrent uploads (sketched below). If partial success is acceptable, you may want to continue walking after individual file upload failures.
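One way bounded-concurrency uploads could look, using golang.org/x/sync/errgroup; `uploadFile` is a hypothetical stand-in for the publisher's single-object upload logic:

```go
package s3sketch

import (
	"context"
	"io/fs"
	"path/filepath"

	"golang.org/x/sync/errgroup"
)

// publishDirectoryConcurrently walks root and uploads regular files with at
// most maxConcurrency uploads in flight, failing fast on the first error.
func publishDirectoryConcurrently(
	ctx context.Context,
	root string,
	maxConcurrency int,
	uploadFile func(ctx context.Context, localPath, relPath string) error,
) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(maxConcurrency)

	walkErr := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		g.Go(func() error { return uploadFile(ctx, path, rel) })
		return nil
	})
	if walkErr != nil {
		return walkErr
	}
	return g.Wait()
}
```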
pkg/s3/types.go (1)
97-106: LGTM! Consider adding documentation.
The `Encoding` type and its validation are well-designed. Consider adding documentation to describe the purpose and usage of each encoding type.

```diff
+// Encoding represents the type of encoding used for S3 objects
 type Encoding string

 const (
+	// EncodingGzip represents gzip compression encoding
 	EncodingGzip  Encoding = "gzip"
+	// EncodingPlain represents no compression encoding
 	EncodingPlain Encoding = "plain"
 )

+// IsValid checks if the encoding is one of the supported types
 func (e Encoding) IsValid() bool {
 	return e == EncodingGzip || e == EncodingPlain
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- cmd/util/opts/publisher_specconfig.go (1 hunks)
- cmd/util/opts/publisher_specconfig_test.go (3 hunks)
- pkg/publisher/s3/publisher.go (3 hunks)
- pkg/publisher/s3/publisher_test.go (2 hunks)
- pkg/s3/test/suite.go (2 hunks)
- pkg/s3/types.go (2 hunks)
- pkg/s3/types_test.go (6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: build / Build Binary (windows, amd64)
- GitHub Check: build / Build Binary (darwin, arm64)
- GitHub Check: build / Build Binary (darwin, amd64)
- GitHub Check: build / Build Binary (linux, armv6)
- GitHub Check: build / Build Binary (linux, armv7)
- GitHub Check: build / Build Binary (linux, arm64)
- GitHub Check: build / Build Binary (linux, amd64)
- GitHub Check: lint / lint
🔇 Additional comments (13)
pkg/publisher/s3/publisher.go (3)
7-7: Good use of `filepath` import for directory traversal.
No issues spotted here.
61-63: Explicit handling for plain encoding.
This branching approach is clear and avoids confusion with other encodings. Ensure you have test coverage for unknown encodings.
69-74: New `publishArchive` method.
This function is concise and well-structured for creating and uploading a compressed archive. Good job handling the temporary file cleanup.
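That cleanup pattern, isolated as a sketch; `tarGzipDirectory` and `uploadObject` here are hypothetical helpers, not the publisher's actual functions:

```go
package s3sketch

import (
	"context"
	"os"
)

// publishArchive compresses resultPath into a temporary .tar.gz and uploads
// it, removing the temporary file regardless of how the upload goes.
func publishArchive(
	ctx context.Context,
	resultPath string,
	tarGzipDirectory func(src, dst string) error,
	uploadObject func(ctx context.Context, localPath string) error,
) error {
	tmp, err := os.CreateTemp("", "bacalhau-result-*.tar.gz")
	if err != nil {
		return err
	}
	tmp.Close()
	defer os.Remove(tmp.Name()) // always clean up the temporary archive

	if err := tarGzipDirectory(resultPath, tmp.Name()); err != nil {
		return err
	}
	return uploadObject(ctx, tmp.Name())
}
```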
cmd/util/opts/publisher_specconfig_test.go (4)
35-36: Lowercase keys for S3 parameters.
Aligns with the new approach of consistently lowercasing S3 keys in the spec. Looks good!
46-50: Maintaining lowercase key consistency with S3.
This ensures a uniform parameter format throughout the codebase.
53-64: Test coverage for new `encoding` parameter.
Great addition to confirm correctness of plain encoding handling.
71-73: Final set of lowercase keys in S3 config.
All tested variants are consistent now. No concerns here.
cmd/util/opts/publisher_specconfig.go (1)
103-119: Consolidated S3 option handling in `params` map.
This simplifies the logic and preserves flexibility for new parameters (like `encoding`). If needed, consider validating non-empty values for `bucket` and `key` to avoid misconfigurations.
pkg/s3/types_test.go (2)
29-29: LGTM! Good test coverage for encoding field.
The test cases thoroughly validate both plain and gzip encoding with proper case sensitivity handling.
Also applies to: 45-45, 52-52, 68-68, 75-75, 91-91
143-153: LGTM! Good error handling test coverage.
The test case properly validates error handling for invalid encoding values.
pkg/s3/types.go (1)
108-126: LGTM! Clean integration of encoding field.
The encoding field is well-integrated into the `PublisherSpec` struct with proper validation.
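Roughly the shape under review, as a sketch; field names beyond `Encoding` are assumptions based on the publisher's bucket/key parameters:

```go
package s3sketch

import "errors"

type Encoding string

const (
	EncodingGzip  Encoding = "gzip"
	EncodingPlain Encoding = "plain"
)

func (e Encoding) IsValid() bool { return e == EncodingGzip || e == EncodingPlain }

// PublisherSpec sketches how the new Encoding field sits alongside the
// existing S3 target fields, with validation rejecting unknown encodings.
type PublisherSpec struct {
	Bucket   string
	Key      string
	Encoding Encoding
}

func (s PublisherSpec) Validate() error {
	if s.Bucket == "" {
		return errors.New("s3 publisher: bucket cannot be empty")
	}
	if s.Encoding != "" && !s.Encoding.IsValid() {
		return errors.New("s3 publisher: invalid encoding")
	}
	return nil
}
```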
pkg/publisher/s3/publisher_test.go (1)
154-218: LGTM! Comprehensive test coverage.
The test cases thoroughly validate:
- Both plain and gzip encoding
- Various path patterns and edge cases
- Proper handling of checksums and version IDs based on encoding
pkg/s3/test/suite.go (1)
149-163: LGTM! Clean refactor to support encoding.
The helper methods are well-updated to support the new encoding type with clear path generation logic.
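One plausible version of that path logic, as a sketch only; the real helper in pkg/s3/test/suite.go may differ:

```go
package s3sketch

type Encoding string

const EncodingPlain Encoding = "plain"

// objectPrefix sketches per-encoding key generation: gzip results land as a
// single archive object, plain results under a directory-style prefix.
func objectPrefix(base string, encoding Encoding) string {
	if encoding == EncodingPlain {
		return base + "/" // prefix for individually uploaded files
	}
	return base + ".tar.gz"
}
```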
Actionable comments posted: 0
🧹 Nitpick comments (1)
pkg/downloader/s3signed/downloader_test.go (1)
82-99: Good test coverage, but could be more descriptive.
The test effectively verifies that uncompressed downloads are not supported with pre-signed URLs, which aligns with the documented limitations. However, consider these improvements:
- The comment on line 90 should be updated as we're testing an unsigned URL.
- Add an error message assertion to verify the specific failure reason.

```diff
-	// download signed url
+	// attempt to download unsigned url
 	downloadParentPath, err := os.MkdirTemp(s.TempDir, "")
 	s.Require().NoError(err)
 	_, err = s.downloader.FetchResult(s.Ctx, downloader.DownloadItem{
 		Result:     &storageSpec,
 		ParentPath: downloadParentPath,
 	})
-	s.Require().Error(err)
+	s.Require().Error(err)
+	s.Contains(err.Error(), "unsupported storage source type", "should fail with unsupported source type error")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- pkg/downloader/s3signed/downloader_test.go (2 hunks)
🔇 Additional comments (1)
pkg/downloader/s3signed/downloader_test.go (1)
56-80: LGTM! Clean transition to the new encoding type.
The test has been updated to use the new `EncodingGzip` constant while maintaining the same robust verification flow.
Add plain encoding option to S3 Publisher
Description
This PR adds support for publishing job results without gzip compression through a new `Encoding` type. This enables more efficient data pipelines where subsequent jobs need to access individual files from previous job results.

Previously, all job results were automatically gzip compressed before uploading to S3. While this is efficient for storage and download, it requires downloading and decompressing the entire archive to access any file. This can be inefficient for workflows like map-reduce where jobs only need specific files from previous results.
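For illustration, a job's publisher spec requesting plain results could be built like this sketch; the lowercase parameter keys mirror this PR's tests, and `SpecConfig` is a minimal stand-in for Bacalhau's spec type:

```go
package main

import "fmt"

// SpecConfig is a minimal stand-in for Bacalhau's publisher spec config:
// a type name plus a free-form parameter map.
type SpecConfig struct {
	Type   string
	Params map[string]interface{}
}

func main() {
	spec := SpecConfig{
		Type: "s3",
		Params: map[string]interface{}{
			"bucket":   "my-results-bucket",
			"key":      "runs/my-job",
			"encoding": "plain", // omit, or set "gzip", for the compressed default
		},
	}
	fmt.Printf("%+v\n", spec)
}
```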
Changes
- `Encoding` type with `EncodingGzip` (default) and `EncodingPlain` options

Benefits and Tradeoffs
Benefits
Tradeoffs
- `bacalhau job get` with pre-signed URLs is unavailable when using plain encoding (requires individual file URLs)

Usage Recommendations
- Use the default gzip encoding when you need `bacalhau job get` functionality

Summary by CodeRabbit
New Features
Bug Fixes
Documentation