Skip to content

feat: Add support for S3 as an asset storage layer #1703

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 4, 2025
Merged

Conversation

MohamedBassem
Copy link
Collaborator

Fixes #305

Copy link

cloudflare-workers-and-pages bot commented Jul 4, 2025

Deploying karakeep-landing with  Cloudflare Pages  Cloudflare Pages

Latest commit: fa27add
Status: ✅  Deploy successful!
Preview URL: https://5e837fad.karakeep-landing.pages.dev
Branch Preview URL: https://feat-s3.karakeep-landing.pages.dev

View logs

@MohamedBassem MohamedBassem requested a review from Copilot July 4, 2025 21:55
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds support for using S3-compatible object storage as an alternative asset backend.

  • Introduces a new AssetStore interface with both local filesystem and S3 implementations.
  • Updates configuration schema, dependencies, docs, and Docker Compose to enable MinIO/S3 testing.
  • Refactors API utilities and adds end-to-end tests for S3 and filesystem stores.

Reviewed Changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated no comments.

Show a summary per file
File Description
packages/shared/package.json Added @aws-sdk/client-s3 dependency
packages/shared/config.ts Defined new S3‐related environment variables
packages/shared/assetdb.ts Refactored asset logic into AssetStore interface with LocalFileSystemAssetStore and S3AssetStore
packages/api/utils/upload.ts Broadened stream type to NodeJS.ReadableStream
packages/api/utils/assets.ts Added await to createAssetReadStream calls
packages/e2e_tests/docker-compose.yml Added MinIO service for S3 testing
docs/docs/03-configuration.md Documented new S3 storage environment variables
packages/e2e_tests/package.json Added S3 SDK dependency and no-build test flag
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported
Comments suppressed due to low confidence (6)

packages/shared/assetdb.ts:113

  • Missing import for crypto. Add import crypto from "crypto"; at the top to avoid runtime errors.
  return crypto.randomUUID();

packages/shared/assetdb.ts:545

  • serverConfig is not imported in this file, causing a reference error. Import it from config.ts (e.g., import { serverConfig } from "./config";).
  const config = serverConfig.assetStore;

packages/shared/config.ts:89

  • Typo in comment: use "separate" instead of "separete".
  // A flag to detect if the user is running in the old separete containers setup

packages/shared/assetdb.ts:270

  • The leading slash in the glob pattern may prevent matching relative paths under rootPath. Use **/**/asset.bin instead.
    const g = new Glob(`/**/**/asset.bin`, {

packages/shared/assetdb.ts:123

  • User‐provided userId and assetId should be validated or sanitized to prevent path traversal (e.g., disallow .. segments).
  private getAssetDir(userId: string, assetId: string) {

packages/shared/assetdb.ts:472

  • Deleting objects one-by-one can be slow for many files. Consider using S3's DeleteObjectsCommand to batch up to 1000 keys per request.
        const deletePromises = listResponse.Contents.map((obj: _Object) =>

@MohamedBassem MohamedBassem merged commit d66b3b8 into main Jul 4, 2025
9 checks passed
@MohamedBassem MohamedBassem deleted the feat/s3 branch July 10, 2025 08:43
@maelp
Copy link
Contributor

maelp commented Jul 21, 2025

Will there be a one-off command to automatically migrate existing assets to the new S3 folder?

@MohamedBassem
Copy link
Collaborator Author

@maelp I'm not planning to currently implement one, but an AI agent should be able to oneshot a script by just reading the assetDb.ts file.

@maelp
Copy link
Contributor

maelp commented Jul 22, 2025 via email

@pew
Copy link

pew commented Jul 26, 2025

@maelp I'm not planning to currently implement one, but an AI agent should be able to oneshot a script by just reading the assetDb.ts file.

gave it a try, worked just fine. Moved the local assets folder, restarted Karakeep and it is fetching all data from s3:

for dir in /data/karakeep-data/assets/*/*; do
    userId=$(basename "$(dirname "$dir")")
    assetId=$(basename "$dir")

    ctype=$(jq -r '.contentType' "$dir/metadata.json")
    fname=$(jq -r '.fileName // empty' "$dir/metadata.json")

    aws s3 cp "$dir/asset.bin" "s3://<bucket-name>/$userId/$assetId" \
        --content-type "$ctype" \
        $( [ -n "$fname" ] && printf -- '--metadata x-amz-meta-file-name=%s' "$fname" )
done

@MohamedBassem
Copy link
Collaborator Author

@pew Thanks a lot for writing the script, it looks about right to me. I think however, we're also missing x-amz-meta-content-type header which karakeep might potentially rely on (if not now, in the future).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Support S3 Storage
3 participants