@un-def commented Jun 26, 2025

Each item in the `files` property maps a local path (a file or a directory) to a container path. Each local path is packed into a tar archive and uploaded to the server, similarly to the code repo archive/diff.
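A minimal sketch of the packing step, for illustration only. The archive layout, compression, and hash algorithm are not specified here, so this assumes an uncompressed tar with the path stored under its base name and a SHA-256 digest as the hash; `pack_path` is a hypothetical name, not a dstack function.

```python
import hashlib
import io
import os
import tarfile


def pack_path(local_path: str) -> tuple[bytes, str]:
    """Hypothetical helper: pack a local file or dir into an in-memory tar.

    Assumptions (not taken from dstack): no compression, the path is stored
    under its base name, and the hash is a SHA-256 hex digest of the bytes.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # Directories are added recursively (in sorted order since Python 3.7);
        # each entry records its mode bits from the local filesystem.
        name = os.path.basename(os.path.normpath(local_path))
        tar.add(local_path, arcname=name)
    data = buf.getvalue()
    return data, hashlib.sha256(data).hexdigest()
```

Tar headers also record mtimes and ownership, so identical content touched at a different time produces a different digest under this scheme.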

On the server, each archive is stored in the DB as a `FileArchiveModel` linked to the user. Archive blobs are optionally uploaded to the storage (again, similarly to the code blob).

When the job is submitted to the runner, files (if any) are uploaded after `/api/submit` but before `/api/upload_code`.
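The call order can be illustrated with a small sketch in which `post` stands for any HTTP POST helper. The file-upload endpoint name `/api/upload_archive` is hypothetical; only its position between `/api/submit` and `/api/upload_code` comes from the description above.

```python
from typing import Callable, Sequence

# Hypothetical transport: POST `body` to the runner at `path`.
Post = Callable[[str, bytes], None]


def submit_to_runner(post: Post, job_spec: bytes,
                     file_archives: Sequence[bytes], code: bytes) -> None:
    """Sketch of the runner call order (not dstack's actual client code)."""
    post("/api/submit", job_spec)
    # File archives (if any) are uploaded after the job is submitted...
    for archive in file_archives:
        post("/api/upload_archive", archive)  # hypothetical endpoint name
    # ...and before the code upload.
    post("/api/upload_code", code)
```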

The runner unpacks archives as follows:

  • If the path already exists, it is removed.
  • If any parent directories of the path are missing, they are created and owned by the run user. The owner of existing directories is not changed.
  • The owner of the path (and, for a directory, all its subpaths) is set to the run user. Permissions from the archive (and thus from the user's machine) are preserved.
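The three unpacking rules can be sketched in Python for a POSIX system. This is not the runner's code; it assumes the archive holds exactly one top-level entry that should end up at `target`, and `unpack_archive` is a hypothetical name.

```python
import os
import shutil
import tarfile
import tempfile


def unpack_archive(archive_path: str, target: str, uid: int, gid: int) -> None:
    """Hypothetical sketch of the unpacking rules (not dstack code).

    Assumes the archive holds exactly one top-level entry (a file or a dir).
    """
    target = os.path.abspath(target)
    # 1. If the path already exists, remove it.
    if os.path.isdir(target) and not os.path.islink(target):
        shutil.rmtree(target)
    elif os.path.lexists(target):
        os.remove(target)
    # 2. Create missing parent dirs owned by the run user; existing ones
    #    keep their owner.
    missing = []
    parent = os.path.dirname(target)
    while not os.path.exists(parent):
        missing.append(parent)
        parent = os.path.dirname(parent)
    for d in reversed(missing):
        os.mkdir(d)
        os.chown(d, uid, gid)
    # Extract next to the target, then move the single top-level entry in.
    # The "tar" filter (3.12+, backported to some patch releases) keeps mode
    # bits except setuid/setgid/sticky and group/other write.
    extra = {"filter": "tar"} if hasattr(tarfile, "tar_filter") else {}
    with tempfile.TemporaryDirectory(dir=os.path.dirname(target)) as tmp:
        with tarfile.open(archive_path) as tar:
            # Unpacking a one-element set fails if there are several top entries.
            (top,) = {m.name.split("/")[0] for m in tar.getmembers()}
            tar.extractall(tmp, **extra)
        os.replace(os.path.join(tmp, top), target)
    # 3. Give the path (and everything under it, for a dir) to the run user.
    os.chown(target, uid, gid, follow_symlinks=False)
    for root, dirs, files in os.walk(target):
        for name in dirs + files:
            os.chown(os.path.join(root, name), uid, gid, follow_symlinks=False)
```

Extracting into a sibling temporary directory keeps the final `os.replace` on the same filesystem, so the target never holds a half-extracted tree.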

Part-of: #2738

@un-def requested review from r4victor and jvstme June 26, 2025 17:17


```python
class FilesAPIClient(APIClientGroup):
    def get_archive_by_hash(self, hash: str) -> FileArchive:
        ...
```
Collaborator:

Any plans to use this function?

Collaborator (PR author):

dstack itself does not use this API method at the moment, but it can be used to avoid re-uploading a large archive that is already on the server (e.g., in `RunCollection.apply_plan`):

  1. Create an archive and compute its hash.
  2. Call this method to check whether the archive is already uploaded.
  3. If it is, reuse the archive ID returned by the method; otherwise, upload the archive.

We could improve `RunCollection.apply_plan` by adding this check for large archives only; for small ones, the extra API call would just increase execution time.
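A minimal sketch of that check-then-upload flow. `ArchiveAPI`, `upload_archive`, `ArchiveNotFoundError`, the SHA-256 hash, and the 1 MiB threshold are all hypothetical stand-ins; the real `get_archive_by_hash` may report a missing archive differently, and the hash must match whatever the server computes.

```python
import hashlib
from dataclasses import dataclass
from typing import Protocol


@dataclass
class ArchiveRef:
    """Hypothetical stand-in for the server's archive record."""
    id: str
    hash: str


class ArchiveNotFoundError(Exception):
    """Hypothetical: raised when no archive with the given hash exists."""


class ArchiveAPI(Protocol):
    def get_archive_by_hash(self, hash: str) -> ArchiveRef: ...
    def upload_archive(self, blob: bytes) -> ArchiveRef: ...


# Hypothetical cut-off; below it the extra lookup is not worth a round trip.
LARGE_ARCHIVE_BYTES = 1024 * 1024


def get_or_upload_archive(api: ArchiveAPI, blob: bytes) -> str:
    """Return the ID of an archive with this content, uploading only if needed."""
    if len(blob) >= LARGE_ARCHIVE_BYTES:
        # Steps 1-2: hash the archive and ask the server whether it has it.
        try:
            return api.get_archive_by_hash(hashlib.sha256(blob).hexdigest()).id
        except ArchiveNotFoundError:
            pass
    # Step 3, or a small archive: upload it.
    return api.upload_archive(blob).id
```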

@un-def merged commit c5dd99d into master Jun 27, 2025
25 checks passed
@un-def deleted the issue_2738_run_files branch June 27, 2025 08:45