Skip to content

Conversation

giuseppe
Copy link
Member

offer a way to deduplicate files in the local storage using reflinks when possible.

For now, it deduplicates only identical files, even if reflinks support deduplication only of a block aligned portion of a file

Copy link
Contributor

openshift-ci bot commented Nov 20, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@giuseppe
Copy link
Member Author

cleaned up some code I was using just for local testing.

@rhatdan @nalind PTAL

@giuseppe giuseppe force-pushed the dedup-operation branch 4 times, most recently from 34a423c to f9b8af2 Compare November 20, 2024 16:09
@giuseppe giuseppe added the jira label Nov 20, 2024
@giuseppe giuseppe marked this pull request as draft November 20, 2024 19:41
@giuseppe giuseppe marked this pull request as ready for review November 20, 2024 20:10
@giuseppe giuseppe marked this pull request as draft November 20, 2024 20:12
@giuseppe giuseppe marked this pull request as ready for review November 20, 2024 20:26
@giuseppe giuseppe force-pushed the dedup-operation branch 3 times, most recently from 1aba6ba to ecf545a Compare November 21, 2024 12:36
@giuseppe giuseppe marked this pull request as draft November 21, 2024 14:38
Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a brief drive-by skim, I didn’t read the PR carefully I’m afraid.

@giuseppe giuseppe force-pushed the dedup-operation branch 2 times, most recently from 2d59a54 to bb6967b Compare November 21, 2024 22:22
@giuseppe
Copy link
Member Author

giuseppe commented Dec 3, 2024

  • Move pkg/dedup to internal/dedup (probably requires relocating/restructuring HashMethod).

what do you think of the new version?

It adds some duplication in the store.go file, but perhaps that is acceptable?

Alternatively, we have types/ and we could move these common types there. From what I can see though, the current types deal mostly with the storage configuration file

@giuseppe giuseppe force-pushed the dedup-operation branch 2 times, most recently from 441a354 to 8f831dc Compare December 3, 2024 15:06
@mtrmac
Copy link
Collaborator

mtrmac commented Dec 3, 2024

  • Move pkg/dedup to internal/dedup (probably requires relocating/restructuring HashMethod).

what do you think of the new version?

Great, works for me.

It adds some duplication in the store.go file, but perhaps that is acceptable?

Alternatively, we have types/ and we could move these common types there. From what I can see though, the current types deal mostly with the storage configuration file

(I’m not that worried about these simple types with no dependencies and no runtime cost; we can always add aliases from other packages to make the API more convenient — as long as the types/fields make sense to commit to.) Here the caller is certainly going to import the top-level storage package, so I think having them present directly in storage is the most convenient option for users.

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d prefer addressing also at least #2176 (comment) and #2176 (comment) , while we still remember this context.

@giuseppe
Copy link
Member Author

giuseppe commented Dec 6, 2024

documented that the dedup result accounts for previous runs, and changed the code to deal with multiple file systems

@giuseppe giuseppe force-pushed the dedup-operation branch 2 times, most recently from c1952d2 to f895bde Compare December 6, 2024 09:40
@giuseppe
Copy link
Member Author

giuseppe commented Dec 9, 2024

@containers/podman-maintainers PTAL

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really a full review as I am not familiar with the code base and intentions here but error handling could need some work I believe

offers a way to iterate a list of directories and find duplicate
files.  The duplicate files are deduplicated using reflinks where
supported.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
expose a new API to deduplicate files in the images.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
introduce a new `dedup` command to the `containers-storage` tool to
deduplicate similar files in image layers.  Reflinks support from the
underlying file system is needed.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe giuseppe marked this pull request as draft December 9, 2024 15:44
@giuseppe giuseppe marked this pull request as ready for review December 9, 2024 15:45
@mtrmac
Copy link
Collaborator

mtrmac commented Dec 9, 2024

documented that the dedup result accounts for previous runs, and changed the code to deal with multiple file systems

ACK

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the error parts I looked at least. For the actual storage details I defer to others

@mtrmac
Copy link
Collaborator

mtrmac commented Dec 10, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Dec 10, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit c27c473 into containers:main Dec 10, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants