Data loss with --replay and hard-linked files due to missing checksums. #672

@misieck

Under certain conditions, rmlint does not calculate checksums for the files it encounters. --replay seemingly treats all entries without checksums as one and the same file, regardless of their content. As a result, all but one of those files are marked for deletion.
One such condition that I found is when encountering hard links. Consider the following test setup:

\testdir
    - file_one 
    - file_one_link
    - file_two 
    - file_two_link
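A layout like the one listed above could be built with the commands below. This is only a sketch: the scratch directory from mktemp and the file contents are my own assumptions, chosen so that the two files differ in size. It does not invoke rmlint itself.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch location; any empty directory works.
testdir="$(mktemp -d)/testdir"
mkdir -p "$testdir"

# Two regular files with different sizes (3 bytes vs. 6 bytes).
printf 'one' > "$testdir/file_one"
printf 'second' > "$testdir/file_two"

# One hard link per file; each link shares an inode with its target.
ln "$testdir/file_one" "$testdir/file_one_link"
ln "$testdir/file_two" "$testdir/file_two_link"

# Print inode numbers and link counts to confirm the layout.
ls -li "$testdir"
```

In the listing, each file and its link should show the same inode number and a link count of 2.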

The sizes of file_one and file_two must differ. Running rmlint produces an rmlint.json file with no checksums in it, but the suggested actions are correct:

# Duplicate(s):
    ls '/home/muser/rmlint/test/two'
    rm '/home/muser/rmlint/test/two_link'
    ls '/home/muser/rmlint/test/one'
    rm '/home/muser/rmlint/test/one_link'

Running rmlint --replay rmlint.json afterwards produces a script attempting to delete all but one of the listed files:

# Duplicate(s):
    ls '/home/muser/rmlint/test/one'
    rm '/home/muser/rmlint/test/one_link'
    rm '/home/muser/rmlint/test/two'
    rm '/home/muser/rmlint/test/two_link'

This would delete both two and two_link, leaving no copy of that file's content.
