skip_hardlink not generated where expected. Already hardlinked files have "cp_hardlink" generated - even if seen as dupes #545

@james-cook

I have already run rmlint over some directories on a drive.
Now I am running rmlint on the whole drive to see what remains to be linted.
I'm running with "sh:hardlink".

Strangely, already-hardlinked files (same inode) get "cp_hardlink" instead of "skip_hardlink" in the generated rmlint.sh.
cp_hardlink itself does no sanity check (it does not test whether the inodes are already the same), so it does an "rm" and "ln" all over again.
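
The kind of sanity check meant here could look roughly like the sketch below. It is not taken from the generated rmlint.sh: the names is_same_file and guarded_hardlink are hypothetical, and the argument order (duplicate first, original second) is assumed from the cp_hardlink line quoted further down. It relies on the shell's `-ef` test, which is true when both paths resolve to the same device and inode.

```shell
# Hypothetical helper, not part of the generated rmlint.sh.
# Returns 0 when both paths resolve to the same device and inode.
is_same_file() {
    [ "$1" -ef "$2" ]
}

# Hypothetical guarded relink.
# $1 = duplicate, $2 = original (order assumed from the cp_hardlink line).
guarded_hardlink() {
    if is_same_file "$1" "$2"; then
        echo "Already hardlinked, skipping: $1"
        return 0
    fi
    # Create the new link under a temporary name first, so the duplicate
    # is only replaced once the link exists.
    ln "$2" "$1.hardlink-tmp" && mv -f "$1.hardlink-tmp" "$1"
}
```

With a guard like this, re-running the script over files that are already linked would be a no-op rather than a remove-and-relink.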

For example: These files share the same inode:

pi@rpiomv: $ ls -ali '/tmp/temp_E_4TB/HSa.p1/2008.01.13.hs.p-004-01.avi'
895822 -rwxrwxrwx 2 root root 1070006272 Aug 18  2010 /tmp/temp_E_4TB/HSa.p1/2008.01.13.hs.p-004-01.avi
pi@rpiomv: $ xattr -l '/tmp/temp_E_4TB/HSa.p1/2008.01.13.hs.p-004-01.avi'
user.rmlint.blake2b.cksum:
0000   36 66 64 62 39 34 30 33 33 61 35 62 32 33 61 35    6fdb94033a5b23a5
0010   33 38 34 64 37 62 39 64 34 65 34 65 63 64 38 39    384d7b9d4e4ecd89
0020   32 63 65 30 63 31 30 36 38 61 30 62 62 32 35 65    2ce0c1068a0bb25e
0030   32 32 63 34 32 65 65 63 63 35 32 35 62 33 30 62    22c42eecc525b30b
0040   66 33 36 35 63 61 31 64 32 36 30 33 61 36 31 65    f365ca1d2603a61e
0050   64 63 65 30 34 64 38 38 37 34 61 65 38 64 30 64    dce04d8874ae8d0d
0060   37 32 39 35 33 36 35 34 30 33 63 39 38 39 61 39    7295365403c989a9
0070   33 61 33 61 34 62 66 37 31 64 34 37 62 39 38 63    3a3a4bf71d47b98c
0080   00                                                 .

user.rmlint.blake2b.mtime: 1282169807.7567737
pi@rpiomv: $ ls -ali '/tmp/temp_E_4TB/HSa.p2/2008.01.13.hs.p-004-01.avi'
809730 -rwxrwxrwx 2 root root 1070006272 Aug 18  2010 /tmp/temp_E_4TB/HSa.p2/2008.01.13.hs.p-004-01.avi
pi@rpiomv: $ xattr -l '/tmp/temp_E_4TB/HSa.p2/2008.01.13.hs.p-004-01.avi'
user.rmlint.blake2b.cksum:
0000   36 66 64 62 39 34 30 33 33 61 35 62 32 33 61 35    6fdb94033a5b23a5
0010   33 38 34 64 37 62 39 64 34 65 34 65 63 64 38 39    384d7b9d4e4ecd89
0020   32 63 65 30 63 31 30 36 38 61 30 62 62 32 35 65    2ce0c1068a0bb25e
0030   32 32 63 34 32 65 65 63 63 35 32 35 62 33 30 62    22c42eecc525b30b
0040   66 33 36 35 63 61 31 64 32 36 30 33 61 36 31 65    f365ca1d2603a61e
0050   64 63 65 30 34 64 38 38 37 34 61 65 38 64 30 64    dce04d8874ae8d0d
0060   37 32 39 35 33 36 35 34 30 33 63 39 38 39 61 39    7295365403c989a9
0070   33 61 33 61 34 62 66 37 31 64 34 37 62 39 38 63    3a3a4bf71d47b98c
0080   00                                                 .

user.rmlint.blake2b.mtime: 1282169807.7567737
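
To double-check the link relationship independently of rmlint, something like the following can be used. This is a minimal sketch that assumes GNU coreutils `stat` and GNU `find`; show_inode and list_links are made-up helper names. The first prints device:inode, link count and path for each argument, and the second lists every path under a directory that refers to the same file as a given reference path.

```shell
# Print "device:inode link-count path" for each argument
# (GNU coreutils stat assumed).
show_inode() {
    stat -c '%d:%i %h %n' -- "$@"
}

# List every path under a directory that is a hardlink of the given file
# (GNU find assumed). $1 = directory to search, $2 = reference file.
list_links() {
    find "$1" -samefile "$2"
}

# Example usage with the paths from this report:
#   show_inode '/tmp/temp_E_4TB/HSa.p1/2008.01.13.hs.p-004-01.avi' \
#              '/tmp/temp_E_4TB/HSa.p2/2008.01.13.hs.p-004-01.avi'
#   list_links '/tmp/temp_E_4TB' '/tmp/temp_E_4TB/HSa.p1/2008.01.13.hs.p-004-01.avi'
```

Two paths are hardlinks of each other only if the device:inode column is identical for both.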

I run the command:

rmlint -c sh:hardlink -T "all -emptyfiles -emptydirs" --progress -S dma -s 1G-1TB --xattr '/tmp/temp_E_4TB/HSa.p1' '/tmp/temp_E_4TB/HSa.p2'

This reports 69 dupes, but correctly notes that they account for 0 B of removable data (because they are hardlinks of each other):

==> In total 273 files, whereof 69 are duplicates in 69 groups.
==> This equals 0 B of duplicates which could be removed.
==> Scanning took in total 20.342s.

But the generated rmlint.sh file contains:

 :

original_cmd  '/tmp/temp_E_4TB/HSa.p2/2008.01.13.hs.p-004-01.avi' # original
cp_hardlink   '/tmp/temp_E_4TB/HSa.p1/2008.01.13.hs.p-004-01.avi' '/tmp/temp_E_4TB/HSa.p2/2008.01.13.hs.p-004-01.avi' # duplicate

 :
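
One way to count how many cp_hardlink lines in a generated script point at files that are already linked is sketched below. audit_hardlinks is a hypothetical helper, not an rmlint feature. It assumes the handler calls start at column 0 with shell-quoted arguments, as in the excerpt above, and it evaluates those lines with a harmless stand-in for cp_hardlink that only inspects the two paths. Because it uses `eval`, it should only be pointed at a script you would be willing to run anyway.

```shell
# Hypothetical audit helper: report cp_hardlink lines in a generated script
# whose two paths already refer to the same file.
# $1 = path to the generated script (e.g. rmlint.sh)
audit_hardlinks() {
    (
        # Stand-in for the real handler: only inspects, never modifies files.
        cp_hardlink() {
            if [ "$1" -ef "$2" ]; then
                echo "already linked: $1"
            fi
        }
        # Evaluate only the cp_hardlink call lines; their shell quoting
        # is reused as-is, and trailing "# duplicate" comments are ignored.
        eval "$(grep -E '^cp_hardlink[[:space:]]' "$1")"
    )
}

# Example usage:
#   audit_hardlinks ./rmlint.sh | wc -l
```

The subshell keeps the stand-in function from leaking into the calling shell.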

I'm not sure whether this could cause problems down the line, but it does seem strange.
Any ideas why skip_hardlink is not generated?
Or, in fact, why ANY cp_hardlink lines are generated for these 69 files in the first place?
