duperemove
linux
Find duplicate filesystem extents and optionally schedule them for deduplication. An extent is a small part of a file inside the filesystem. On some filesystems, one extent can be referenced multiple times when parts of the files' content are identical.
Examples (4)
Search for duplicate extents in a directory and show them
duperemove -r path/to/directory

Deduplicate duplicate extents on a Btrfs or XFS (experimental) filesystem
duperemove -r -d path/to/directory

Use a hash file to store extent hashes (less memory usage and can be reused on subsequent runs)
duperemove -r -d --hashfile=path/to/hashfile path/to/directory

Limit I/O threads (for hashing and dedupe stage) and CPU threads (for duplicate extent finding stage)
duperemove -r -d --hashfile=path/to/hashfile --io-threads=n --cpu-threads=n path/to/directory

made by @shridhargupta | data from tldr-pages
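The options from the examples above can be combined in a small wrapper that reports duplicates by default and deduplicates only on request. This is a minimal sketch, not part of duperemove: the function name dedupe_dir, the DEDUPE and DUPEREMOVE_BIN variables, and the default of 2 threads are all assumptions made for illustration.

```shell
# Hypothetical wrapper around duperemove (names invented for this sketch).
# Usage: dedupe_dir DIRECTORY HASHFILE [THREADS]
dedupe_dir() {
    dir=$1
    hashfile=$2
    threads=${3:-2}   # assumed default; pick a value that suits the machine

    # Options taken from the examples above: recurse, reuse a hash file,
    # and cap both the I/O and CPU thread counts.
    set -- -r "--hashfile=$hashfile" "--io-threads=$threads" "--cpu-threads=$threads"

    # Without -d, duperemove only reports duplicate extents.
    if [ "${DEDUPE:-0}" = "1" ]; then
        set -- "$@" -d
    fi

    # DUPEREMOVE_BIN lets the binary be swapped out, for example for testing.
    "${DUPEREMOVE_BIN:-duperemove}" "$@" "$dir"
}
```

Calling dedupe_dir path/to/directory path/to/hashfile 4 would run a report-only pass with 4 threads per stage; prefixing the call with DEDUPE=1 adds -d so the duplicates are actually deduplicated. Keeping the report-only pass as the default is a cautious design choice, since deduplication modifies how the filesystem stores the files.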