That’s actually not as trivial as it seems, because you need a canonical representation of that directory to generate such a thing in the same way on each side.
You need to encode the metadata in a standard way, encode new data that shows up in a standard way, and various people can add more metadata to files: think of things like POSIX ACLs or the immutable flag or whatever.
Then there’s some metadata that you probably want to exclude, like atime (though not if you’re something like rsync -U!), and some metadata that you almost certainly want to exclude, like inode number.
The OS’s file APIs don’t define the order in which they return entries in a directory. They’ll normally just return them in whatever order things come back from the filesystem, which is probably whatever is most efficient for the filesystem in question, given how things are encoded on disk. If you sort the directory entries, the sort can’t be locale-dependent, which is how most sorting on the system behaves by default. Utilities like tar don’t impose a canonical ordering by default, so you can’t just dump the problems on tar by checksumming a tarball of the directory.
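To make those points concrete, here is a rough Python sketch of one way such a checksum could be built. Everything in it is an assumption for illustration: the name `tree_digest`, the choice to include only permission bits, and the field tags are all made up, not a standard.

```python
import hashlib
import os
import stat


def tree_digest(root: str) -> str:
    """Return a SHA-256 hex digest of a directory tree in a canonical form."""
    h = hashlib.sha256()

    def feed(tag: bytes, payload: bytes) -> None:
        # Length-prefix every field so field boundaries are unambiguous.
        h.update(tag)
        h.update(len(payload).to_bytes(8, "little"))
        h.update(payload)

    def visit(path: bytes) -> None:
        st = os.lstat(path)
        # Included metadata (an arbitrary choice here): permission bits only.
        # Excluded: atime, mtime, owner, inode number.
        feed(b"mode", stat.S_IMODE(st.st_mode).to_bytes(4, "little"))
        if stat.S_ISLNK(st.st_mode):
            feed(b"link", os.readlink(path))
        elif stat.S_ISDIR(st.st_mode):
            # Sort the raw byte names: independent of locale and readdir order.
            names = sorted(os.listdir(path))
            feed(b"dir", len(names).to_bytes(8, "little"))
            for name in names:
                feed(b"name", name)
                visit(os.path.join(path, name))
        elif stat.S_ISREG(st.st_mode):
            # Reads the whole file into memory; fine for a sketch.
            with open(path, "rb") as f:
                feed(b"file", f.read())
        else:
            feed(b"other", b"")

    visit(os.fsencode(root))
    return h.hexdigest()
```

Both sides would have to agree on every one of those choices, and any extra metadata (ACLs, xattrs, flags) would need its own agreed-upon encoding, which is the hard part.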
EDIT: tar does appear to have a canonical ordering option today (GNU tar has --sort=name), though it probably doesn’t have the constraint of keeping the metadata it includes backwards-compatible, which is another thing one would need for such a checksum if one were to leverage tar.
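For what leaning on tar might look like, here is a hedged sketch using Python's standard tarfile module rather than the tar CLI. The function name `normalized_tar_digest` and the set of scrubbed fields are my own assumptions. Note that the digest still depends on exactly how the tar writer lays out its headers, so it is only stable as long as that never changes.

```python
import hashlib
import io
import os
import tarfile


def normalized_tar_digest(root: str) -> str:
    """Hash an in-memory tarball built with sorted names and scrubbed metadata."""

    def scrub(info: tarfile.TarInfo) -> tarfile.TarInfo:
        # Drop metadata that differs between machines (an illustrative choice).
        info.mtime = 0
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        return info

    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
    # Byte-wise sort of full paths: locale-independent, parents before children.
    paths.sort(key=os.fsencode)

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for p in paths:
            # recursive=False because every entry is already listed explicitly.
            tar.add(p, arcname=os.path.relpath(p, root), recursive=False, filter=scrub)
    return hashlib.sha256(buf.getvalue()).hexdigest()
```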
Given it was a joke, I don’t think you need to do anything…
Sorting is a thing :)
Nix actually invented its own tar-like archive format specifically for this, called “normalized archive” or “Nix Archive” or nar. Guix uses this too: https://releases.nixos.org/nix/nix-2.22.0/manual/protocols/nix-archive.html
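As a rough illustration of how little the format stores, here is a Python sketch of a NAR-style serializer based on my reading of the spec linked above. The function names (`nar_str`, `nar_node`, `nar_dump`) are made up, and I have not checked the output byte-for-byte against `nix-store --dump`, so treat it as a sketch rather than a reference implementation.

```python
import os
import stat


def nar_str(data: bytes) -> bytes:
    """64-bit little-endian length, then the data zero-padded to 8 bytes."""
    pad = (-len(data)) % 8
    return len(data).to_bytes(8, "little") + data + b"\0" * pad


def nar_node(path: bytes) -> bytes:
    """Serialize one filesystem object; only type, exec bit, and content survive."""
    st = os.lstat(path)
    out = nar_str(b"(") + nar_str(b"type")
    if stat.S_ISLNK(st.st_mode):
        out += nar_str(b"symlink") + nar_str(b"target") + nar_str(os.readlink(path))
    elif stat.S_ISDIR(st.st_mode):
        out += nar_str(b"directory")
        # Entries are emitted in byte-wise sorted order, not readdir order.
        for name in sorted(os.listdir(path)):
            out += (nar_str(b"entry") + nar_str(b"(")
                    + nar_str(b"name") + nar_str(name)
                    + nar_str(b"node") + nar_node(os.path.join(path, name))
                    + nar_str(b")"))
    elif stat.S_ISREG(st.st_mode):
        out += nar_str(b"regular")
        if st.st_mode & stat.S_IXUSR:
            out += nar_str(b"executable") + nar_str(b"")
        with open(path, "rb") as f:
            out += nar_str(b"contents") + nar_str(f.read())
    else:
        raise ValueError("only regular files, directories and symlinks are supported")
    return out + nar_str(b")")


def nar_dump(root: str) -> bytes:
    """Serialize the tree at root, prefixed with the format's magic string."""
    return nar_str(b"nix-archive-1") + nar_node(os.fsencode(root))
```

There are no timestamps, owners, or permission bits beyond the executable flag, which sidesteps most of the metadata questions by simply not recording it.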