Data Model
This describes the data model used in snix-castore
to describe file system
trees. blob / chunk storage is covered by other documents.
For those familiar, snix-castore
uses a similar concept as git tree objects,
which also is a merkle structure. 1
Node
snix-castore
can represent three different types.
Nodes themselves don’t have names, names are given by being in a
Directory
structure.
Node::File
A (regular) file. We store the BLAKE3 digest of the raw file contents, the length of the raw data, and an executable bit.
Node::Symlink
A symbolic link. We store the symlink target contents.
Node::Directory
A (child) directory. We store the digest of the Directory structure describing its “contents”.
We also store a size
field, containing the (total) number of all child
elements in the referenced Directory
, which helps for inode calculation.
Directory
The Directory struct contains all nodes in a single directory (on that level), alongside with their (base)names (called PathComponent ).
.
and ..
are not included.
For the Directory struct, a Digest can be calculated2, which
is what the parent Node::Directory
will use as a reference, to build a merkle
structure.
PathComponent
This is a more strict version of bytes, reduced to valid path components in a Directory .
It disallows slashes, null bytes, .
, ..
and the
empty string. It also rejects too long names (> 255 bytes).
Merkle DAG
The pointers from Node::File
to Directory
, and this one potentially
containing Node::File
again makes the whole structure a merkle tree (or
strictly speaking, a graph, as two elements pointing to a child directory with
the same contents would point to the same Directory
message).
Protobuf
In addition to the Rust types described above, there’s also a protobuf representation, which differs slightly:
Instead of nodes being unnamed, and Directory
containing a map from
PathComponent
to Node
(and keys being the basenames in that directory),
the Directory
message contains three lists, directories
, files
and
symlinks
, holding DirectoryEntry
, FileEntry
and SymlinkEntry
messages
respectively.
These contain all fields present in the corresponding Node
enum kind, as well
as a name
field, representing the basename in that directory.
For reproducibility reasons, the lists MUST be sorted by that name and the name MUST be unique across all three lists.
For a detailed comparison with the git model, and what (and why we do differently, see here ) ↩︎
We currently use the BLAKE3 digest of the protobuf serialization of the
proto::Directory
struct to calculate these digests. While pretty stable across most implementations, there’s no guarantee this will always stay as-is, so we might switch to another serialization with stronger guarantees on that front in the future. See #111 for details. ↩︎