Data Model

This describes the data model used in snix-castore to describe file system trees. blob / chunk storage is covered by other documents.

For those familiar, snix-castore uses a similar concept as git tree objects, which also is a merkle structure. 1

Nodeexternal link

snix-castore can represent three different types. Nodes themselves don’t have names, names are given by being in a Directory structure.

Node::File

A (regular) file. We store the BLAKE3external link digest of the raw file contents, the length of the raw data, and an executable bit.

A symbolic link. We store the symlink target contents.

Node::Directory

A (child) directory. We store the digest of the Directory structure describing its “contents”.

We also store a size field, containing the (total) number of all child elements in the referenced Directory, which helps for inode calculation.

Directoryexternal link

The Directory struct contains all nodes in a single directory (on that level), alongside with their (base)names (called PathComponent ).

. and .. are not included.

For the Directory struct, a Digest can be calculated2, which is what the parent Node::Directory will use as a reference, to build a merkle structure.

PathComponentexternal link

This is a more strict version of bytes, reduced to valid path components in a Directory .

It disallows slashes, null bytes, ., .. and the empty string. It also rejects too long names (> 255 bytes).

Merkle DAG

The pointers from Node::File to Directory, and this one potentially containing Node::File again makes the whole structure a merkle tree (or strictly speaking, a graph, as two elements pointing to a child directory with the same contents would point to the same Directory message).

Protobuf

In addition to the Rust types described above, there’s also a protobuf representation, which differs slightly:

Instead of nodes being unnamed, and Directory containing a map from PathComponent to Node (and keys being the basenames in that directory), the Directory message contains three lists, directories, files and symlinks, holding DirectoryEntry, FileEntry and SymlinkEntry messages respectively.

These contain all fields present in the corresponding Node enum kind, as well as a name field, representing the basename in that directory.

For reproducibility reasons, the lists MUST be sorted by that name and the name MUST be unique across all three lists.


  1. For a detailed comparison with the git model, and what (and why we do differently, see here↩︎

  2. We currently use the BLAKE3external link digest of the protobuf serialization of the proto::Directory struct to calculate these digests. While pretty stable across most implementations, there’s no guarantee this will always stay as-is, so we might switch to another serialization with stronger guarantees on that front in the future. See #111external link for details. ↩︎