Data Model

This document describes the data model snix-castore uses to represent file system trees. Blob and chunk storage are covered by other documents.

For those familiar with git, snix-castore uses a concept similar to git tree objects, which also form a Merkle structure. ¹

Node

snix-castore can represent three different node types. Nodes themselves don’t have names; a node gets its name from the Directory structure that contains it.

`Node::File`

A (regular) file. We store the BLAKE3 digest of the raw file contents, the length of the raw data, and an executable bit.

`Node::Symlink`

A symbolic link. We store the symlink target contents.

`Node::Directory`

A (child) directory. We store the digest of the Directory structure describing its “contents”. ²

We also store a size field, containing the (total) number of all child elements in the referenced Directory, which helps for inode calculation.
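The three node kinds described above can be sketched as a Rust enum. This is a minimal illustration, not the actual snix-castore definition: the `Digest` alias, the field names, and the `kind` helper are assumptions made for this example.

```rust
/// Hypothetical stand-in for a 32-byte BLAKE3 digest (illustration only).
type Digest = [u8; 32];

/// Sketch of the three node kinds. Nodes carry no name of their own.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq)]
enum Node {
    /// A regular file: digest of the raw contents, their length in bytes,
    /// and the executable bit.
    File { digest: Digest, size: u64, executable: bool },
    /// A symbolic link: the raw bytes of the link target.
    Symlink { target: Vec<u8> },
    /// A child directory: digest of the referenced Directory, plus the
    /// total number of child elements in it.
    Directory { digest: Digest, size: u64 },
}

/// Returns a short label for the node kind (hypothetical helper).
fn kind(node: &Node) -> &'static str {
    match node {
        Node::File { .. } => "file",
        Node::Symlink { .. } => "symlink",
        Node::Directory { .. } => "directory",
    }
}
```

Because the enum has no name field, the same `Node` value can appear under different names in different directories.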

PathComponent

This is a stricter version of bytes, restricted to valid path components in a Directory.

It disallows slashes, null bytes, `.`, `..` and the empty string. It also rejects names that are too long (> 255 bytes).
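The rules above can be expressed as a small validation function. This is a sketch under the stated rules only: the function name, the error enum, and the order in which checks run are assumptions for illustration, not the snix-castore API.

```rust
/// Hypothetical error type listing each rejection reason.
#[derive(Debug, PartialEq, Eq)]
enum PathComponentError {
    Empty,
    DotOrDotDot,
    ContainsSlash,
    ContainsNull,
    TooLong,
}

/// Maximum allowed name length in bytes, per the rule above.
const MAX_NAME_LEN: usize = 255;

/// Checks whether `name` is acceptable as a single path component.
fn validate_path_component(name: &[u8]) -> Result<(), PathComponentError> {
    if name.is_empty() {
        return Err(PathComponentError::Empty);
    }
    if name == b"." || name == b".." {
        return Err(PathComponentError::DotOrDotDot);
    }
    if name.contains(&b'/') {
        return Err(PathComponentError::ContainsSlash);
    }
    if name.contains(&0u8) {
        return Err(PathComponentError::ContainsNull);
    }
    if name.len() > MAX_NAME_LEN {
        return Err(PathComponentError::TooLong);
    }
    Ok(())
}
```

The function operates on bytes rather than `str`, since a path component is a restricted byte string and need not be valid UTF-8.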

Merkle DAG

The pointers from Node::Directory to Directory, with that Directory potentially containing further Node::Directory entries, make the whole structure a Merkle tree (or, strictly speaking, a graph, since two elements pointing to child directories with the same contents point to the same Directory message).
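The deduplication property can be illustrated with a toy model. Note the assumptions: the digest here is a `u64` from the standard library's `DefaultHasher`, used purely as a stand-in so the example has no external dependencies; it is not BLAKE3 and not the real serialization. The `Directory` struct, `digest_of`, `pointer_to`, and the way `size` is summed are likewise hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Stand-in digest for illustration only (NOT BLAKE3).
type Digest = u64;

#[allow(dead_code)]
#[derive(Debug, Hash, PartialEq, Eq)]
enum Node {
    File { digest: Digest, size: u64, executable: bool },
    Symlink { target: Vec<u8> },
    Directory { digest: Digest, size: u64 },
}

/// Toy Directory: a sorted map from name to node, so hashing is deterministic.
#[derive(Debug, Default, Hash)]
struct Directory {
    nodes: BTreeMap<Vec<u8>, Node>,
}

impl Directory {
    /// Assumed interpretation of "total number of child elements":
    /// direct entries plus the sizes of referenced child directories.
    fn size(&self) -> u64 {
        self.nodes
            .values()
            .map(|n| match n {
                Node::Directory { size, .. } => 1 + size,
                _ => 1,
            })
            .sum()
    }
}

/// Content address of a directory under the stand-in hash.
fn digest_of(dir: &Directory) -> Digest {
    let mut hasher = DefaultHasher::new();
    dir.hash(&mut hasher);
    hasher.finish()
}

/// Builds the Node::Directory that points at `dir` by digest.
fn pointer_to(dir: &Directory) -> Node {
    Node::Directory { digest: digest_of(dir), size: dir.size() }
}

/// A small directory containing a single file named "hello".
fn leaf() -> Directory {
    let mut d = Directory::default();
    d.nodes.insert(
        b"hello".to_vec(),
        Node::File { digest: 42, size: 5, executable: false },
    );
    d
}
```

Building `leaf()` twice and inserting `pointer_to` of each copy under two different names in a parent yields two identical `Node::Directory` values, which is why the structure is a graph rather than a strict tree.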

Protobuf

In addition to the Rust types described above, there’s also a protobuf representation, which differs slightly:

Instead of nodes being unnamed, and Directory containing a map from PathComponent to Node (and keys being the basenames in that directory), the Directory message contains three lists, directories, files and symlinks, holding DirectoryEntry, FileEntry and SymlinkEntry messages respectively.

These contain all fields present in the corresponding Node enum kind, as well as a name field, representing the basename in that directory.

For reproducibility reasons, the lists MUST be sorted by that name and the name MUST be unique across all three lists.
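The two constraints above (sorted lists, names unique across all three lists) can be checked as follows. This sketch reduces each entry to its name and assumes byte-wise lexicographic ordering; the function name and signature are hypothetical and not part of the snix-castore API.

```rust
use std::collections::BTreeSet;

/// Returns true if each list is strictly sorted by name and no name
/// appears more than once across the three lists.
/// Assumption: "sorted" means byte-wise lexicographic order.
fn names_are_valid(
    directories: &[&[u8]],
    files: &[&[u8]],
    symlinks: &[&[u8]],
) -> bool {
    // Strictly increasing order also rules out duplicates within one list.
    let sorted = |list: &[&[u8]]| list.windows(2).all(|w| w[0] < w[1]);
    if !(sorted(directories) && sorted(files) && sorted(symlinks)) {
        return false;
    }

    // `insert` returns false when the name was already seen in any list.
    let mut seen = BTreeSet::new();
    directories
        .iter()
        .chain(files)
        .chain(symlinks)
        .all(|name| seen.insert(*name))
}
```

Enforcing one canonical ordering means a given directory has only one valid list layout, which keeps its serialized form, and therefore its digest, reproducible.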

¹ For a detailed comparison with the git model, and what we do differently (and why), see here.

² We currently use the BLAKE3 digest of the protobuf serialization of the proto::Directory struct to calculate these digests. While pretty stable across most implementations, there’s no guarantee this will always stay as-is, so we might switch to another serialization with stronger guarantees on that front in the future. See #111 for details.
