This module implements the canonical FastCDC algorithm as described in the paper by Wen Xia, et al., in 2016.
The algorithm incorporates a simplified hash judgement using the fast Gear hash, sub-minimum chunk cut-point skipping, and normalized chunking to produce chunks of a more consistent length.
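The three ideas named above can be illustrated with a minimal, self-contained sketch. It is not this module's implementation: the Gear table is generated from a fixed seed rather than taken from the crate, the masks are simple top-bit masks chosen for illustration, and the names toy_gear_table, toy_cut, and toy_chunks are hypothetical.

```rust
/// Build a 256-entry Gear table with splitmix64 from a fixed seed.
/// Assumption: illustrative values only, not the table the crate ships.
fn toy_gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for entry in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *entry = z ^ (z >> 31);
    }
    table
}

/// Return (hash, length) for the next chunk at the start of `data`.
fn toy_cut(data: &[u8], gear: &[u64; 256], min: usize, avg: usize, max: usize) -> (u64, usize) {
    assert!(avg >= 4 && min <= avg && avg <= max);
    // Inputs no longer than `min` form a single chunk without any hashing.
    if data.len() <= min {
        return (0, data.len());
    }
    let end = data.len().min(max);
    let center = end.min(avg);
    // Normalized chunking: a stricter mask (more bits) before the average
    // size makes early cuts rarer; a looser mask (fewer bits) after it
    // makes late cuts likelier. Hypothetical masks, one bit either side.
    let bits = usize::BITS - 1 - avg.leading_zeros();
    let mask_strict = u64::MAX << (64 - (bits + 1));
    let mask_loose = u64::MAX << (64 - (bits - 1));
    let mut hash: u64 = 0;
    // Sub-minimum skipping: scanning starts at `min`, so the first `min`
    // bytes of the chunk are never hashed.
    for i in min..center {
        // Gear hash step: one shift, one add, one table lookup per byte.
        hash = (hash << 1).wrapping_add(gear[data[i] as usize]);
        if hash & mask_strict == 0 {
            return (hash, i + 1);
        }
    }
    for i in center..end {
        hash = (hash << 1).wrapping_add(gear[data[i] as usize]);
        if hash & mask_loose == 0 {
            return (hash, i + 1);
        }
    }
    // No cut point found: force a cut at `max` (or at the end of the input).
    (hash, end)
}

/// Split `data` into (offset, length) pairs by calling `toy_cut` repeatedly.
fn toy_chunks(data: &[u8], min: usize, avg: usize, max: usize) -> Vec<(usize, usize)> {
    let gear = toy_gear_table();
    let mut out = Vec::new();
    let mut offset = 0;
    while offset < data.len() {
        let (_hash, length) = toy_cut(&data[offset..], &gear, min, avg, max);
        out.push((offset, length));
        offset += length;
    }
    out
}
```

Calling toy_chunks(&data, 256, 1024, 4096) returns contiguous (offset, length) pairs covering the whole input, with every chunk except possibly the last longer than 256 bytes and none longer than 4096.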
There are two ways to use the FastCDC struct defined in this module. One is to simply invoke cut() while managing your own start and remaining values. The other is to use the struct as an Iterator that yields Chunk structs, which represent the offset and size of the chunks.
Note that attempting to use both cut() and Iterator on the same FastCDC instance will yield incorrect results.
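The bookkeeping behind the two styles can be sketched with a stand-in type. ToyChunker below is hypothetical and is not FastCDC: it cuts every size bytes so that the handling of start and remaining is easy to follow. The shape of cut() used here (taking start and remaining, returning the hash and the end offset of the chunk) is an assumption made for illustration, not a signature confirmed by the text above.

```rust
/// Hypothetical chunk record with the fields mentioned in the text.
#[derive(Debug, PartialEq)]
struct ToyChunk {
    hash: u64,
    offset: usize,
    length: usize,
}

/// Stand-in chunker (NOT FastCDC): cuts every `size` bytes.
struct ToyChunker<'a> {
    source: &'a [u8],
    size: usize,
    processed: usize,
    remaining: usize,
}

impl<'a> ToyChunker<'a> {
    fn new(source: &'a [u8], size: usize) -> Self {
        assert!(size > 0);
        Self { source, size, processed: 0, remaining: source.len() }
    }

    /// Assumed shape: returns (hash, end offset) for the chunk beginning
    /// at `start`, given `remaining` unprocessed bytes.
    fn cut(&self, start: usize, remaining: usize) -> (u64, usize) {
        let end = start + remaining.min(self.size);
        // Toy hash: wrapping sum of the chunk's bytes.
        let hash = self.source[start..end]
            .iter()
            .fold(0u64, |h, &b| h.wrapping_add(b as u64));
        (hash, end)
    }
}

/// Style 2: the struct tracks `processed` and `remaining` itself.
impl<'a> Iterator for ToyChunker<'a> {
    type Item = ToyChunk;

    fn next(&mut self) -> Option<ToyChunk> {
        if self.remaining == 0 {
            return None;
        }
        let (hash, end) = self.cut(self.processed, self.remaining);
        let offset = self.processed;
        let length = end - offset;
        self.processed = end;
        self.remaining -= length;
        Some(ToyChunk { hash, offset, length })
    }
}

/// Style 1: the caller owns `start` and `remaining` and calls `cut` directly.
fn chunks_via_cut(chunker: &ToyChunker<'_>) -> Vec<ToyChunk> {
    let mut start = 0;
    let mut remaining = chunker.source.len();
    let mut out = Vec::new();
    while remaining > 0 {
        let (hash, end) = chunker.cut(start, remaining);
        let length = end - start;
        out.push(ToyChunk { hash, offset: start, length });
        start = end;
        remaining -= length;
    }
    out
}
```

In this sketch both styles produce the same chunks when each is run on its own instance. The position is tracked either by the caller or by the struct, never by both.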
Note that the cut() function returns the 64-bit hash of the chunk, which may be useful in scenarios involving chunk size prediction using historical data, such as in RapidCDC or SuperCDC. This hash value is also given in the hash field of the Chunk struct. While this value has rather low entropy, it is computationally cost-free and can be put to some use with additional record keeping.
The StreamCDC implementation is similar to FastCDC except that it will read data from a Read into an internal buffer of max_size and produce ChunkData values from the Iterator.
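The buffering pattern described above can be sketched as follows. ToyStream and ToyChunkData are hypothetical stand-ins, not the module's types: the cut decision is replaced by a fixed chunk_size so that only the "fill a buffer of at most max_size from a Read, then emit owned chunks" flow is shown.

```rust
use std::io::{self, Read};

/// Hypothetical owned chunk: its position in the stream plus its bytes.
struct ToyChunkData {
    offset: u64,
    data: Vec<u8>,
}

/// Stand-in streaming chunker (NOT StreamCDC): cuts every `chunk_size` bytes.
struct ToyStream<R: Read> {
    source: R,
    buffer: Vec<u8>, // bytes read from `source` but not yet emitted
    chunk_size: usize,
    max_size: usize,
    offset: u64,
    eof: bool,
}

impl<R: Read> ToyStream<R> {
    fn new(source: R, chunk_size: usize, max_size: usize) -> Self {
        assert!(chunk_size > 0 && chunk_size <= max_size);
        Self {
            source,
            buffer: Vec::with_capacity(max_size),
            chunk_size,
            max_size,
            offset: 0,
            eof: false,
        }
    }

    /// Top the buffer up to `max_size` bytes, or until the reader is exhausted.
    fn fill(&mut self) -> io::Result<()> {
        let mut scratch = [0u8; 4096];
        while !self.eof && self.buffer.len() < self.max_size {
            let want = (self.max_size - self.buffer.len()).min(scratch.len());
            match self.source.read(&mut scratch[..want]) {
                Ok(0) => self.eof = true,
                Ok(n) => self.buffer.extend_from_slice(&scratch[..n]),
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
        Ok(())
    }
}

impl<R: Read> Iterator for ToyStream<R> {
    type Item = io::Result<ToyChunkData>;

    fn next(&mut self) -> Option<Self::Item> {
        if let Err(e) = self.fill() {
            return Some(Err(e));
        }
        if self.buffer.is_empty() {
            return None;
        }
        // Stand-in for the real cut-point search over the buffered bytes.
        let length = self.buffer.len().min(self.chunk_size);
        let data: Vec<u8> = self.buffer.drain(..length).collect();
        let offset = self.offset;
        self.offset += length as u64;
        Some(Ok(ToyChunkData { offset, data }))
    }
}
```

Because each item owns its bytes, the caller never needs the whole input in memory. The cost in this sketch is one copy per chunk out of the internal buffer.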
Structs§
- Represents a chunk returned from the FastCDC iterator.
- Represents a chunk returned from the StreamCDC iterator.
- The FastCDC chunker implementation from 2016.
- The FastCDC chunker implementation from 2016 with streaming support.
Enums§
- The error type returned from the StreamCDC iterator.
- The level for the normalized chunking used by FastCDC and StreamCDC.
Constants§
- Largest acceptable value for the average chunk size.
- Smallest acceptable value for the average chunk size.
- Largest acceptable value for the maximum chunk size.
- Smallest acceptable value for the maximum chunk size.
- Largest acceptable value for the minimum chunk size.
- Smallest acceptable value for the minimum chunk size.