Module fastcdc::v2016


This module implements the canonical FastCDC algorithm as described in the 2016 paper by Wen Xia et al.

The algorithm incorporates a simplified hash judgement using the fast Gear hash, sub-minimum chunk cut-point skipping, and normalized chunking to produce chunks of a more consistent length.
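The three ingredients can be illustrated with a minimal, std-only sketch. It is not this module's implementation. The Gear table is generated from a hypothetical xorshift seed rather than a fixed published table. The masks use the top bits of the hash instead of the paper's spread-out bit patterns. `gear_table`, `high_mask`, `cut`, and the `level` parameter (standing in for the normalization level) are illustrative names only.

```rust
/// Builds a hypothetical 256-entry Gear table from a fixed seed (xorshift64).
/// Assumption: a real implementation ships a fixed table of random 64-bit values.
pub fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for entry in table.iter_mut() {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        *entry = state;
    }
    table
}

/// A mask with `bits` one-bits at the top of a u64 (assumption: the paper
/// spreads its mask bits differently; any fixed pattern shows the idea).
fn high_mask(bits: u32) -> u64 {
    assert!((1..=64).contains(&bits));
    u64::MAX << (64 - bits)
}

/// Returns `(hash, length)` of the first chunk of `data`.
/// Assumes `avg` is a power of two, `min < avg < max`, and `level` < log2(avg).
pub fn cut(gear: &[u64; 256], data: &[u8], min: usize, avg: usize, max: usize, level: u32) -> (u64, usize) {
    // Inputs no longer than `min` become a single (final) chunk.
    if data.len() <= min {
        return (0, data.len());
    }
    let bits = avg.trailing_zeros();
    // Normalized chunking: a stricter mask (more bits) before `avg`,
    // a looser mask (fewer bits) after it.
    let mask_s = high_mask(bits + level);
    let mask_l = high_mask(bits - level);
    let end = data.len().min(max);
    let center = end.min(avg);
    let mut hash: u64 = 0;
    // Sub-minimum cut-point skipping: bytes before `min` are never hashed.
    let mut i = min;
    while i < center {
        // Gear hash: shift left one bit and add the table entry for the byte.
        hash = (hash << 1).wrapping_add(gear[data[i] as usize]);
        if hash & mask_s == 0 {
            return (hash, i + 1);
        }
        i += 1;
    }
    while i < end {
        hash = (hash << 1).wrapping_add(gear[data[i] as usize]);
        if hash & mask_l == 0 {
            return (hash, i + 1);
        }
        i += 1;
    }
    // No boundary found: cut at `max` (or at the end of the input).
    (hash, end)
}
```

Because the Gear hash shifts left once per byte, its value depends only on the most recent 64 bytes, and the top bits see the widest window. That is why this sketch masks the high end of the hash.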

There are two ways to use the FastCDC struct defined in this module. One is to invoke cut() directly while managing your own start and remaining values. The other is to use the struct as an Iterator that yields Chunk structs, which represent the offset and size of each chunk. Note that using both cut() and the Iterator on the same FastCDC instance will yield incorrect results.
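The two usage styles can be sketched with a hypothetical, std-only `Chunker` type. It only mirrors the shape of the interface described above. `Chunker`, `ChunkInfo`, and `chunks_via_cut` are illustrative names, the `(hash, end offset)` return convention is an assumption of this sketch, and the per-byte scramble stands in for a real Gear table lookup.

```rust
/// Hypothetical chunk record: hash, offset, and length.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ChunkInfo {
    pub hash: u64,
    pub offset: usize,
    pub length: usize,
}

/// Hypothetical chunker supporting both manual `cut()` calls and iteration.
pub struct Chunker<'a> {
    source: &'a [u8],
    min_size: usize,
    max_size: usize,
    mask: u64,
    // Iterator state; only advanced by `next()`.
    position: usize,
}

impl<'a> Chunker<'a> {
    /// Assumes `avg_size` is a power of two greater than 1.
    pub fn new(source: &'a [u8], min_size: usize, avg_size: usize, max_size: usize) -> Self {
        let bits = avg_size.trailing_zeros();
        let mask = u64::MAX << (64 - bits);
        Chunker { source, min_size, max_size, mask, position: 0 }
    }

    /// Returns `(hash, end offset)` for the chunk starting at `start`,
    /// examining at most `remaining` bytes.
    pub fn cut(&self, start: usize, remaining: usize) -> (u64, usize) {
        if remaining <= self.min_size {
            return (0, start + remaining);
        }
        let end = start + remaining.min(self.max_size);
        let mut hash: u64 = 0;
        for i in (start + self.min_size)..end {
            // Stand-in for a Gear table lookup: a multiplicative scramble of the byte.
            let g = (self.source[i] as u64 + 1).wrapping_mul(0x9E37_79B9_7F4A_7C15);
            hash = (hash << 1).wrapping_add(g);
            if hash & self.mask == 0 {
                return (hash, i + 1);
            }
        }
        (hash, end)
    }
}

/// Style 2: the chunker as an Iterator.
impl<'a> Iterator for Chunker<'a> {
    type Item = ChunkInfo;
    fn next(&mut self) -> Option<ChunkInfo> {
        let remaining = self.source.len() - self.position;
        if remaining == 0 {
            return None;
        }
        let (hash, end) = self.cut(self.position, remaining);
        let chunk = ChunkInfo { hash, offset: self.position, length: end - self.position };
        self.position = end;
        Some(chunk)
    }
}

/// Style 1: drive `cut()` yourself, tracking `start` and `remaining`.
pub fn chunks_via_cut(chunker: &Chunker<'_>, len: usize) -> Vec<ChunkInfo> {
    let mut out = Vec::new();
    let mut start = 0;
    let mut remaining = len;
    while remaining > 0 {
        let (hash, end) = chunker.cut(start, remaining);
        out.push(ChunkInfo { hash, offset: start, length: end - start });
        remaining -= end - start;
        start = end;
    }
    out
}
```

Driving `cut()` by hand gives the caller control over where the next search starts; the Iterator hides that bookkeeping at the cost of owning the position itself.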

Note that the cut() function returns the 64-bit hash of the chunk, which may be useful in scenarios involving chunk size prediction using historical data, such as in RapidCDC or SuperCDC. This hash value is also given in the hash field of the Chunk struct. While this value has rather low entropy, it is computationally cost-free and can be put to some use with additional record keeping.
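As one hedged illustration of that record keeping, loosely in the spirit of RapidCDC (which this module does not implement), the hypothetical `SizeHistory` type below remembers, for each chunk hash, the length of the chunk that followed it. Because the hash has low entropy, different chunks can share a hash, so a prediction should be treated only as a candidate cut point to verify, not as a guaranteed boundary.

```rust
use std::collections::HashMap;

/// Hypothetical record keeping: for each chunk hash, the length of the
/// chunk that followed it the last time it was seen.
#[derive(Default)]
pub struct SizeHistory {
    next_len: HashMap<u64, usize>,
}

impl SizeHistory {
    /// Records consecutive `(hash, length)` pairs produced by a chunker.
    pub fn observe(&mut self, chunks: &[(u64, usize)]) {
        for pair in chunks.windows(2) {
            // Later observations overwrite earlier ones for the same hash.
            self.next_len.insert(pair[0].0, pair[1].1);
        }
    }

    /// Suggests a candidate length for the chunk following one with `hash`.
    pub fn predict(&self, hash: u64) -> Option<usize> {
        self.next_len.get(&hash).copied()
    }
}
```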

The StreamCDC implementation is similar to FastCDC, except that it reads data from a source implementing Read into an internal buffer of max_size bytes and yields ChunkData values from its Iterator.
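The buffering strategy can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not StreamCDC itself. `OwnedChunk`, `find_cut`, and `stream_chunks` are hypothetical names, the boundary rule is a simplified Gear-like stand-in, and the function collects into a `Vec` for brevity instead of yielding from an Iterator.

```rust
use std::io::{self, Read};

/// Hypothetical owned chunk: its stream offset plus a copy of its bytes.
#[derive(Debug)]
pub struct OwnedChunk {
    pub offset: u64,
    pub data: Vec<u8>,
}

/// Stand-in boundary rule: returns the length of the first chunk in `buf`.
fn find_cut(buf: &[u8], min: usize, mask: u64) -> usize {
    if buf.len() <= min {
        return buf.len();
    }
    let mut hash = 0u64;
    for (i, &b) in buf.iter().enumerate().skip(min) {
        let g = (b as u64 + 1).wrapping_mul(0x9E37_79B9_7F4A_7C15);
        hash = (hash << 1).wrapping_add(g);
        if hash & mask == 0 {
            return i + 1;
        }
    }
    buf.len()
}

/// Reads `source` through a buffer of `max` bytes and emits owned chunks.
/// Assumes `avg` is a power of two greater than 1 and `min < max`.
pub fn stream_chunks<R: Read>(mut source: R, min: usize, avg: usize, max: usize) -> io::Result<Vec<OwnedChunk>> {
    let mask = u64::MAX << (64 - avg.trailing_zeros());
    let mut buf = vec![0u8; max];
    let mut filled = 0; // bytes currently held in `buf`
    let mut offset = 0u64; // stream offset of `buf[0]`
    let mut eof = false;
    let mut out = Vec::new();
    loop {
        // Top up the buffer until it is full or the reader is exhausted.
        while !eof && filled < max {
            match source.read(&mut buf[filled..]) {
                Ok(0) => eof = true,
                Ok(n) => filled += n,
                // Retry reads interrupted by signals; propagate other errors.
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
        if filled == 0 {
            return Ok(out);
        }
        let len = find_cut(&buf[..filled], min, mask);
        out.push(OwnedChunk { offset, data: buf[..len].to_vec() });
        // Move the unconsumed tail to the front of the buffer.
        buf.copy_within(len..filled, 0);
        filled -= len;
        offset += len as u64;
    }
}
```

Keeping the buffer at `max` bytes guarantees that every boundary search sees a full maximum-size window unless the reader has reached end of input.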

Structs

  • Represents a chunk returned from the FastCDC iterator.
  • Represents a chunk returned from the StreamCDC iterator.
  • The FastCDC chunker implementation from 2016.
  • The FastCDC chunker implementation from 2016 with streaming support.

Enums

  • The error type returned from the StreamCDC iterator.
  • The level for the normalized chunking used by FastCDC and StreamCDC.

Constants

  • Largest acceptable value for the average chunk size.
  • Smallest acceptable value for the average chunk size.
  • Largest acceptable value for the maximum chunk size.
  • Smallest acceptable value for the maximum chunk size.
  • Largest acceptable value for the minimum chunk size.
  • Smallest acceptable value for the minimum chunk size.