fastcdc

Module ronomon


This module implements a variation of the FastCDC algorithm using 31-bit integers and right shifts instead of left shifts.

The explanation below is copied from ronomon/deduplication since this module is little more than a translation of that implementation:

The following optimizations and variations on FastCDC are involved in the chunking algorithm:

  • 31-bit integers instead of 64-bit integers, for the sake of the JavaScript reference implementation.
  • A right shift instead of a left shift to remove the need for an additional modulus operator, which would otherwise have been necessary to prevent overflow.
  • Masks are no longer zero-padded since a right shift is used instead of a left shift.
  • A more adaptive threshold based on a combination of average and minimum chunk size (rather than just average chunk size) to decide the pivot point at which to switch masks. A larger minimum chunk size now switches from the strict mask to the eager mask earlier.
  • Masks use 1 bit of chunk size normalization instead of 2 bits of chunk size normalization.
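
The points above can be illustrated with a minimal sketch. It is not the module's actual source: the names `gear_table`, `mask`, `center_size`, `cut` and `chunk_lengths` are chosen here for illustration, and the gear table is generated by a small xorshift generator as a stand-in for the fixed table of 31-bit constants that a real implementation would ship. The sketch also assumes sensible parameters (average size of at least 2, and minimum <= average <= maximum) and does no validation.

```rust
/// Hypothetical stand-in for a fixed 256-entry gear table.
/// Every entry is masked to 31 bits.
fn gear_table() -> [u32; 256] {
    let mut table = [0u32; 256];
    let mut state: u32 = 0x9E37_79B9;
    for entry in table.iter_mut() {
        // One xorshift32 step; bits shifted out are simply discarded.
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        *entry = state & 0x7FFF_FFFF;
    }
    table
}

/// A mask with the lowest `bits` bits set. Because the hash is shifted
/// right, the mask needs no zero padding.
fn mask(bits: u32) -> u32 {
    (1u32 << bits) - 1
}

/// Pivot at which the strict mask gives way to the eager mask. It depends
/// on both the average and the minimum size, so a larger minimum moves
/// the switch earlier.
fn center_size(average: usize, minimum: usize, source_size: usize) -> usize {
    let offset = (minimum + (minimum + 1) / 2).min(average);
    (average - offset).min(source_size)
}

/// Returns the length of the next chunk at the start of `source`.
fn cut(source: &[u8], table: &[u32; 256], min: usize, avg: usize, max: usize) -> usize {
    if source.len() <= min {
        return source.len();
    }
    let size = source.len().min(max);
    let bits = (avg as f64).log2().round() as u32;
    let mask_s = mask(bits + 1); // strict: one extra bit must be zero
    let mask_l = mask(bits - 1); // eager: one fewer bit must be zero
    let pivot = center_size(avg, min, size);
    let mut hash: u32 = 0;
    let mut i = min; // bytes below the minimum size are never tested
    while i < size {
        // Right shift plus a 31-bit table entry cannot overflow a u32,
        // so no modulus is needed.
        hash = (hash >> 1) + table[source[i] as usize];
        i += 1;
        let m = if i <= pivot { mask_s } else { mask_l };
        if hash & m == 0 {
            return i;
        }
    }
    size
}

/// Splits `data` into consecutive chunk lengths.
fn chunk_lengths(data: &[u8], min: usize, avg: usize, max: usize) -> Vec<usize> {
    let table = gear_table();
    let mut lengths = Vec::new();
    let mut offset = 0;
    while offset < data.len() {
        let len = cut(&data[offset..], &table, min, avg, max);
        lengths.push(len);
        offset += len;
    }
    lengths
}
```

Calling `chunk_lengths(&data, 64, 256, 1024)` on a byte slice yields lengths that sum to `data.len()`, with every chunk at most 1024 bytes and every chunk except possibly the last longer than 64 bytes. The boundaries themselves depend on the stand-in table, so they will not match those produced by this module.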

Structs§

  • Represents a chunk, returned from the FastCDC iterator.
  • The FastCDC chunker implementation by Joran Dirk Greef.

Constants§

  • Largest acceptable value for the average chunk size.
  • Smallest acceptable value for the average chunk size.
  • Largest acceptable value for the maximum chunk size.
  • Smallest acceptable value for the maximum chunk size.
  • Largest acceptable value for the minimum chunk size.
  • Smallest acceptable value for the minimum chunk size.