[PATCH 5 of 5 RFC] revlog: increase I/O bound to 4x the amount of data consumed

Antoine Pitrou solipsis at pitrou.net
Thu Nov 13 03:07:31 CST 2014


How should one test the effects (positive or negative) of this change
on a repository?



On Wed, 12 Nov 2014 15:09:12 -0800
Siddharth Agarwal <sid0 at fb.com> wrote:
> # HG changeset patch
> # User Siddharth Agarwal <sid0 at fb.com>
> # Date 1415765299 28800
> #      Tue Nov 11 20:08:19 2014 -0800
> # Node ID 14df598f1c13fa7fb04daf652efb94bd0f55e8c9
> # Parent  ec432ebb569fd4537093a3c446ccaefce5298636
> revlog: increase I/O bound to 4x the amount of data consumed
> 
> This doesn't affect normal clones since they'd be bound by the CPU bound below
> anyway -- it does, however, improve generaldelta clones significantly.
> 
> This also results in better deltaing for generaldelta clones -- in generaldelta
> clones, we calculate deltas with respect to the closest base if it has a higher
> revision number than either parent. If the base is on a significantly different
> branch, this can result in pointlessly massive deltas. This reduces the number
> of bases and hence the number of bad deltas.
> 
> Empirically, for a highly branchy repository, this reduced manifest size by
> around 15%.
> 
> The sole test change is because in that case the compressed text is smaller
> than the compressed delta. (We already use fulltexts if the uncompressed text
> is smaller than the compressed delta.)
> 
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -1267,7 +1267,7 @@
>          #   the amount of I/O we need to do.
>          # - 'compresseddeltalen' is the sum of the total size of deltas we need
>          #   to apply -- bounding it limits the amount of CPU we consume.
> -        if (d is None or dist > textlen * 2 or l > textlen or
> +        if (d is None or dist > textlen * 4 or l > textlen or
>              compresseddeltalen > textlen * 2 or
>              (self._maxchainlen and chainlen > self._maxchainlen)):
>              text = buildtext()
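
To make the effect of the changed bound concrete, here is a minimal standalone sketch of the condition in the hunk above. It is not the actual revlog code: the function name `should_store_fulltext`, the `io_factor` parameter, and the `have_delta` flag (standing in for the `d is None` check) are hypothetical names introduced only for illustration. `deltalen` corresponds to `l` in the patch.

```python
def should_store_fulltext(textlen, dist, deltalen, compresseddeltalen,
                          chainlen, maxchainlen=None, io_factor=4,
                          have_delta=True):
    """Return True when a full text would be stored instead of a delta.

    Mirrors the shape of the condition in the quoted hunk. 'io_factor'
    is an illustrative parameter: 2 models the old bound, 4 the new one.
    """
    return (
        not have_delta                          # no delta was computed
        or dist > textlen * io_factor           # I/O bound
        or deltalen > textlen                   # delta larger than the text
        or compresseddeltalen > textlen * 2     # CPU bound
        or bool(maxchainlen and chainlen > maxchainlen)  # chain length cap
    )


# A revision whose delta chain spans 3x the text length on disk:
old = should_store_fulltext(textlen=1000, dist=3000, deltalen=100,
                            compresseddeltalen=500, chainlen=5, io_factor=2)
new = should_store_fulltext(textlen=1000, dist=3000, deltalen=100,
                            compresseddeltalen=500, chainlen=5, io_factor=4)
print(old, new)
```

With these made-up numbers, the old 2x bound forces a new full text (a new base), while the 4x bound lets the delta be kept, which is the mechanism the commit message credits for the smaller number of bases.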




