[PATCH 3 of 4] revlog: raise an exception earlier if an entry is too large
mpm at selenic.com
Sat May 23 15:55:31 CDT 2015
On Fri, 2015-05-22 at 13:40 -0400, Jordi Gutiérrez Hermoso wrote:
> On Fri, 2015-05-22 at 12:05 -0500, Matt Mackall wrote:
> > On Thu, 2015-05-21 at 16:30 -0400, Jordi Gutiérrez Hermoso wrote:
> > > # HG changeset patch
> > > # User Jordi Gutiérrez Hermoso <jordigh at octave.org>
> > > # Date 1432239712 14400
> > > # Thu May 21 16:21:52 2015 -0400
> > > # Node ID 94b79351d9569b65c3c111cbfe88a03112d617a9
> > > # Parent 88b99c48761cc7b982b84294aa679b63f5edf967
> > > revlog: raise an exception earlier if an entry is too large
> > Issue number?
> I considered the 4th patch in this series to address the issue, but I
> suppose this one does too. Is it ok to have two patches with the same
> issue number?
> > > Before we were relying on _pack to error out when trying to pass an
> > > integer that was too large for the "i" format specifier. Now we check
> > > this earlier so we can form a better error message.
> > > diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> > > --- a/mercurial/revlog.py
> > > +++ b/mercurial/revlog.py
> > > @@ -152,6 +152,10 @@ indexformatng = ">Qiiiiii20s12x"
> > > ngshaoffset = 32
> > > versionformat = ">I"
> > >
> > > +# matches uncompressed length of indexformatng (2 gigs, 4-byte signed
> > > +# integer)
> > > +maxentrysize = 2147483648
> > > +
> > You're off by one. The limit is 0x7fffffff. Let's use hex, add a
> > comment, and prefix with _.
> > > class revlogio(object):
> > > def __init__(self):
> > > self.size = struct.calcsize(indexformatng)
> > > @@ -162,6 +166,11 @@ class revlogio(object):
> > > return index, getattr(index, 'nodemap', None), cache
> > >
> > > def packentry(self, entry, node, version, rev):
> > > + # uncompressed length
> > > + if entry > maxentrysize:
> > > + raise RevlogError(_("too large for revlog storage"),
> > Should mention 2G. And probably the filename.
> The filename isn't available at this point. And there may not even be
> a file at all: you could also hit this limit if you try to create a
> giant manifest entry, I think. That's why I decided to handle the
> filename higher up in the call stack.
Yep. Let's use self.indexfile for now. It's not perfect, but it's better
than nothing for the generic revlog message. I hadn't seen 4of4.
> > > + hint=_("consider using the largefiles extension"))
> > Eek. I'm not sure about that. Can we get just the fix and then bikeshed
> > the hint part?
> Sure, but the hint is not shown for a RevlogError. You have to throw
> an Abort for the hint to show, so that's why I also I thought that
> this could just be added here and a caller could decide to show the
> hint or not.
Right, my request is to send a simpler version of the fix without the
hint. That way the kernel of the fix isn't needlessly delayed on tuning
less important details.
So ideally a series like this would be:
- report "too big" error
- refactor of hints
- addition of hint about feature-of-last-resort
..which maximizes the odds of making forward progress on the core fix.
Mathematics is the supreme nostalgia of our time.
More information about the Mercurial-devel