[PATCH 4 of 5] manifestv2: add support for reading new manifest format

Martin von Zweigbergk martinvonz at google.com
Mon Apr 13 11:46:38 CDT 2015


On Mon, Apr 13, 2015 at 8:45 AM Pierre-Yves David <
pierre-yves.david at ens-lyon.org> wrote:

>
>
> On 04/01/2015 09:24 PM, Gregory Szorc wrote:
> > On Wed, Apr 1, 2015 at 8:47 PM, Martin von Zweigbergk
> > <martinvonz at google.com <mailto:martinvonz at google.com>> wrote:
> >
> >
> >
> >     On Wed, Apr 1, 2015 at 7:42 PM Gregory Szorc
> >     <gregory.szorc at gmail.com <mailto:gregory.szorc at gmail.com>> wrote:
> >
> >         On Wed, Apr 1, 2015 at 7:27 PM, Mike Hommey <mh at glandium.org
> >         <mailto:mh at glandium.org>> wrote:
> >
> >             On Wed, Apr 01, 2015 at 09:19:03PM -0500, Matt Mackall wrote:
> >             > On Thu, 2015-04-02 at 09:01 +0900, Mike Hommey wrote:
> >             > > On Wed, Apr 01, 2015 at 10:34:49AM -0700, Martin von Zweigbergk wrote:
> >             > > > # HG changeset patch
> >             > > > # User Martin von Zweigbergk <martinvonz at google.com <mailto:martinvonz at google.com>>
> >             > > > # Date 1427520401 25200
> >             > > > #      Fri Mar 27 22:26:41 2015 -0700
> >             > > > # Node ID aca6ee57dddf4b39732833a2bb603dcd19148754
> >             > > > # Parent  7530c75651b04d04e0871c84dfda487b4e9e96b4
> >             > > > manifestv2: add support for reading new manifest format
> >             > > >
> >             > > > The new manifest format is designed to be smaller, in particular to
> >             > > > produce smaller deltas. It stores hashes in binary and puts the hash
> >             > > > on a new line (for smaller deltas). It also uses stem compression to
> >             > > > save space for long paths. The format has room for metadata, but
> >             > > > that's there only for future-proofing. The parser thus accepts any
> >             > > > metadata and throws it away. For more information, see
> >             > > > http://mercurial.selenic.com/wiki/ManifestV2Plan.
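
To make the stem compression mentioned in the commit message concrete,
here is a minimal sketch in Python. It is an illustration of the general
technique only, not the parser from this patch: the function names, the
(length, suffix) tuple representation, and the 255 cap (which assumes a
one-byte length field) are my assumptions for the example.

```python
def stem_compress(paths):
    """Encode sorted paths as (shared_prefix_length, suffix) pairs."""
    entries = []
    prev = ""
    for path in sorted(paths):
        # Count the leading characters shared with the previous path.
        # The 255 cap assumes a one-byte length field (hypothetical).
        limit = min(len(prev), len(path), 255)
        stem = 0
        while stem < limit and prev[stem] == path[stem]:
            stem += 1
        entries.append((stem, path[stem:]))
        prev = path
    return entries


def stem_decompress(entries):
    """Rebuild the full paths from (shared_prefix_length, suffix) pairs."""
    paths = []
    prev = ""
    for stem, suffix in entries:
        path = prev[:stem] + suffix
        paths.append(path)
        prev = path
    return paths


paths = ["dir/sub/a.txt", "dir/sub/b.txt", "dir/top.txt", "readme"]
encoded = stem_compress(paths)
decoded = stem_decompress(encoded)
```

Because entries are sorted, neighbouring paths tend to share long
prefixes, so only the differing tail of each path needs to be stored.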
> >             > >
> >             > > I have several questions related to that document:
> >             > > - Since manifest creation is done when committing, what is the plan wrt
> >             > >   what should happen when a commit with manifestv2 is pushed (server may
> >             > >   not support them, or may not want them even if it does)
> >             >
> >             > Not fully decided. Possibly on-the-fly conversion.
> >
> >             The sha1 for a manifestv2 would be the same as the corresponding
> >             (flattened) manifestv1? O_o
> >
> >
> >         I think the point you are trying to make is there would be at
> >         least 2 SHA-1s for every changeset, depending on how manifests
> >         are computed. That seems extremely confusing.
> >
> >
> >     Yes, that would be confusing. What this series adds is a new hash.
> >     What Matt refers to is a BC-mode where the manifest is stored in v2
> >     format in the revlog, but the nodeid is calculated as if the content
> >     were v1. Once that's done, we can convert between the formats on the
> >     fly. In this BC-mode, we would have to produce the full-text
> >     manifest in both formats on commit, and the on-the-fly conversion
> >     would be somewhat costly too. The benefit is of course that the
> >     manifest revlog would be smaller (20-40%). (Note that if we do add a
> >     BC-mode, we'd probably have to be careful not to allow any metadata
> >     in the (v2) manifests, since that would not be part of the hash and
> >     could (?) open up for some attack.)
> >
> >
> > That's an interesting idea. But as you said, it locks the server into
> > BC-compatible behavior pretty much indefinitely.
> >
> > It's almost like you want to upgrade the server then have the server
> > advertise manifest capabilities to the client. Older clients either
> > wouldn't be able to push or we could do some server-side rebase magic
> > and push down the rewritten changeset SHA-1 to the client somehow. Maybe
> > the client would maintain a SHA-1 map file? hg-git has explored
> > something similar.
> >
> > This makes my head hurt.
> >
> >         Another question: how can an existing repo seamlessly switch to
> >         the new manifest format?
> >
> >
> >     As I wrote in the message for patch 3 in this series, I think that
> >     should be safe, but for now, we're keeping it simple by filling in
> >     the requires at repo creation time. My original assumption was that
> >     we would fill in the requires on the commit with the flag on. There
> >     seems to be no precedent for such behavior, but it seems to make
> >     sense to me (largefiles would be close to that, but not quite).
> >
> >         Presumably we'll want to "upgrade" the Firefox repo to both
> >         directory manifests and manifestv2 for performance benefits.
> >
> >
> >     I had assumed that Firefox would be using this in ~5 years. I'm
> >     curious what a more accurate number is. How soon can you require
> >     your developers to have upgraded to a certain version of hg?
> >
> >         But since manifestv2 is a requires-time thing, that would mean
> >         rewriting the entire manifest to v2. And that would change
> >         manifest SHA-1's which would invalidate every existing changeset
> >         SHA-1. That's a non-starter for us unless we can seamlessly
> >         handle requests for old changesets (hgweb URLs, clients updating
> >         to old changesets for bisection, etc).
> >
> >
> >     So either the BC-mode with on-the-fly conversion or we could allow
> >     switching to the new formats on an existing repo. For tree
> >     manifests, I don't think there will be a BC-mode, so if that turns
> >     out to be useful to you, you'd probably have to require clients to
> >     upgrade at that point anyway.
> >
> >     I'm a little surprised, but happy, that you mention only existing
> >     hashes. You seem to be considering upgrading earlier than I had
> >     expected.
> >
> >
> > We can force people to use a modern Mercurial if there is a compelling
> > reason to do so. It's annoying to force people to upgrade, sure. But if
> > it's all performance and feature wins, I don't think many will complain.
> > We've done this before in January 2013 by requiring Python 2.7 to build
> > Firefox. We can do it again if there are good reasons.
> >
> > What concerns us more than forcing a software upgrade onto people is
> > dealing with a repo rewrite. References to existing SHA-1 need to work
> > forever. Tons of automation would need to transition. The cost for a
> > one-time transition would be significant. A global flag day would be far
> > more expensive than a gradual transition. Maybe we would start playing
> > new commits to both repo versions and have downstream consumers start
> > pulling from the new repo before a push-only flag day? I don't know. We
> > should talk about this in Montreal.
>
> Transition to a new hash was discussed at the New York Sprint (Nov
> 2013). Here is what I can remember of it.
>
> - Whoever breaks the hash should also move the hashing algorithm to
> sha256.
>

The first 40 hex digits (20 bytes) of a sha256 should be enough, right?
We're concerned about a broken hash algorithm, IIUC, not so much about
hash collisions.
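
To illustrate what I mean by truncating, here is a rough sketch. It
assumes the nodeid keeps its current shape (a hash over the two sorted
parent nodeids followed by the text) and keeps its current 20-byte
length, which is 40 hex digits. The function name and the `algo` and
`length` parameters are made up for this example and are not an
existing Mercurial API.

```python
import hashlib

NULLID = b"\0" * 20


def nodeid(text, p1=NULLID, p2=NULLID, algo="sha1", length=20):
    """Hash the sorted parents plus the text, truncated to `length` bytes."""
    lo, hi = sorted([p1, p2])
    h = hashlib.new(algo)
    h.update(lo)
    h.update(hi)
    h.update(text)
    # A sha256 digest is 32 bytes; slicing keeps only the first `length`.
    return h.digest()[:length]


old_style = nodeid(b"manifest text")
new_style = nodeid(b"manifest text", algo="sha256")
```

Both values are 20 bytes, so the truncated sha256 would fit wherever a
sha1 nodeid fits today.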


>
> - We should probably change the format to allow multiple hashes per
> changeset. This would allow two things:
>
>    - Keeping the old hash for reference,
>    - Moving to another hash function (hash side) or another hashing
>      scheme (Mercurial side)
>
>    New servers and clients would start computing the new hash alongside
> the old hash. At some point a project can decide to stop accepting
> changesets that only have the old hash, and eventually drop it entirely.
>

I don't understand how this would work. Are the parent pointers always (for
both old-style and new-style commits) pointing to the old-style hash until
old-style hashes are no longer allowed? Would file and manifest revlogs be
using the old-style hash until the migration is complete or would you have
duplicate index files and shared data files?



>
> However, this feature MUST NOT turn into a
> "git-alias/omg-security-flaw-everywhere" feature.
>
> > But it's not just Mozilla. Every Mercurial user is in the same boat.
> > Although, only large repos would likely have anything significant to
> > gain. I think we need to consider the implications beyond what Google,
> > Facebook, and Mozilla are willing to tolerate. Mozilla is probably a
> > decent proxy for the average company or organization that doesn't have
> > the strong machine management that Facebook or Google are able to provide.
>
> I agree with Greg that we need a way to smoothly transition a project
> that started small but eventually became big.
>

I also agree with Greg. I think we should allow transitioning from one type
of manifest (and changelog) to another mid-history.


>
> Because of that, I think we -should- have the same hashing algorithm for
> flat repositories and sharded ones.
>

I think that's a nice idea, but I don't want to delay our (Google's)
project too much because of it. We will not care about changing from flat
to sharded. As long as we agree on how to calculate the hash, there should
be nothing preventing us from implementing support for that hashing of flat
manifests *after* we have it implemented for sharded manifests (and
possibly even in wide use). That also lets us see how bad it is to let
*all* users use tree manifests. If that turns out not too bad, then there
is little reason to invest in making that hashing possible for flat
manifests.


>
> A big change here is that the hash of a revision will not be computed
> directly from the binary content, but using special logic.
>

For flat manifests, yes, I agree.


>
> One issue is that the directory "hash" will include parent
> information. So the flat version needs to duplicate this
> information to be able to produce the same hash.
>

Right, the flat manifests would probably gain an entry for every directory.
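
As a rough sketch of the "entry for every directory" idea: given a flat
path-to-nodeid mapping, one could derive a hash per directory from its
immediate children. Everything here is hypothetical. The function name
and the "name NUL node newline" child encoding are invented for the
example, and it deliberately leaves out the parent information
mentioned above, so it is not the hashing scheme we would actually
agree on.

```python
import hashlib


def directory_hashes(flat):
    """Map each directory ('' is the root) to a hash of its immediate children."""
    children = {"": {}}
    for path, node in flat.items():
        parts = path.split("/")
        parent = ""
        for part in parts[:-1]:
            subdir = parent + part + "/"
            children.setdefault(subdir, {})
            children[parent][part + "/"] = None  # directory hash filled in below
            parent = subdir
        children[parent][parts[-1]] = node

    hashes = {}
    # Deepest directories first, so child hashes exist before parents need them.
    for d in sorted(children, key=lambda name: name.count("/"), reverse=True):
        h = hashlib.sha1()
        for name, node in sorted(children[d].items()):
            if node is None:
                node = hashes[d + name]
            h.update(name.encode() + b"\0" + node + b"\n")
        hashes[d] = h.digest()
    return hashes


flat = {
    "dir/sub/a.txt": b"\x01" * 20,
    "dir/top.txt": b"\x02" * 20,
    "readme": b"\x03" * 20,
}
hashes = directory_hashes(flat)
```

A flat manifest could then carry `dir/` and `dir/sub/` as extra entries
next to the file entries, with the root hash playing the role of the
manifest nodeid.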


> --
> Pierre-Yves David
>