[PATCH 4 of 5] manifestv2: add support for reading new manifest format

Pierre-Yves David pierre-yves.david at ens-lyon.org
Mon Apr 13 10:44:57 CDT 2015



On 04/01/2015 09:24 PM, Gregory Szorc wrote:
> On Wed, Apr 1, 2015 at 8:47 PM, Martin von Zweigbergk
> <martinvonz at google.com> wrote:
>
>
>
>     On Wed, Apr 1, 2015 at 7:42 PM Gregory Szorc
>     <gregory.szorc at gmail.com> wrote:
>
>         On Wed, Apr 1, 2015 at 7:27 PM, Mike Hommey <mh at glandium.org>
>         wrote:
>
>             On Wed, Apr 01, 2015 at 09:19:03PM -0500, Matt Mackall wrote:
>             > On Thu, 2015-04-02 at 09:01 +0900, Mike Hommey wrote:
>             > > On Wed, Apr 01, 2015 at 10:34:49AM -0700, Martin von Zweigbergk wrote:
>             > > > # HG changeset patch
>             > > > # User Martin von Zweigbergk <martinvonz at google.com>
>             > > > # Date 1427520401 25200
>             > > > #      Fri Mar 27 22:26:41 2015 -0700
>             > > > # Node ID aca6ee57dddf4b39732833a2bb603dcd19148754
>             > > > # Parent  7530c75651b04d04e0871c84dfda487b4e9e96b4
>             > > > manifestv2: add support for reading new manifest format
>             > > >
>             > > > The new manifest format is designed to be smaller, in particular to
>             > > > produce smaller deltas. It stores hashes in binary and puts the hash
>             > > > on a new line (for smaller deltas). It also uses stem compression to
>             > > > save space for long paths. The format has room for metadata, but
>             > > > that's there only for future-proofing. The parser thus accepts any
>             > > > metadata and throws it away. For more information, see
>             > > > http://mercurial.selenic.com/wiki/ManifestV2Plan.
>             > >
>             > > I have several questions related to that document:
>             > > - Since manifest creation is done when committing, what is the plan wrt
>             > >   what should happen when a commit with manifestv2 is pushed (server may
>             > >   not support them, or may not want them even if it does)
>             >
>             > Not fully decided. Possibly on-the-fly conversion.
>
>             The sha1 for a manifestv2 would be the same as the corresponding
>             (flattened) manifestv1? O_o
>
>
>         I think the point you are trying to make is there would be at
>         least 2 SHA-1s for every changeset, depending on how manifests
>         are computed. That seems extremely confusing.
>
>
>     Yes, that would be confusing. What this series adds is a new hash.
>     What Matt refers to is a BC-mode where the manifest is stored in v2
>     format in the revlog, but the nodeid is calculated as if the content
>     were v1. Once that's done, we can convert between the formats on the
>     fly. In this BC-mode, we would have to produce the full-text
>     manifest in both formats on commit, and the on-the-fly conversion
>     would be somewhat costly too. The benefit is of course that the
>     manifest revlog would be smaller (20-40%). (Note that if we do add a
>     BC-mode, we'd probably have to be careful not to allow any metadata
>     in the (v2) manifests, since that would not be part of the hash and
>     could (?) open up for some attack.)
>
>
> That's an interesting idea. But as you said, it locks the server into
> BC-compatible behavior pretty much indefinitely.
>
> It's almost like you want to upgrade the server then have the server
> advertise manifest capabilities to the client. Older clients either
> wouldn't be able to push or we could do some server-side rebase magic
> and push down the rewritten changeset SHA-1 to the client somehow. Maybe
> the client would maintain a SHA-1 map file? hg-git has explored
> something similar.
>
> This makes my head hurt.
>
>         Another question: how can an existing repo seamlessly switch to
>         the new manifest format?
>
>
>     As I wrote in the message for patch 3 in this series, I think that
>     should be safe, but for now, we're keeping it simple by filling in
>     the requires at repo creation time. My original assumption was that
>     we would fill in the requires on the commit with the flag on. There
>     seems to be no precedent for such behavior, but it seems to make
>     sense to me (largefiles would be close to that, but not quite).
>
>         Presumably we'll want to "upgrade" the Firefox repo to both
>         directory manifests and manifestv2 for performance benefits.
>
>
>     I had assumed that Firefox would be using this in ~5 years. I'm
>     curious what a more accurate number is. How soon can you require
>     your developers to have upgraded to a certain version of hg?
>
>         But since manifestv2 is a requires-time thing, that would mean
>         rewriting the entire manifest to v2. And that would change
>         manifest SHA-1's which would invalidate every existing changeset
>         SHA-1. That's a non-starter for us unless we can seamlessly
>         handle requests for old changesets (hgweb URLs, clients updating
>         to old changesets for bisection, etc).
>
>
>     So either the BC-mode with on-the-fly conversion or we could allow
>     switching to the new formats on an existing repo. For tree
>     manifests, I don't think there will be a BC-mode, so if that turns
>     out to be useful to you, you'd probably have to require clients to
>     upgrade at that point anyway.
>
>     I'm a little surprised, but happy, that you mention only existing
>     hashes. You seem to be considering upgrading earlier than I had
>     expected.
>
>
> We can force people use a modern Mercurial if there is a compelling
> reason to do so. It's annoying to force people to upgrade, sure. But if
> it's all performance and feature wins, I don't think many will complain.
> We've done this before in January 2013 by requiring Python 2.7 to build
> Firefox. We can do it again if there are good reasons.
>
> What concerns us more than forcing a software upgrade onto people is
> dealing with a repo rewrite. References to existing SHA-1 need to work
> forever. Tons of automation would need to transition. The cost for a
> one-time transition would be significant. A global flag day would be far
> more expensive than a gradual transition. Maybe we would start playing
> new commits to both repo versions and have downstream consumers start
> pulling from the new repo before a push-only flag day? I don't know. We
> should talk about this in Montreal.

Transition to a new hash was discussed at the New York Sprint (Nov 
2013). Here is what I can remember of it.

- Whoever breaks the hash should also move the hashing algorithm to sha256.

- We should probably change the format to allow multiple hashes per 
changeset. This would allow two things:

   - Keeping the old hash for reference,
   - Moving to another hash algorithm (hash side) or another algorithm 
(Mercurial side)

   New servers/clients would start computing the new hash alongside the 
old hash. At some point a project can decide to stop accepting changesets 
that only have the old hash, and eventually drop the old hash entirely.

However, this feature MUST NOT turn into a 
"git-alias/omg-security-flaw-everywhere" feature.
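
A minimal sketch of the "multiple hashes per changeset" idea, assuming a 
changeset can be reduced to a byte payload and that hashes are kept in a 
plain dict keyed by algorithm name. The function names and the dict layout 
are hypothetical and do not describe any existing Mercurial format or API:

```python
import hashlib

def compute_hashes(payload, algorithms=("sha1", "sha256")):
    """Hash the same payload with every listed algorithm.

    Hypothetical helper: returns {algorithm name: hex digest}.
    """
    return {name: hashlib.new(name, payload).hexdigest()
            for name in algorithms}

def accepts(changeset_hashes, required=("sha256",)):
    """Project policy check: every required algorithm must be present.

    A project that stops accepting old-hash-only changesets would list
    the new algorithm here; the old hash may still be kept for reference.
    """
    return all(name in changeset_hashes for name in required)

# Transition period: both hashes are computed and stored side by side.
both = compute_hashes(b"changeset payload")
# A changeset produced by an old client carries only the old hash.
old_only = {"sha1": both["sha1"]}
```

With this layout, `accepts(both)` holds while `accepts(old_only)` does not, 
which models the point where a project refuses changesets that only carry 
the old hash.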

> But it's not just Mozilla. Every Mercurial user is in the same boat.
> Although, only large repos would likely have anything significant to
> gain. I think we need to consider the implications beyond what Google,
> Facebook, and Mozilla are willing to tolerate. Mozilla is probably a
> decent proxy for the average company or organization that doesn't have
> the strong machine management that Facebook or Google are able to provide.

I agree with Greg that we need a smooth transition path for projects 
that start small but eventually become big.

Because of that, I think we -should- have the same hashing algorithm for 
flat repositories and sharded ones.

A big change here is that the hash of a revision will no longer be 
computed directly from the binary content, but using special logic.

One of the issues is that the directory "hash" will have parent 
information in its hash. So the flat version needs to duplicate this 
information to be able to produce the same hash.
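
For context, here is a minimal sketch of why parent information ends up in 
the hash. To my understanding it mirrors how a revlog node id is computed 
today (sha1 over the two sorted parent ids followed by the revision text); 
the function name and the sample inputs are illustrative only:

```python
import hashlib

NULLID = b"\0" * 20  # placeholder id used when a parent is absent

def nodeid(text, p1=NULLID, p2=NULLID):
    """Hash the sorted parent ids followed by the revision text."""
    a, b = sorted((p1, p2))
    s = hashlib.sha1(a)
    s.update(b)
    s.update(text)
    return s.digest()

# Identical directory content, different parents: the ids differ.
content = b"same directory content"
root = nodeid(content)
child = nodeid(content, p1=root)
```

Because `root` and `child` differ even though the content is identical, a 
flat manifest that wants to reproduce a per-directory hash has to carry 
each directory's parent ids as well.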

-- 
Pierre-Yves David

