[PATCH 4 of 5] manifestv2: add support for reading new manifest format

Martin von Zweigbergk martinvonz at google.com
Wed Apr 1 23:36:03 CDT 2015


On Wed, Apr 1, 2015 at 9:24 PM Gregory Szorc <gregory.szorc at gmail.com>
wrote:

> On Wed, Apr 1, 2015 at 8:47 PM, Martin von Zweigbergk <
> martinvonz at google.com> wrote:
>
>>
>>
>> On Wed, Apr 1, 2015 at 7:42 PM Gregory Szorc <gregory.szorc at gmail.com>
>> wrote:
>>
>>> On Wed, Apr 1, 2015 at 7:27 PM, Mike Hommey <mh at glandium.org> wrote:
>>>
>>>> On Wed, Apr 01, 2015 at 09:19:03PM -0500, Matt Mackall wrote:
>>>> > On Thu, 2015-04-02 at 09:01 +0900, Mike Hommey wrote:
>>>> > > On Wed, Apr 01, 2015 at 10:34:49AM -0700, Martin von Zweigbergk wrote:
>>>> > > > # HG changeset patch
>>>> > > > # User Martin von Zweigbergk <martinvonz at google.com>
>>>> > > > # Date 1427520401 25200
>>>> > > > #      Fri Mar 27 22:26:41 2015 -0700
>>>> > > > # Node ID aca6ee57dddf4b39732833a2bb603dcd19148754
>>>> > > > # Parent  7530c75651b04d04e0871c84dfda487b4e9e96b4
>>>> > > > manifestv2: add support for reading new manifest format
>>>> > > >
>>>> > > > The new manifest format is designed to be smaller, in particular to
>>>> > > > produce smaller deltas. It stores hashes in binary and puts the hash
>>>> > > > on a new line (for smaller deltas). It also uses stem compression to
>>>> > > > save space for long paths. The format has room for metadata, but
>>>> > > > that's there only for future-proofing. The parser thus accepts any
>>>> > > > metadata and throws it away. For more information, see
>>>> > > > http://mercurial.selenic.com/wiki/ManifestV2Plan.
>>>> > >
>>>> > > I have several questions related to that document:
>>>> > > - Since manifest creation is done when committing, what is the plan wrt
>>>> > >   what should happen when a commit with manifestv2 is pushed (server may
>>>> > >   not support them, or may not want them even if it does)
>>>> >
>>>> > Not fully decided. Possibly on-the-fly conversion.
>>>>
>>>> The sha1 for a manifestv2 would be the same as the corresponding
>>>> (flattened) manifestv1? O_o
>>>>
>>>
>>> I think the point you are trying to make is there would be at least 2
>>> SHA-1s for every changeset, depending on how manifests are computed. That
>>> seems extremely confusing.
>>>
>>
>> Yes, that would be confusing. What this series adds is a new hash. What
>> Matt refers to is a BC-mode where the manifest is stored in v2 format in
>> the revlog, but the nodeid is calculated as if the content were v1. Once
>> that's done, we can convert between the formats on the fly. In this
>> BC-mode, we would have to produce the full-text manifest in both formats on
>> commit, and the on-the-fly conversion would be somewhat costly too. The
>> benefit is of course that the manifest revlog would be smaller (20-40%).
>> (Note that if we do add a BC-mode, we'd probably have to be careful not to
>> allow any metadata in the (v2) manifests, since that would not be part of
>> the hash and could (?) open up for some attack.)
>>
>
> That's an interesting idea. But as you said, it locks the server into
> BC-compatible behavior pretty much indefinitely.
>
> It's almost like you want to upgrade the server then have the server
> advertise manifest capabilities to the client. Older clients either
> wouldn't be able to push or we could do some server-side rebase magic and
> push down the rewritten changeset SHA-1 to the client somehow. Maybe the
> client would maintain a SHA-1 map file? hg-git has explored something
> similar.
>
> This makes my head hurt.
>

IIUC, you're (not really) suggesting different hashes on the client? Sounds
too complex to me.


>
>
>>
>>
>>> Another question: how can an existing repo seamlessly switch to the new
>>> manifest format?
>>>
>>
>> As I wrote in the message for patch 3 in this series, I think that should
>> be safe, but for now, we're keeping it simple by filling in the requires at
>> repo creation time. My original assumption was that we would fill in the
>> requires on the commit with the flag on. There seems to be no precedent for
>> such behavior, but it seems to make sense to me (largefiles would be close
>> to that, but not quite).
>>
>>
>>> Presumably we'll want to "upgrade" the Firefox repo to both directory
>>> manifests and manifestv2 for performance benefits.
>>>
>>
>> I had assumed that Firefox would be using this in ~5 years. I'm curious
>> what a more accurate number is. How soon can you require your developers to
>> have upgraded to a certain version of hg?
>>
>>
>>> But since manifestv2 is a requires-time thing, that would mean rewriting
>>> the entire manifest to v2. And that would change manifest SHA-1s, which
>>> would invalidate every existing changeset SHA-1. That's a non-starter for
>>> us unless we can seamlessly handle requests for old changesets (hgweb URLs,
>>> clients updating to old changesets for bisection, etc).
>>>
>>
>> So either the BC-mode with on-the-fly conversion or we could allow
>> switching to the new formats on an existing repo. For tree manifests, I
>> don't think there will be a BC-mode, so if that turns out to be useful to
>> you, you'd probably have to require clients to upgrade at that point anyway.
>>
>> I'm a little surprised, but happy, that you mention only existing hashes.
>> You seem to be considering upgrading earlier than I had expected.
>>
>
> We can force people to use a modern Mercurial if there is a compelling reason
> to do so. It's annoying to force people to upgrade, sure. But if it's all
> performance and feature wins, I don't think many will complain. We've done
> this before in January 2013 by requiring Python 2.7 to build Firefox. We
> can do it again if there are good reasons.
>
> What concerns us more than forcing a software upgrade onto people is
> dealing with a repo rewrite. References to existing SHA-1s need to work
> forever. Tons of automation would need to transition. The cost for a
> one-time transition would be significant. A global flag day would be far
> more expensive than a gradual transition. Maybe we would start playing new
> commits to both repo versions and have downstream consumers start pulling
> from the new repo before a push-only flag day? I don't know. We should talk
> about this in Montreal.
>

I would also hate for us to force a history rewrite. I've been planning for
a mid-history format switch all along, and the reason for setting the
requires value at repo creation is just to keep it simple for now. I think
we should be able to relax that later. If we were not planning for that, we
wouldn't have bothered to find a way of telling old manifests from new (the
initial empty path).
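
To illustrate why that marker is enough, here is a minimal sketch. It assumes
the check is simply "does the manifest start with a NUL byte", since a v1 entry
is "path\0hexhash[flags]\n" and a v1 path is never empty. The helper name
manifest_version is hypothetical and not a function from this series, and the
v2 payload below is only a placeholder, not the real v2 layout.

```python
def manifest_version(data: bytes) -> int:
    """Guess the manifest format from its first byte (hypothetical helper).

    Assumption: a v2 manifest begins with an empty path, i.e. the very
    first byte is the NUL that normally separates path from hash in v1.
    """
    # In v1, every line is b"<path>\0<40 hex digits><flags>\n" and <path>
    # is non-empty, so a v1 manifest can never start with b"\0".
    return 2 if data.startswith(b"\0") else 1


# A one-entry v1 manifest: path, NUL, 40 hex digits, newline.
v1_text = b"README\0" + b"a" * 40 + b"\n"

# Placeholder for a v2 manifest: only the leading empty path matters here.
v2_text = b"\0" + b"<v2 payload, layout not shown>"

print(manifest_version(v1_text))
print(manifest_version(v2_text))
```

With a check like this, a revlog could in principle hold v1 revisions followed
by v2 revisions, and the reader would pick the parser per revision.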

So I really don't think you should have to talk about this in Montreal :-)
I won't be there, btw. I'm sure Matt will chime in if he disagrees with me.


> But it's not just Mozilla. Every Mercurial user is in the same boat.
> Although, only large repos would likely have anything significant to gain.
> I think we need to consider the implications beyond what Google, Facebook,
> and Mozilla are willing to tolerate. Mozilla is probably a decent proxy for
> the average company or organization that doesn't have the strong machine
> management that Facebook or Google are able to provide.
>