[PATCH 1 of 2 V2] pathencode.c: for long paths, strip first dir, whether or not it's "data/"

Martin von Zweigbergk martinvonz at google.com
Fri May 8 11:10:42 CDT 2015


On Fri, May 8, 2015 at 7:37 AM Augie Fackler <raf at durin42.com> wrote:

> On Thu, May 07, 2015 at 11:02:20PM -0700, Kyle Lippincott wrote:
> > On Thursday, May 7, 2015, Martin von Zweigbergk <martinvonz at google.com>
> > wrote:
> >
> > >
> > > On Thu, May 7, 2015 at 4:34 PM Martin von Zweigbergk <
> > > martinvonz at google.com
> > > <javascript:_e(%7B%7D,'cvml','martinvonz at google.com');>> wrote:
> > >
> > >> On Thu, May 7, 2015 at 4:27 PM Adrian Buehlmann <adrian at cadifra.com
> > >> <javascript:_e(%7B%7D,'cvml','adrian at cadifra.com');>> wrote:
> > >>
> > >>> On 2015-05-08 00:56, Martin von Zweigbergk wrote:
> > >>> > Heh, Drew actually suggested "meta" just so we could avoid dealing
> with
> > >>> > this inconsistency between native and pure code.
> > >>>
> > >>> Yes, please.
> > >>>
> > >>
> > >> I don't mind changing to "meta", but I'm still concerned about the
> > >> discrepancy between native and pure code. Do you have a test case that
> > >> shows that performance really is a problem?
> > >>
> > >
> > > My crude timings suggest that the ~2k calls to hashencode() on the
> Firefox
> > > repo (~2k files out of ~200k have long names) take ~6ms, while the ~2k
> > > calls to strchr() take ~12us. So I'd say the performance argument does
> not
> > > exist and we just have to pick a name that we like.
> > >
> > > I don't particularly care what name we pick -- "metadata" and "meta"
> are
> > > both fine with me.  Another obvious choice is "manifests", but I know
> Kyle
> > > (CC'd) prefers "metadata" in case we ever want to store something else
> per
> > > directory that's not a manifest. Even though we don't even have any
> plans
> > > for anything else, I see little argument for "manifests" over
> "meta(data)".
> > >
> >
> > I'm fine with meta.  As mentioned it has the benefit of being equal in
> > length, and so yeah you can copy the logic over to the pure code to just
> > strip 5.   I agree that the discrepancy is the more annoying issue, so
> I'm
> > fine with whatever solution gets applied, just make them consistent.
>
> +1 - as long as at the end of this exercise the pure and c code do the
> same thing, I don't care much what that thing is.
>

It seems like most of us are the same page regarding getting rid of the
discrepancy, but it's still not clear what I'm supposed to do. I think we
should leave the decision on the name until we have agreed on the below.
Here are some alternatives:

Stripping 5 characters (what the C code currently does):
* metadata/ files get stored in dh/ata/
* If "ata/" sounds funny, just use "meta" instead of "metadata" and we
won't notice it (but it's still a silly limitation)
* Fix is smaller (a net change of 0 bytes)
* Makes pure code slower (but no one sane runs pure code)

Stripping top directory (what the Python code currently does):
* metadata/ files get stored in dh/
* Stripping a full directory just seems like a more natural thing to do
* Makes hashencode() ~0.2% slower (but if we are really worried about
performance, we should store all non-data/ files using just the hash as
filename and not bother with_encodedir,_lowerencode, auxencode)

Stripping top directory if it doesn't start with 'd':
* metadata/ files get stored in dh/metadata/
* Simple fix needed in both C and Python
* Probably slows down hashencode() by ~0.01% (my guess, based on 50 clock
cycles -> 3) for data/ paths. Probably slows down non-data paths more,
since_encodedir,_lowerencode, auxencode have to run also on the "metadata/"
part.

Matt, could you make a decision on this?


> >
> >
> > > If you do, I could move the patch to the pure side (to strip exactly 5
> > >> characters). If not, I'd still prefer to have this patch applied, but
> it's
> > >> up to Matt in the end, of course.
> > >>
> > >
> > > Matt, as explained above, I personally think this patch is still good.
> > >
> > >
> > >>
> > >>
> > >>
> > >>>
> > >>> (I already like Drew)
> > >>>
> > >>
>
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel at selenic.com
> > http://selenic.com/mailman/listinfo/mercurial-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20150508/292f551c/attachment.html>


More information about the Mercurial-devel mailing list