[PATCH 01 of 10] largefiles: centralize the logic to get outgoing largefiles
Mads Kiilerich
mads at kiilerich.com
Fri Mar 7 13:53:26 CST 2014
On 03/07/2014 03:47 PM, FUJIWARA Katsunori wrote:
> At Thu, 06 Mar 2014 15:48:20 +0100, Mads Kiilerich wrote:
>>> shortname = '.hglf'
>>> shortnameslash = shortname + '/'
>>> @@ -365,3 +366,31 @@
>>>          if f[0] not in filelist:
>>>              filelist.append(f[0])
>>>      return filelist
>>> +
>>> +def getlfilestoupload(repo, missing, mapfunc=None):
>>> +    toupload = set()
>>> +    if not missing:
>>> +        return toupload
>> Does this check make any difference? Looping over an empty list is free.
> "missing" will be bound to "None" if there are no outgoing revisions,
> and iterating over it would then raise an exception.
Really? It seems like this function, in both places it is used, replaces
a 'for n in o' loop?
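To make the disagreement concrete (a minimal illustration, not the real
largefiles code): iterating an empty list is free, but iterating None
raises, so the early return only matters if 'missing' can really be None:

```python
# Minimal illustration of the None-vs-empty-list question; this is NOT
# the real getlfilestoupload(), just the shape of its loop.
def getlfilestoupload(missing):
    toupload = set()
    for n in missing:  # iterating None raises TypeError; [] just does nothing
        toupload.add(n)
    return toupload

print(getlfilestoupload([]))  # set() - no guard needed for an empty list
```

Calling it with None, on the other hand, fails with "TypeError: 'NoneType'
object is not iterable" - hence the guard, if None is actually possible.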
(Btw: it would be nice with a docstring that says what 'missing' is and
that the function returns a set of mapfunc results.)
>>> +    if not mapfunc:
>>> +        mapfunc = lambda f, ctx: f
>> I do not think this mapfunc thing is pretty.
>>
>> In the end, it would be nice to tell in the UI which largefile we are
>> pulling/pushing. Not only the hash, but also the largefile name (or one
>> of them).
>>
>> Currently outgoing and summary will list all largefiles in outgoing
>> changesets, no matter if they really need pushing or not. Arguably it
>> should only list the largefiles that really are outgoing. For that we
>> need the hashes.
> Do you mean that "getlfilestoupload()" should also return hashes, as
> groundwork for a future improvement (listing or counting only truly
> outgoing largefiles) of outgoing/summary?
Yes, I think so. Outgoing/summary do not show all revisions - they only
show the revisions that are outgoing. Currently, outgoing/summary shows
all largefiles in these revisions, no matter whether they are outgoing or
not (if I understand it correctly) - that seems like a bug. Instead, they
should only show the largefiles that actually will be pushed. There
could be different ways of indicating that. Showing the involved
largefile names could be one. Showing the actual number of largefiles
(unique hashes) could be another, and yet another could be to show the
total amount of bytes.
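For illustration, all three displays could be derived cheaply from a
"hash => one of the filenames" mapping plus a size lookup (the names,
hashes, and sizes below are purely made up):

```python
# Hypothetical data: a mapping from largefile hash to one filename using
# it, and a separate (hash -> size in bytes) lookup. Purely illustrative.
toupload = {
    "a1b2c3": "data/model.bin",
    "d4e5f6": "assets/video.mov",
}
sizes = {"a1b2c3": 1048576, "d4e5f6": 5242880}

names = sorted(toupload.values())              # involved largefile names
count = len(toupload)                          # number of unique hashes
totalbytes = sum(sizes[h] for h in toupload)   # total amount of bytes

print(names, count, totalbytes)
```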
What do you think about this?
>> So how about letting this function return a dict mapping from hash to a
>> filename that uses it? For now the two users of this function can just
>> look at either keys or values.
> I chose this (= passing map function, if needed) implementation to
> avoid performance impact below when there are many outgoing
> largefiles:
>
> - always returning hashes may slow down outgoing/summary through
> meaningless "ctx[f].data().strip()" calls for them
A largefile is still a big thing and expensive to shuffle around due to
its size (and the general design of what it is). My gut feeling is that
an extra call to .data() will be a small extra cost in the greater
picture ... especially considering that it (and an lstat call to the
remote location) is essential for actually giving a useful result for
outgoing/summary.
> - deriving "filename/context => hash" from the result of
> "getlfilestoupload()" would mean scanning the (maybe large) list again
Sorry, I don't understand that.
My thought would be that the function should return a "hash =>
one-of-the-filenames" mapping. I don't think there would be a need for
any scanning ... or how it would be a problem.
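Something like the following untested sketch, with minimal fake objects
standing in for Mercurial's real repo/context API (splitstandin is
modeled on the lfutil helper of the same name; everything else here is
illustrative):

```python
# Untested sketch of the suggested interface: return a "hash =>
# one-of-the-filenames" mapping instead of taking a mapfunc.

def splitstandin(f):
    # modeled on lfutil.splitstandin: '.hglf/foo' -> 'foo', None otherwise
    prefix = '.hglf/'
    return f[len(prefix):] if f.startswith(prefix) else None

def getlfilestoupload(repo, missing):
    """Map each outgoing largefile hash to one filename that uses it."""
    toupload = {}
    for n in missing or []:              # tolerate missing=None
        ctx = repo[n]
        for f in ctx.files():
            lf = splitstandin(f)
            if lf is not None and f in ctx:
                # the standin's content is the largefile's hash
                toupload[ctx[f].data().strip()] = lf
    return toupload

# --- minimal fakes for demonstration; the real API is richer ---
class FakeFilectx:
    def __init__(self, data): self._data = data
    def data(self): return self._data

class FakeCtx:
    def __init__(self, files): self._files = files
    def files(self): return list(self._files)
    def __contains__(self, f): return f in self._files
    def __getitem__(self, f): return FakeFilectx(self._files[f])

class FakeRepo:
    def __init__(self, ctxs): self._ctxs = ctxs
    def __getitem__(self, n): return self._ctxs[n]

repo = FakeRepo({0: FakeCtx({'.hglf/big.bin': 'deadbeef\n', 'small.txt': 'x'})})
print(getlfilestoupload(repo, [0]))   # {'deadbeef': 'big.bin'}
```

The two current callers would then just look at .keys() (hashes for the
upload) or .values() (filenames for display), with no second scan.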
> If these are not so serious and/or you intend improvement of
> outgoing/summary in the future as I guess above, I'll change
> "getlfilestoupload()" as you mentioned.
I don't have any particular plans for that. There are lots of bigger
problems with largefiles - this nice series is addressing one of them.
Anyway: These were just my thoughts. If you still prefer the way you
proposed to do these changes then it is fine with me.
/Mads