[PATCH 1 of 8] largefiles: replace "_isXXXXing" attributes with "_lfautocommit"

Mon Sep 29 17:51:08 CDT 2014

On Thu, 2014-09-25 at 00:43 +0200, Mads Kiilerich wrote:
> On 09/24/2014 03:11 PM, FUJIWARA Katsunori wrote:
> > On Mon, 22 Sep 2014 11:14:56 +0200, Mads Kiilerich <mads at kiilerich.com> wrote:
> >> Hi
> >>
> >> I like that this patch series improve largefiles to handle more corner 
> >> cases correctly. Nice work!
> >>
> >> I do not so much like increasing amount of hacks we have to use. It is 
> >> not like I feel we finally got it right and now can start cleaning up 
> >> and simplifying the code. Instead, in order to take the next steps, this 
> >> series has to introduce more complexity and more callbacks in core 
> >> Mercurial.
> >>
> >> I wonder if we should try a different approach. As you point out, it is 
> >> almost impossible to do the right thing in the internal commit functions 
> >> - they don't have enough context. One alternative could be to hook in at 
> >> the very low level, updating the standin files whenever they are read 
> >> and propagate their change whenever they are written. Another approach 
> >> would be to do all synchronization before/after calling in to core 
> >> Mercurial and let the core pretty much treat standin files like any 
> >> other file. I have played around with that approach and posted a proof 
> >> of concept.
> >>
> >> /Mads
> > Thank you for your comments, and sorry for late response.
> >
> > Your approach posted as PoC looks simple and good !
> >
> > But on the other hand, I'm also worry about below for it.
> >
> > - performance impact on large scale repositories
> >
> >   Examination of updating for ALL standins after original command
> >   logic may have serious performance impact on large scale
> >   repositories, because it may imply file I/O on many standins.
> 
> In the PoC I reuse the existing simple method of reading all standins.
> That could be optimized. We could rely on the dirstates to see which
> files it is relevant to process.
> 
> >   In many cases, updated standins can be known exactly, and deep
> >   hooking can avoid such performance impact (even though revert,
> >   update and merge cases are not yet optimized :-P)
> >
> >     - import       => scmutil.marktouched with filelist
> >     - revert       => cmdutil._performrevert with actions
> >     - update/merge => merge.applyupdates with actions
> 
> Yes, it is impressive that these methods work. But they are also complex
> and fragile and hard to maintain.
> 
> > - visibility of largefiles for hooks
> >
> >   Hooks below can't see appropriate largefiles in the working
> >   directory, because only standins are updated while command
> >   execution.
> 
> Yes, I think that is a acceptable if that is what it takes to give a
> simple model that works consistently everywhere.
> 
> >   Even though these hooks can't see appropriate largefiles now too :-),
> >   deep hooking can fix this problem.
> 
> If we really want deep hooking then I would consider experimenting with
> hooking in to the vfs layer, updating standin files every time they are
> read or walked and are out of date, and propagating their value every
> time they are written.
> 
> > What do you think about these points ?
> 
> I think they are valid. There is no perfect solution ... and probably
> not even a good one. But I think we should reconsider whether the
> current approach of violating all layers and add hacks everywhere is the
> best solution.

I think if we know how to get largefiles more correct (for instance, by
applying this patch series), then we should do that. While this might
increase the complexity, it also puts us in a better place for measuring
regressions when refactoring.. and also delivers immediate benefits to
users.

-- 
Mathematics is the supreme nostalgia of our time.