[PATCH 1 of 2 RFC] extdiff: use archiver to take snapshots of committed revisions

Mon Feb 9 19:59:23 CST 2015

On Mon, 09 Feb 2015 02:18:16 -0500, Mathias De Maré  
<mathias.demare at gmail.com> wrote:

> On Mon, Feb 9, 2015 at 5:01 AM, Matt Harbison <mharbison72 at gmail.com>  
> wrote:
>
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison at yahoo.com>
>> # Date 1342054131 14400
>> #      Wed Jul 11 20:48:51 2012 -0400
>> # Node ID 6abceaa1a49f82cebd3a4f141f69558e2bb3cec4
>> # Parent  ff5caa8dfd993680d9602ca6ebb14da9de10d5f4
>> extdiff: use archiver to take snapshots of committed revisions
>>
>> [This is the proof of concept that Mathias asked for.  The fix for file
>> archiving from the internal API should maybe be a separate commit.]
>>
>
> Thanks! :-)
>
> How do you want to proceed with this? Do you want to continue with these
> patches yourself, or is it something you don't have time to continue  
> with?

I've got 4 or 5 patches waiting for the current backlog to break free, but  
I've got time to work on it.  You've motivated me to look at this again  
:-).  The first thing I'd like to do is figure out what is going to be the  
least amount of headache for largefiles.

It's looking like maybe tweaking archive is the way to go (see below).  If  
that's the case, I'll need your help tweaking the git implementation.  But  
I'd like someone familiar with subrepo design to chime in before I wander  
too far off into the weeds.

>>
>> There should be no visible functional differences, other than that the
>> largefile standins are no longer included in the non-working-copy
>> snapshots.  That's probably not a big deal, and proper largefile support
>> can still be added.  This is the first step to make -S work.  The full
>> (deep) working copy snapshot needs to be handled prior to that.  This could
>> probably be improved in the future by excluding .hgsub and .hgsubstate from
>> status, since those files are really just private bookkeeping info.
>>
>> diff --git a/hgext/extdiff.py b/hgext/extdiff.py
>> --- a/hgext/extdiff.py
>> +++ b/hgext/extdiff.py
>> @@ -63,6 +63,7 @@
>>  from mercurial.i18n import _
>>  from mercurial.node import short, nullid
>>  from mercurial import cmdutil, scmutil, util, commands, encoding, filemerge
>> +from mercurial import archival
>>  import os, shlex, shutil, tempfile, re
>>
>>  cmdtable = {}
>> @@ -80,31 +81,44 @@
>>          dirname = '%s.%s' % (dirname, short(node))
>>      base = os.path.join(tmproot, dirname)
>>      os.mkdir(base)
>> +    fns_and_mtime = []
>> +
>>      if node is not None:
>>          ui.note(_('making snapshot of %d files from rev %s\n') %
>>                  (len(files), short(node)))
>> +
>> +        # Use archive to build the snapshot for committed nodes.  (It aborts if
>> +        # the list is empty.)
>> +        if files:
>> +            repo.ui.setconfig("ui", "archivemeta", False)
>> +
>> +            archival.archive(repo, base, node, 'files',
>> +                             matchfn=scmutil.matchfiles(repo, files))
>>      else:
>>          ui.note(_('making snapshot of %d files from working directory\n') %
>>              (len(files)))
>> -    wopener = scmutil.opener(base)
>> -    fns_and_mtime = []
>> -    ctx = repo[node]
>> -    for fn in sorted(files):
>> -        wfn = util.pconvert(fn)
>> -        if wfn not in ctx:
>> -            # File doesn't exist; could be a bogus modify
>> -            continue
>> -        ui.note('  %s\n' % wfn)
>> -        dest = os.path.join(base, wfn)
>> -        fctx = ctx[wfn]
>> -        data = repo.wwritedata(wfn, fctx.data())
>> -        if 'l' in fctx.flags():
>> -            wopener.symlink(data, wfn)
>> -        else:
>> -            wopener.write(wfn, data)
>> -            if 'x' in fctx.flags():
>> -                util.setflags(dest, False, True)
>> -        if node is None:
>> +
>> +        # TODO: Use filesystem routines to duplicate the relevant parts of the
>> +        #       working directory instead of this (archive doesn't work for
>> +        #       wctx).  This will allow any subrepo type and largefiles to work
>>
> It looks like util.copyfile should be able to do this.
>
>> +        wopener = scmutil.opener(base)
>> +        ctx = repo[node]
>> +        for fn in sorted(files):
>> +            wfn = util.pconvert(fn)
>> +            if wfn not in ctx:
>> +                # File doesn't exist; could be a bogus modify
>> +                continue
>>
> I've been wondering about this: what exactly does this bogus modify mean?
> Is this still something that could happen, or a leftover from the past?
> Could we ignore it and just copy all the relevant files without worrying
> about this?

In a previous life, this comment mentioned 'new file after a merge?', and  
originated here:

changeset:   3330:49966b5ab16f
parent:      3322:a1aad25ccc3e
user:        Benoit Boissinot <benoit.boissinot at ens-lyon.org>
date:        Wed Oct 11 16:35:09 2006 +0200
summary:     fix traceback of extdiff after a merge

I assume this is referencing an added (but not yet committed) file, but I  
don't understand why that would be a problem.

But I wonder if util.copyfile() is a no-go, because the data is currently  
written out through repo.wwritedata() (which archive also does), and  
copyfile() obviously doesn't do that.  I know nothing about the filtering  
it applies.
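A minimal standalone sketch of the distinction (plain Python 3 standard library, not Mercurial's API): a raw copy hands the bytes over untouched, while writing through a filter lets the data be transformed on the way out.  `lf_to_crlf` is a hypothetical filter picked only for illustration; which filters repo.wwritedata() actually runs depends on the repository's configuration and is not modeled here.

```python
import os
import shutil
import tempfile


def lf_to_crlf(path, data):
    """Hypothetical stand-in for a decode filter (here: LF -> CRLF)."""
    return data.replace(b"\n", b"\r\n")


def snapshot_raw(src, dest):
    """Plain byte-for-byte copy: no filter ever sees the data."""
    shutil.copyfile(src, dest)


def snapshot_filtered(src, dest, wfilter):
    """Read the bytes and write them back out through a filter."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dest, "wb") as f:
        f.write(wfilter(src, data))


def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


def demo():
    """Snapshot one file both ways and return both results."""
    tmp = tempfile.mkdtemp()
    try:
        src = os.path.join(tmp, "a.txt")
        with open(src, "wb") as f:
            f.write(b"one\ntwo\n")
        raw = os.path.join(tmp, "raw.txt")
        filtered = os.path.join(tmp, "filtered.txt")
        snapshot_raw(src, raw)
        snapshot_filtered(src, filtered, lf_to_crlf)
        return read_bytes(raw), read_bytes(filtered)
    finally:
        shutil.rmtree(tmp)


if __name__ == "__main__":
    print(demo())  # (b'one\ntwo\n', b'one\r\ntwo\r\n')
```

Whenever the configured filters are not a no-op, the two snapshots differ, and that is the case a switch to a plain copy would have to account for.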

>> +            ui.note('  %s\n' % wfn)
>> +            dest = os.path.join(base, wfn)
>> +            fctx = ctx[wfn]
>> +            data = repo.wwritedata(wfn, fctx.data())
>> +            if 'l' in fctx.flags():
>> +                wopener.symlink(data, wfn)
>> +            else:
>> +                wopener.write(wfn, data)
>> +                if 'x' in fctx.flags():
>> +                    util.setflags(dest, False, True)
>>
> The checking of the flags will no longer be necessary if we can use
> something like copyfile.
>
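For reference, a sketch of what a flag-free copy could look like, written against the Python 3 standard library and assuming a POSIX filesystem (symlinks and an exec bit).  Symlinks are recreated with os.readlink()/os.symlink() and mode bits are carried over with shutil.copymode(), so no 'l'/'x' checks are needed.  The helper names and the policy of skipping paths that are missing from the working directory are hypothetical; whether util.copyfile() behaves the same way should be checked against its source.

```python
import os
import shutil
import stat
import tempfile


def copy_preserving_flags(src, dest):
    """Copy one path so symlinks stay symlinks and mode bits carry over."""
    if os.path.lexists(dest):
        os.unlink(dest)
    if os.path.islink(src):
        # Recreate the link itself rather than copying its target.
        os.symlink(os.readlink(src), dest)
    else:
        shutil.copyfile(src, dest)
        shutil.copymode(src, dest)  # carries over the exec bit


def snapshot_working_files(root, base, files):
    """Copy the listed relative, '/'-separated paths from root into base."""
    for fn in sorted(files):
        src = os.path.join(root, *fn.split("/"))
        if not os.path.lexists(src):
            # Hypothetical policy: skip files missing from the working dir.
            continue
        dest = os.path.join(base, *fn.split("/"))
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        copy_preserving_flags(src, dest)


def demo():
    """Build a tiny tree with an executable and a symlink, then snapshot it."""
    root, base = tempfile.mkdtemp(), tempfile.mkdtemp()
    try:
        os.makedirs(os.path.join(root, "sub"))
        script = os.path.join(root, "sub", "run.sh")
        with open(script, "w") as f:
            f.write("#!/bin/sh\necho hi\n")
        os.chmod(script, 0o755)
        os.symlink("sub/run.sh", os.path.join(root, "link"))
        snapshot_working_files(root, base, ["sub/run.sh", "link", "gone.txt"])
        copied = os.path.join(base, "sub", "run.sh")
        link = os.path.join(base, "link")
        return {
            "exec": bool(os.stat(copied).st_mode & stat.S_IXUSR),
            "islink": os.path.islink(link),
            "target": os.readlink(link),
            "skipped": not os.path.lexists(os.path.join(base, "gone.txt")),
        }
    finally:
        shutil.rmtree(root)
        shutil.rmtree(base)


if __name__ == "__main__":
    print(demo())
```

Note that this copies the working-directory bytes as-is, so it sidesteps the wwritedata() filtering question above rather than answering it.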