[PATCH stable] largefiles: handle merges between normal files and largefiles (issue3084)

Martin Geisler mg at lazybytes.net
Tue Dec 13 17:17:27 CST 2011


"Na'Tosha Bard" <natosha at unity3d.com> writes:

> 2011/12/13 Martin Geisler <mg at lazybytes.net>
>
>> Matt Mackall <mpm at selenic.com> writes:
>>
>> > Do largefiles never go through filemerge?
>>
>> No, not really. They do run through filemerge.filemerge, but it
>> only offers users this prompt:
>>
>>  largefile %s has a merge conflict
>>  keep (l)ocal or take (o)ther?
>>
>> Seems a bit limited to me, to be honest, since users will probably
>> need to abort or choose blindly at this point and then dump the two
>> versions by hand.
>
> I think if we printed something like:
>
> keep (l)ocal or take (o)ther
> (other is newer)
>
> It would prevent a huge number of cases where the user must abort and
> find out which one is the one he wants. In most use cases with a
> binary file, you want the one that is newer -- it's a newer version of
> the library, or an updated PNG image, or whatever.

Yeah, but they are in some sense "concurrent" since they were both made
in parallel from the same ancestor version. So neither is newer than the
other. But I guess you knew that and just want to compare commit dates?
That might help, but it won't tell the user much.

When I was in Copenhagen, you explained that you use largefiles to store
compiled libraries that are used for the compilation of your product.
That may be typical, but it was my impression that largefiles was (also)
meant to be used by people who work "actively" on the large files and
so treat them like real source artifacts for their product. In that case
it doesn't make sense to just pick the newest image -- there has been a
communication problem if two artists have worked on the same image in
parallel, and they have probably both had some intent with their commits.

Based on that, I think that we'll at least need something like
internal:dump here. It writes the two versions of the file to the
working copy so that the user can inspect them. Ideally, we would get
the normal merge machinery to work like normal so that people could
setup their own merge tools, e.g., a tool that picks the newer file.

This is kind of separate from what this patch does: right now, merges
where a file changes largefile-status are *broken*. You do the merge and
then 'hg status' aborts afterwards. That should be fixed. Making the
merge behave more like a normal merge could be a second step.
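For illustration, if largefiles did go through the normal merge
machinery, users could route them to internal:dump or a tool of their
own via the standard hgrc sections. A sketch, where pick-newer is a
hypothetical user-provided script:

```ini
[merge-patterns]
# Dump the local and other versions of large binaries for inspection
**.png = internal:dump

[merge-tools]
# Hypothetical external tool that keeps the newer of the two versions
pick-newer.executable = /usr/local/bin/pick-newer
pick-newer.args = $local $other $output
```

None of this applies today, of course, precisely because largefiles
bypass the normal tool selection with the keep-local/take-other prompt.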

>> >> This patch fixes this by extending the manifest merge to deal with
>> >> these kinds of conflicts. If there is a normal file 'foo' in the
>> >> working copy, and the other parent brings in a '.hglf/foo' file,
>> >> then the user will be prompted to keep the normal file or the
>> >> largefile. Likewise for the symmetric case where a normal file is
>> >> brought in via the second parent. The prompt looks like this:
>> >
>> > Seems to me we should just always promote files to largefiles on
>> > merge. Or, apply the existing 'add' thresholds/patterns to decide.
>>
>> Yeah, I also wanted to do this at some point, but Na'Tosha told me
>> she was fine with the simpler solution of just prompting so I went
>> with that first. I'll have to look at things again to figure out
>> if/how an upgrade patch could look.
>
> As commented before, I don't think relying on the 'add'
> thresholds/patterns is a good idea at all. My opinion as a largefiles
> user is that letting the human decide is best, and that automatically
> upgrading it to a largefile is second best.
>
>
>> >> The status and diff output looks peculiar after a merge where the
>> >> type of a file changed. If a normal file 'foo' was changed to a
>> >> largefile, then we get:
>> >>
>> >>   $ hg status
>> >>   M foo
>> >>   R foo
>> >
>> > Eep. That's seriously ugly.
>>
>> Yes, agreed :) After looking at the largefiles code I'm afraid I find
>> the whole concept rather ugly. Basically all commands need wrapping
>> and adapting to make '.hglf/x' and 'x' be the same file. It feels
>> brittle and confusing. More papering over could of course hide the
>> output above, but it is in some way what you would expect when you
>> have these two files for every largefile.
>
> I don't find largefiles confusing, but brittle (with some sharp edges)
> is a good way to describe it :-)

I meant that the code is confusing. Having to wrap all commands to
carefully make them see and not see the largefiles is weird. Things like
directory renames are broken because of this -- the merge code does not
"see" the largefiles, so if you have

  dir/normal
     /large

and you move dir/normal to other-dir/normal, then Mercurial will think
that you've renamed the whole directory, since dir is now empty. That
can of course also be fixed, but it's just an example of how major parts
of the code must now deal with largefiles.

Greg, Benjamin, et al: did you give any thought to using the
encode/decode filters (or something similar) to handle this instead?
That is, "decoding" a SHA-1 into the file content when writing the
working copy, and "encoding" the file content back to a SHA-1 value when
committing to the repo?
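To make the question concrete: the existing [encode]/[decode] hgrc
sections already rewrite file contents on the way into and out of the
repository. Something along these lines, where largefile-encode and
largefile-decode are hypothetical helpers that map content to its SHA-1
and back via a blob store:

```ini
[encode]
# On commit: replace the file's content with its SHA-1 (hypothetical helper)
**.bigdata = pipe: largefile-encode

[decode]
# On working-copy write: expand the SHA-1 back into the real content
**.bigdata = pipe: largefile-decode
```

That would keep the largefile bookkeeping in one place instead of
wrapping every command.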

> I think conceptually, I would expect to see:
>
> $ hg status
> M foo
>
> Because *conceptually*, foo is the same file, whether it is a
> largefile, or a regular file. What the best thing to actually show in
> hg status is a bit hard to say.

I think that's the right thing to show as well.

-- 
Martin Geisler

Mercurial links: http://mercurial.ch/

