[PATCH stable] largefiles: handle merges between normal files and largefiles (issue3084)

Na'Tosha Bard natosha at unity3d.com
Tue Dec 13 17:36:52 CST 2011


2011/12/14 Martin Geisler <mg at lazybytes.net>

> "Na'Tosha Bard" <natosha at unity3d.com> writes:
>
> > 2011/12/13 Martin Geisler <mg at lazybytes.net>
> >
> >> Matt Mackall <mpm at selenic.com> writes:
> >>
> >> > Do largefiles never go through filemerge?
> >>
> >> No, not really. They do run run through filemerge.filemerge, but it
> >> only offers users this prompt:
> >>
> >>  largefile %s has a merge conflict
> >>  keep (l)ocal or take (o)ther?
> >>
> >> Seems a bit limited to me, to be honest, since users will probably
> >> need to abort the choose blindly at this point and then dump the two
> >> versions by hand.
> >
> > I think if we printed something like:
> >
> > keep (l)ocal or take (o)ther
> > (other is newer)
> >
> > It would prevent a huge number of cases where the user must abort and
> > find out which one is the one he wants. Most use cases with a binary
> > file, if you want the one that is newer -- it's a newer version of the
> > library, or an updated PNG image, or whatever.
>
> Yeah, but they are in some sense "concurrent" since they were both made
> in paralle from the same ancestor version. So neither is newer than the
> other. But I guess you knew that and just want to compare commit dates?
>

Yes, that's what I meant.


> That might help, but it wont tell the user much.
>

Hmm, I disagree, but maybe we are using largefiles in a very different way
than most people.


> When I was in Copenhagen, you explained that you use largefiles to store
> compiled libraries that are used for the compilation of your product.
>

Among other things; we also use them for image files, and various other
types of binaries that are large and can't be merged.


> That may be typical, but it was my impression that largefiles was (also)
> meant to be used by people that work "actively" on the large files and
> so treat them like real source artifacts for their product. Then it
> doesn't make sense to just pick the newest image -- there has been some
> communication problem if two artists have worked on the same image in
> parallel and they have probably both had some intent with their commits.
>

Sure, but in the end it comes down to picking one or the other.  The lack
of communication and divergent work on something that can't be merged has
already happened by the time we've gotten to this point.


> Based on that, I think that we'll at least need something like
> internal:dump here. It writes the two versions of the file to the
> working copy so that the user can inspect them. Ideally, we would get
> the normal merge machinery to work like normal so that people could
> setup their own merge tools, e.g., a tool that picks the newer file.
>

Yes, it would certainly be useful to dump them into the working copy so the
user can inspect them, but providing a display of interesting commit
information about them is also relevant.  If it is a library, then looking
at the actual file isn't going to be very useful, and the user would
probably need to know who committed each one, and which one is newest to
make a decision (or talk to someone to make a decision).  If it's an image
file, sure, looking at the two files is a perfectly valid way to pick the
correct one.

I still don't understand the point of making it so people can set up their
own merge tools, when it comes down to just picking one or the other.  I
don't see why we need an external set of merge tools for that.  It just
seems like it will complicate things for the user.


> This is kind of separate from what this patch does: right now, merges
> where a file changes largefile-status are *broken*. You do the merge and
> then 'hg status' aborts afterwards. That should be fixed. Making the
> merge behave more like a normal merge could be a second step.
>

Completely agree; this patch seems to fix the broken merge behavior, which
is currently a a bad bug that must be fixed.  I strongly support having the
other changes be a next step, becuase this terrible bug needs fixed first
and foremost.


> >> >> This patch fixes this by extending the manifest merge to deal with
> >> >> these kinds of conflicts. If there is a normal file 'foo' in the
> >> >> working copy, and the other parent brings in a '.hglf/foo' file,
> >> >> then the user will be prompted to keep the normal file or the
> >> >> largefile. Likewise for the symmetric case where a normal file is
> >> >> brought in via the second parent. The prompt looks like this:
> >> >
> >> > Seems to me we should just always promote files to largefiles on
> >> > merge. Or, apply the existing 'add' thresholds/patterns to decide.
> >>
> >> Yeah, I also wanted to do this at some point, but Na'Tosha told me
> >> she was fine with the simpler solution of just prompting so I went
> >> with that first. I'll have to look at things again to figure out
> >> if/how an upgrade patch could look.
> >
> > As commented before, I don't think relying on the 'add'
> > thresholds/patterns is a good idea at all. My opinion as a largefiles
> > user is that letting the human decide is best, and that automatically
> > upgrading it to a largefile is second best.
> >
> >
> >> >> The status and diff output looks peculiar after a merge where the
> >> >> type of a file changed. If a normal file 'foo' was changed to a
> >> >> largefile, then we get:
> >> >>
> >> >>   $ hg status
> >> >>   M foo
> >> >>   R foo
> >> >
> >> > Eep. That's seriously ugly.
> >>
> >> Yes, agreed :) After looking at the largefiles code I'm afraid I find
> >> the whole concept rather ugly. Basically all commands need wrapping
> >> and adapting to make '.hglf/x' and 'x' be the same file. It feels
> >> brittle and confusing. More papering over could of course hide the
> >> output above, but it is in some way what you would expect when you
> >> have these two files for every largefile.
> >
> > I don't find largefiles confusing, but brittle (with some sharp edges)
> > is a good way to describe it :-)
>
> I meant that the code is confusing. Having to wrap all commands to
> carefully make them see and not see the largefiles is weird. Things like
> directory renames are broken because of this -- the merge code does not
> "see" the largefiles, so if you have
>
>  dir/normal
>     /large
>
> and you move dir/normal to other-dir/normal, then Mercurial will think
> that you've renamed the whole directory since dir is now empty. That can
> of course also be fixed, but it's just an example of major parts of the
> code now must deal with largefiles.
>

Yes, since I've been working on largefiles, I know we've fixed a huge
number of cases like this, and there are probably still many more.  We fix
them as they are reported.


> Greg, Benjamin, et al: did you give any thought to using the
> encode/decode filters (or something similar) to handle this instead?
> That is, "decoding" a SHA-1 into the file content when writing the
> working copy, and "encoding" the file content back to a SHA-1 value when
> commiting to the repo?
>

Honestly, I think this would be a heck of a lot nicer, but I still wonder
if it's possible to avoid having all of these overrides, even with a
re-write of the architecture to what you describe.  Unless there is some
way we can make the encoding/decoding happen in such a way that it is
completely transparent to Mercurial -- which you or others much more
experienced with Mercurial's core can comment on better than me.

> I think conceptually, I would expect to see:
> >
> > $ hg status
> > M foo
> >
> > Becuase *conceptually*, foo is the same file, whether it is a
> > largefile, or a regular file. What the best thing to actually show in
> > hg status is a bit hard to say.
>
> I think that's the right thing to show as well.


Overall, I think this patch should go in, with the only change being to not
show the "removed" file in "hg status".  It's a huge improvement on the
current state of affairs with merging, fixes a bug that haunts users
forever once they've hit it, and improves the user experience when merging
in the file-was-promoted-to-largefile-on-one-side use case.  Other
improvements can come as later patches.

Cheers,
Na'Tosha

-- 
*Na'Tosha Bard*
Build & Infrastructure Developer | Unity Technologies

*E-Mail:* natosha at unity3d.com
*Skype:* natosha.bard
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20111214/f25c67f5/attachment.html>


More information about the Mercurial-devel mailing list