Line ending translation extension

Martin Geisler mg at lazybytes.net
Fri Sep 11 11:17:12 CDT 2009


"Stephen J. Turnbull" <stephen at xemacs.org> writes:

> Martin Geisler writes:
>
>  > The way I see it, mentioning a file in .hgeol means that you no
>  > longer want to manually control the EOL symbols used. Mercurial
>  > will take complete control over it.
>
> I basically agree with this (but I have little hope it will put "Paid
> in Full" to the platform-dependent newline problem). Also, the way I
> interpret "no manual control" is that any newline function (as defined
> in TR#14 on unicode.org) is treated as a newline, and canonicalized to
> the internal representation on commit, subject to the following
> caveat:

I think you're talking about this report:

  http://unicode.org/reports/tr14/

It looks big and somewhat complicated... don't expect a first version of
the eol extension to implement it :-)

>  > So even if you change EOL format in a file in your working
>  > directory, the file won't appear modified to Mercurial.
>
> I don't really like that idea.  I think Mercurial should take note of
> any difference from the expected convention and warn about it, or
> maybe even abort the commit.

I guess that would be nice -- let's try to make it detect such
situations. First I would like to see all the update/commit/clone
mechanics working, then we can look into it.
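To make the idea of "detecting such situations" concrete, here is a
minimal sketch of the kind of check I have in mind. It is not code from
the extension: the function names eol_styles and check_eol are
hypothetical, and it only looks at LF and CRLF (a lone CR is ignored).

```python
def eol_styles(data: bytes) -> set:
    """Return the EOL styles found in data: 'windows' (CRLF) and/or 'unix' (bare LF)."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract those to count bare LFs.
    bare_lf = data.count(b"\n") - crlf
    styles = set()
    if crlf:
        styles.add("windows")
    if bare_lf:
        styles.add("unix")
    return styles


def check_eol(data: bytes, expected: str) -> list:
    """Return the styles present in data that differ from the expected one.

    An empty list means the file follows the declared convention. A caller
    could warn about, or refuse to commit, anything else.
    """
    return sorted(eol_styles(data) - {expected})
```

A caller would then decide between a warning and an abort based on
whether check_eol returns anything.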

>  > When you commit, newlines of all kinds are converted to the format
>  > specified in .hgeol:
>  > 
>  >   windows: convert LF and CRLF to CRLF
>  >   unix:    convert LF and CRLF to LF
>  >   native:  convert LF and CRLF to repository native format
>
> Be careful.  At *commit* time, AIUI anything that looks like a newline
> is converted to repo-internal format, which probably should be LF.  At
> *checkout* time, repo-internal format is converted to the specified
> format, where "native" depends on the platform where the working tree
> is being checked out (or updated).

I agree.

> I think that having repo-internal depend on the platform where the
> repo was init'ed is a short cut to madness....

Yes, I agree -- the repository will use unix (LF) as its native format
(it's actually defaulting to CRLF at the moment). The working copy will
default to os.linesep.
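As a rough sketch of the two directions, assuming LF as the repository
format and os.linesep for "native" in the working copy (the names
to_repo and to_working are made up for illustration, and only LF and
CRLF are handled):

```python
import os


def to_repo(data: bytes) -> bytes:
    """Commit side: canonicalize CRLF and LF to LF (the assumed repo format)."""
    return data.replace(b"\r\n", b"\n")


def to_working(data: bytes, style: str,
               linesep: bytes = os.linesep.encode()) -> bytes:
    """Checkout side: convert to the declared style.

    style is 'unix', 'windows' or 'native'; 'native' uses the platform's
    line separator, which can be overridden through linesep for testing.
    """
    targets = {"unix": b"\n", "windows": b"\r\n", "native": linesep}
    eol = targets[style]
    # Normalize first so that mixed input cannot produce CR CR LF.
    return to_repo(data).replace(b"\n", eol)
```

With this split, to_repo(to_working(data, style)) gives back the
LF-only data for any of the three styles.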

>  > By the way, if you change the declared EOL format of a file by
>  > editing .hgeol, you have to call 'hg debugrebuildstate' to make
>  > Mercurial double-check the filter settings and doing a commit.
>
> I don't understand this.  The repo's only knowledge about the newlines
> in the working tree is contained in .hgeol.  Change that, force a
> checkout of the file, and voila!  No?

Correct -- except that the 'force a checkout of the file' step would
involve something like

  hg update 0
  hg update tip

where you effectively remove all files and ask Mercurial to check them
out again. I'm not an expert in the finer details of this, but Mercurial
maintains a 'dirstate' file which records things like file modification
times, the list of scheduled adds, and so on. This is used when looking
for changed files.

The debugrebuildstate command is an internal command which will let
Mercurial double-check all files and update the dirstate to match. The
point is that a file can become "modified" without having changed on
disk if an encode filter is changed. Switching from

  [encode]
  **.txt = tr a-z A-Z

to

  [encode]
  **.txt = tr A-Z a-z

will make a file containing "Hello World" change from "HELLO WORLD"
to "hello world" in the repository, despite being unchanged in the
working directory.
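The same effect can be mimicked in a few lines of Python. This is only
an illustration of the filter idea, not how Mercurial runs filters; the
two translation tables stand in for the two tr commands and are limited
to ASCII letters, like tr with those ranges.

```python
import string

# Stand-ins for "tr a-z A-Z" and "tr A-Z a-z".
TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# The file in the working directory never changes.
working = "Hello World"

# What would be stored under the first and the second encode filter.
stored_before = working.translate(TO_UPPER)
stored_after = working.translate(TO_LOWER)

print(stored_before)  # HELLO WORLD
print(stored_after)   # hello world
```

The working file is identical in both cases, so nothing on disk hints
that the stored content would differ.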

We should of course make a better command for this, something which will
look at all the patterns and only examine files matched by a pattern.
It's just good to know that the debugrebuildstate command can do the job
for now.
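A sketch of what such a command could do first: narrow the list of
tracked files down to those matched by some pattern. The function name
files_to_recheck is hypothetical, and fnmatch is only an approximation
of Mercurial's own pattern matching.

```python
from fnmatch import fnmatchcase


def files_to_recheck(paths, patterns):
    """Return the paths matched by at least one pattern, in input order."""
    return [p for p in paths
            if any(fnmatchcase(p, pat) for pat in patterns)]


tracked = ["README.txt", "src/main.py", "docs/intro.txt"]
print(files_to_recheck(tracked, ["**.txt"]))
```

Only the files returned here would need to be re-examined; everything
else could keep its existing dirstate entry.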

-- 
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient
SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/.