[PATCH RFC] convert: add support for recode in filemaps
Matt Mackall
mpm at selenic.com
Mon Dec 19 12:44:52 CST 2011
On Mon, 2011-12-19 at 19:21 +0100, Martin Geisler wrote:
> # HG changeset patch
> # User Martin Geisler <mg at lazybytes.net>
> # Date 1324318815 -3600
> # Node ID 5bf6234ff33f997486c85210c6d7cec58f1fa524
> # Parent 4841035f37b6df368682460d8a7cbf10276b8d1b
> convert: add support for recode in filemaps
>
> This command is used in a filemap like
>
> recode OLD NEW
>
> and will make convert recode all file names from OLD to NEW.
>
> This patch is not 100% done -- there could be a warning if recode is
> specified twice, for example. Also, the recoding is done before
> renames are taken into account. It should probably be done after since
> the filemap seems to work on source path names only.
This probably needs a way to specify an error-handling strategy. Right
now it'll work great going to and from UTF-8 but elsewhere it'll be very
hit or miss. As I read it, the default mode is: fail with a traceback.
That default should probably be: replace with escaping of some sort. For
instance, latin1 café to ascii caf%e9.
> I made the patch to better support the case discussed here:
> http://serverfault.com/a/342446/14103
>
> diff --git a/hgext/convert/filemap.py b/hgext/convert/filemap.py
> --- a/hgext/convert/filemap.py
> +++ b/hgext/convert/filemap.py
> @@ -4,7 +4,7 @@
> # This software may be used and distributed according to the terms of the
> # GNU General Public License version 2 or any later version.
>
> -import shlex
> +import shlex, codecs
> from mercurial.i18n import _
> from mercurial import util
> from common import SKIPREV, converter_source
> @@ -26,6 +26,7 @@
> self.include = {}
> self.exclude = {}
> self.rename = {}
> + self.recode = None
> if path:
> if self.parse(path):
> raise util.Abort(_('errors in filemap'))
> @@ -68,6 +69,14 @@
> self.rename[src] = dest
> elif cmd == 'source':
> errs += self.parse(lex.get_token())
> + elif cmd == 'recode':
> + self.recode = (lex.get_token(), lex.get_token())
> + try:
> + codecs.getdecoder(self.recode[0])
> + codecs.getencoder(self.recode[1])
> + except LookupError, e:
> + self.ui.warn('%s:%d: %s\n' % (lex.infile, lex.lineno, e))
> + errs += 1
> else:
> self.ui.warn(_('%s:%d: unknown directive %r\n') %
> (lex.infile, lex.lineno, cmd))
> @@ -84,6 +93,9 @@
> return '', name, ''
>
> def __call__(self, name):
> + if self.recode:
> + name = name.decode(self.recode[0]).encode(self.recode[1])
> +
> if self.include:
> inc = self.lookup(name, self.include)[0]
> else:
> @@ -106,7 +118,7 @@
> return name
>
> def active(self):
> - return bool(self.include or self.exclude or self.rename)
> + return bool(self.include or self.exclude or self.rename or self.recode)
>
> # This class does two additional things compared to a regular source:
> #
> diff --git a/tests/test-convert-filemap.t b/tests/test-convert-filemap.t
> --- a/tests/test-convert-filemap.t
> +++ b/tests/test-convert-filemap.t
> @@ -375,3 +375,31 @@
> |
> o 0 "addb" files: b
>
> +
> +Test recode command:
> +
> + $ hg init latin-1
> + $ cd latin-1
> + >>> open("p\xe6rer.txt", "w").write("pears\n")
> + $ hg commit -A -m Latin-1
> + adding p\xe6rer.txt (esc)
> + $ cd ..
> + $ echo "recode latin-1 utf-8" > recode
> + $ hg convert latin-1 utf-8 --filemap recode
> + initializing destination utf-8 repository
> + scanning source...
> + sorting...
> + converting...
> + 0 Latin-1
> + $ hg -R utf-8 manifest -r tip
> + p\xc3\xa6rer.txt (esc)
> +
> +Errors:
> +
> + $ echo "recode foo utf-8" >> recode
> + $ echo "recode latin-1 bar" >> recode
> + $ hg convert latin-1 utf-8 --filemap recode
> + recode:3: unknown encoding: foo
> + recode:4: unknown encoding: bar
> + abort: errors in filemap
> + [255]
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel at selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
--
Mathematics is the supreme nostalgia of our time.
More information about the Mercurial-devel
mailing list