[PATCH] convert: Perforce source for conversion to Mercurial

Mon Mar 2 09:15:17 CST 2009

Here comes a nit-picking review. All opinions are my own and in no way 
authoritative, but my intent is to guide towards following the style 
used in the existing Mercurial code base.

Frank Kingswood wrote:
> # HG changeset patch
> # User Frank Kingswood <frank at kingswood-consulting.co.uk>
> # Date 1235997678 0
> # Node ID 852debd905983055d209265dc8433f204da14fc5
> # Parent  5b010dae99c3cc518364ca4f8422762342285a97
> convert: Perforce source for conversion to Mercurial
> Take four:
>  - fixed some language in messages
>  - now stores all P4 change IDs in extras field of log
>  - added a (simple) testcase
>  - make testcase executable and verify
>   

These meta-comments are probably not relevant in the version that ends 
up getting applied.

> diff --git a/hgext/convert/__init__.py b/hgext/convert/__init__.py
> --- a/hgext/convert/__init__.py
> +++ b/hgext/convert/__init__.py
> @@ -25,6 +25,7 @@
>      - Monotone [mtn]
>      - GNU Arch [gnuarch]
>      - Bazaar [bzr]
> +    - Perforce [p4]
>  
>      Accepted destination formats [identifiers]:
>      - Mercurial [hg

This makes test-convert fail (trivially) - obviously the full test suite 
hasn't been run.

> diff --git a/hgext/convert/p4.py b/hgext/convert/p4.py
> new file mode 100644
> --- /dev/null
> +++ b/hgext/convert/p4.py
> @@ -0,0 +1,177 @@
> +#
> +# Perforce source for convert extension.
> +#
> +# Copyright 2009, Frank Kingswood <frank at kingswood-consulting.co.uk>
> +#
> +# This software may be used and distributed according to the terms
> +# of the GNU General Public License, incorporated herein by reference.
> +#
> +
> +from mercurial import util
> +from mercurial.i18n import _
> +
> +from common import commit, converter_source, checktool
> +import marshal
> +
> +def loaditer(f):
> +    try:
> +        d = marshal.load(f)
> +        while d:
> +            yield d
> +            d = marshal.load(f)
>   

FWIW I would prefer to avoid repeating the lines that actually do 
anything, and instead loop while True and break if not d.

> +    except EOFError, e:
> +        pass
>   

e is unused

> +
> +class p4_source(converter_source):
> +    def __init__(self, ui, path, rev=None):
> +        super(p4_source, self).__init__(ui, path, rev=rev)
> +
> +        checktool('p4')
> +
> +        self.p4changes = {}
> +        self.heads = {}
> +        self.changeset = {}
> +        self.files = {}
> +        self.tags = {}
> +        self.lastbranch = {}
> +        self.parent = {}
> +        self.encoding = "latin_1"
> +        self.depotname={}           # mapping from local name to depot name
>   

spaces around "=", please

> +
> +        self._parse(ui, path)
> +
> +    def _parse_view(self, path):
> +        "Read changes affecting the path"
> +        cmd = "p4 -G changes -s submitted %s" % path
> +        stdout = util.popen(cmd)
> +        for f in loaditer(stdout):
>   

What is "f"? Perhaps a more descriptive name would be more self-documenting?

> +            c = f.get("change", None)
> +            if c:
> +                self.p4changes[c] = True
> +
> +    def _parse(self, ui, path):
> +        "Prepare list of P4 filenames and revisions to import"
> +        ui.status(_('reading p4 views\n'))
> +
> +        # read client spec or view
> +        if "/" in path:
> +            self._parse_view(path)
> +            if path.startswith("//") and path.endswith("/..."):
> +                views = {path[:-3]:""}
> +            else:
> +                views = {"//":""}
>   

space after ":"

> +        else:
> +            cmd = "p4 -G client -o %s" % path
> +            clientspec = marshal.load(util.popen(cmd))
> +            
> +            views = {}
> +            for x in clientspec:
>   

What is "x"?

> +                if x.startswith("View"):
> +                    sview, cview = clientspec[x].split()
> +                    self._parse_view(sview)
> +                    if sview.endswith("...") and cview.endswith("..."):
> +                        sview = sview[:-3]
> +                        cview = cview[:-3]
> +                    cview = cview[2:]
> +                    cview = cview[cview.find("/") + 1:]
> +                    views[sview] = cview
> +
> +        # list of changes that affect our source files
> +        self.p4changes = [x for x in self.p4changes]
>   

self.p4changes.keys() ?

> +        self.p4changes.sort(key = int)
>   

pep8 says no spaces

> +
> +        # list with depot pathnames, longest first
> +        vieworder = [x for x in views]
>   

views.keys() ?

> +        vieworder.sort(key = lambda x:-len(x))
>   

no space around "=" - but perhaps after ":" ...

> +
> +        # handle revision limiting
> +        startrev = self.ui.config('convert', 'p4.startrev', default=0)
> +        self.p4changes = [x for x in self.p4changes 
> +                          if ((not startrev or int(x)>=int(startrev)) and 
> +                              (not self.rev or int(x)<=int(self.rev)))]
>   

Perhaps spaces around comparision operators ... YMMV ;-)

> +
> +        # now read the full changelists to get the list of file revisions
> +        ui.status(_('collecting p4 changelists\n'))
> +        lastid = None
> +        for id in self.p4changes:
>   

id is built-in - better avoid using that

> +            cmd = "p4 -G describe %s" % id
> +            stdout = util.popen(cmd)
> +            f = marshal.load(stdout)
>   

What is f?

> +            del stdout
>   

Who cares and why?

> +
> +            shortdesc = desc = self.recode(f["desc"])
> +            i = shortdesc.find("\n")
> +            if i != -1:
> +                shortdesc = shortdesc[:i]
>   

desc = self.recode(f["desc"])
shortdesc = desc.split("\n", 1)[0]

> +            t = '%s %s' % (f["change"], repr(shortdesc)[1:-1])
> +            ui.status(util.ellipsis(t, 80) + '\n')
> +
> +            if lastid:
> +                parents = [lastid]
> +            else:
> +                parents = []
> +            
> +            date = (int(f["time"]), 0)     # timezone not set
> +            c = commit(author=self.recode(f["user"]), date=util.datestr(date),
> +                        parents=parents, desc=desc, branch='', extra={"p4":id})
>   

space after ":"

> +
> +            files = []
> +            i = 0
> +            while "depotFile%d" % i in f and "rev%d" % i:
>   

IMHO this looks suspicious. It might be correct and clever, but the 
intention isn't obvious to me.

> +                oldname = f["depotFile%d" % i]
> +                filename = None
> +                for v in vieworder:
> +                    if oldname.startswith(v):
> +                        filename = views[v] + oldname[len(v):]
> +                        break
> +                if filename:
> +                    files.append((filename, f["rev%d" % i]))
> +                    self.depotname[filename] = oldname
> +                i += 1
> +            self.changeset[id] = c
> +            self.files[id] = files
> +            lastid = id
> +        
> +        if lastid:
> +            self.heads = [lastid]
> +
> +    def getheads(self):
> +        return self.heads
>   

Trivial getter ... some loves them, some hates them ...

> +
> +    def getfile(self, file, rev):
>   

"file" is builtin - better avoid using that

> +        cmd = "p4 -G print %s#%s" % (self.depotname[file], rev)
> +        stdout = util.popen(cmd)
> +
> +        mode = None
> +        data = ""
> +
> +        for f in loaditer(stdout):
>   

What is "f"?

> +            if f["code"] == "stat":
> +                if "+x" in f["type"]:
> +                    mode = "x"
> +                else:
> +                    mode = ""
> +            elif f["code"] == "text":
> +                data += f["data"]
> +
> +        if mode is None:
> +            raise IOError()
>   

Is it really an IOError?

> +
> +        self.modecache[(file, rev)] = mode
> +        return data
> +
> +    def getmode(self, file, rev):
>   

"file" is built-in

> +        return self.modecache[(file, rev)]
> +
> +    def getchanges(self, rev):
> +        self.modecache = {}
>   

It is cleaner to "declare"/initialize this instance variable in __init__

> +        return self.files[rev], {}
> +
> +    def getcommit(self, rev):
> +        return self.changeset[rev]
> +
> +    def gettags(self):
> +        return self.tags
> +
>   

Trivial getters ... YMMV

> +    def getchangedfiles(self, rev, i):
> +        return util.sort([x[0] for x in self.files[rev]])

> diff --git a/tests/test-convert-p4 b/tests/test-convert-p4
> new file mode 100755
> --- /dev/null
> +++ b/tests/test-convert-p4
> @@ -0,0 +1,62 @@
> +#!/bin/sh
> +
> +"$TESTDIR/hghave" p4 || exit 80
> +
> +echo "[extensions]" >> $HGRCPATH
> +echo "convert = " >> $HGRCPATH
> +
> +echo % create p4 depot
> +export P4ROOT=$PWD/depot
> +export P4AUDIT=$P4ROOT/audit
> +export P4JOURNAL=$P4ROOT/journal
> +export P4LOG=$P4ROOT/log
> +export P4PORT=localhost:16661
> +export P4DEBUG=1
> +
> +echo % start the p4 server
> +[ ! -d $P4ROOT ] && mkdir $P4ROOT
> +p4d -f -J off >$P4ROOT/stdout 2>$P4ROOT/stderr &
> +trap "echo % stop the p4 server ; p4 admin stop" EXIT
> +
> +# wait for the server to initialize
> +sleep 1
> +
>   

On my machine 1 second isn't enough and the test fails. With 3 seconds 
it works. Perhaps it would be better to loop until testing shows that is 
has been started.

> +echo % populate the depot

Perhaps you could add some evil characters (such as spaces) in names of 
depotpaths, directories and files. Perforce is often used on windows 
where developers often use for example spaces in names, and it looks 
like the convert could have some issues with that.

With increased sleep in the test I can confirm that the test runs 
without failure on Fedora. It has been tested/developed on Windows?

FWIW I don't have any p4 repos available I can try to convert.

/Mads
ex happy Perforce user/admin