[PATCH 1 of 3] convert: cvsps.py - code to generate changesets from a CVS repository

Matt Mackall mpm at selenic.com
Thu Jun 12 00:32:10 CDT 2008


On Sun, 2008-06-08 at 15:54 +0100, Frank Kingswood wrote:
> # HG changeset patch
> # User Frank Kingswood <frank at kingswood-consulting.co.uk>
> # Date 1212936296 -3600
> # Node ID 06383eec1f435fbb895bb8b219f7cae6adb9bfcf
> # Parent  ab798a37b8461d314d9531d08198c54e194bea78
> convert: cvsps.py - code to generate changesets from a CVS repository
> 
> diff -r ab798a37b846 -r 06383eec1f43 hgext/convert/cvsps.py
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/hgext/convert/cvsps.py	Sun Jun 08 15:44:56 2008 +0100
> @@ -0,0 +1,525 @@
> +#!/usr/bin/env python
> +#
> +# Mercurial built-in replacement for cvsps.
> +#
> +# Copyright 2008, Frank Kingswood <frank at kingswood-consulting.co.uk>
> +#
> +# This software may be used and distributed according to the terms
> +# of the GNU General Public License, incorporated herein by reference.
> +
> +import os
> +import re
> +import sys
> +import cPickle as pickle
> +from mercurial import util
> +from mercurial.i18n import _
> +
> +def listsort(list, key):
> +    "helper to sort by key in Python 2.3"
> +    try:
> +        list.sort(key=key)
> +    except TypeError:
> +        list.sort(lambda l, r:cmp(key(l), key(r)))

FYI, that actually ends up being fairly slow on old Python as key gets
called 2 * n * log(n) times or so. It's better to do a so-called
Schwarzian transform:

l2 = [(key(x), x) for x in list] # decorate
l2.sort()                        # sort
return [v[1] for v in l2]        # undecorate

That has no Python function calls in the inner loop, so ends up being
much faster.

> +class cvsps_log_entry(object):
> +    '''Class cvsps_log_entry has the following attributes:
> +        .Author    - author name as CVS knows it
> +        .Branch    - name of branch this revision is on
> +        .Branches  - revision tuple of branches starting at this revision
> +        .Comment   - commit message
> +        .Date      - the commit date as a (time, tz) tuple
> +        .Dead      - true if file revision is dead
> +        .File      - Name of file
> +        .Lines     - a tuple (+lines, -lines) or None
> +        .Parent    - Previous revision of this entry
> +        .RCS       - name of file as returned from CVS
> +        .Revision  - revision number as tuple
> +        .Tags      - list of tags on the file
> +    '''

We prefer not to use names_with_underbars or CapitalizedNames. Even for
classes or members.

> +    def __init__(self, **entries):
> +        self.__dict__.update(entries)
> +
> +class cvsps_log_error(Exception):
> +    pass
> +
> +def cvsps_create_log(ui, directory=None, root=None, rlog=True, cache=None):
> +    '''Collect the CVS rlog'''

The cvsps_ prefix here is redundant as we're already in cvsps.py. If
there are methods you don't want to be used externally, prefix them with
_, but otherwise, just use a nice short name that's uniq in the scope.

> +
> +    # reusing strings typically saves about 40% of memory

Very interesting.

> +    _scache = {}
> +    def scache(s):
> +        "return a shared version of a string"
> +        try:
> +            return _scache[s]
> +        except:
> +            _scache[s] = s
> +        return s

Try/except turns out to be fairly slow. It's faster and simpler to do:

if s not in _scache:
    _scache[s] = s
return _scache[s]

But in this case, we can get by with:

return _scache.setdefault(s, s)

> +    if not root:
> +        root = os.environ.get('CVSROOT', None)

None is redundant here.

> +    # read log cache if one exists
> +    oldlog = []
> +    date = None
> +
> +    if cache:
> +        cachedir = os.path.expanduser('~/.hg.cvsps')
> +        if not os.path.exists(cachedir):
> +            os.mkdir(cachedir)
> +        cachefile = (root or "").split(":")+[directory, "cache"]

But if you'd put "" up where None is you wouldn't need it down here.

> +        cachefile = ['-'.join(re.findall(r'\w+', s)) for s in cachefile if s]
> +        cachefile = os.path.join(cachedir, '.'.join(cachefile))

No idea what's happening there.

> +        if state == 0:

Please give a brief description of each state.

> +# EOF cvsps.py

Dead weight.

This all looks pretty good and I'm inclined to take it, but I'd still
like to see some naming and whitespace fixups. In particular, you've
still got lots of things like a = x+y. 

-- 
Mathematics is the supreme nostalgia of our time.



More information about the Mercurial-devel mailing list