[PATCH] convert: make convert from cvs recognise branch points correctly [revised, 3]

Sun Apr 19 17:18:13 CDT 2009

Henrik Stuart a écrit :
> # HG changeset patch
> # User Henrik Stuart <henrik.stuart at edlund.dk>
> # Date 1239706830 -7200
> # Node ID f8fd162954633e67b27605b2541eb46e155d4780
> # Parent  aece3c9e62f1a685e4cebd3ecf0b22e1c8101ae1
> convert: make convert from cvs recognise branch points correctly
> 
> This patch fixes issue 1447 by using the symbolic names for branches
> in the rlog output to determine the branch point for each file and
> then finding the latest possible branch point in the given changesets
> such that all the file revisions match and the date is earlier than
> the branch commit. For commits on non-branches, the current
> functionality is maintained.
> 
> Co-contributor: Greg Ward <greg-hg at gerg.ca>

Excellent work! 

Given my little understanding of CVS, I cannot come up with correctness issues with this. Nothing prevents it from being pushed, except given the amount of CVS fixes going in these days, I'd like to understand as much as possible. Following annotations are of 3 types:
- Performance questions which can be addressed later
- Coding style nitpicking I can fix myself
- Pure CVS questions for my personal enlightment

At this point, I do not expect another version of this patch, just a couple of answers if possible.

> 
> diff --git a/hgext/convert/cvsps.py b/hgext/convert/cvsps.py
> --- a/hgext/convert/cvsps.py
> +++ b/hgext/convert/cvsps.py
> @@ -35,6 +35,7 @@
>          .tags      - list of tags on the file
>          .synthetic - is this a synthetic "file ... added on ..." revision?
>          .mergepoint- the branch that has been merged from (if present in rlog output)
> +        .branchesto- list of branches that occur at this revision
>      '''
>      def __init__(self, **entries):
>          self.__dict__.update(entries)
> @@ -378,6 +379,13 @@
>              # clean up the results and save in the log.
>              store = False
>              e.tags = util.sort([scache(x) for x in tags.get(e.revision, [])])
> +            e.branchesto = []
> +            for tag in branchmap:
> +                tag_parts = branchmap[tag].split('.')
> +                if len(tag_parts) > 2 and tag_parts[-2] == '0' and (
> +                    e.revision == tuple([int(x) for x in tag_parts[:-2]])):
> +                    e.branchesto.append(tag)
> +

* How can a file revision have more than one branchesto ?

* I don't really know how large branchmap can become with large repositories, I am a bit worried about looping and parsing tags again and again for every file revision, but I have no numbers on this. Could it be worth keeping the parsed branch revisions somewhere (in branchmap or anything else)?

>              e.comment = scache('\n'.join(e.comment))
>  
>              revn = len(e.revision)
> @@ -582,6 +590,8 @@
>  
>      versions = {}    # changeset index where we saw any particular file version
>      branches = {}    # changeset index where we saw a branch
> +    # variable to only output warning for missing branchesto once
> +    has_logentry_without_branchesto = False

We do not like underscores and long variable names very much, but you already know that.

>      n = len(changesets)
>      i = 0
>      while i<n:
> @@ -599,8 +609,46 @@
>  
>          c.parents = []
>          if p is not None:
> +            parent_idx = p

Underscore.

>              p = changesets[p]
>  
> +            if p is not None:
> +                if c.branch != p.branch:

I would fold both conditions to reduce indentation

> +                    # necessary as the first commit on a branch can be
> +                    # broken into multiple bites
> +                    earliest_commit = min([l.date for l in log if l.branch == c.branch])

I don't really get this part. Can you elaborate? I would expect the changeset generation process to handle this already.

> +
> +                    q = parent_idx
> +                    # already in p at the latest
> +                    seen_branched_files = util.set([e.file for e in c.entries])
> +                    while q + 1 < i: # only need to search up to current

I would rewrite it with a for loop and an xrange

> +                        qcs = changesets[q + 1]
> +
> +                        if qcs.date >= earliest_commit:
> +                            break
> +
> +                        if [e.file for e in qcs.entries if 
> +                                e.file in seen_branched_files]:
> +                            break # we cannot progress past this place
> +
> +                        branchesto = getattr(e, 'branchesto', None)
> +                        if branchesto == None and not has_logentry_without_branchesto:
> +                            has_logentry_without_branchesto = True
> +                            ui.warn(_("CVS cache has been compiled without "
> +                                "correct branch information.\n"
> +                                "If you have branches in your CVS repository "
> +                                "it is strongly adviced to\n"
> +                                "delete the cache and start over.\n"))

I don't think linebreak are useful here

> +                        if branchesto == None:
> +                            branchesto = []
> +                        seen_branched_files |= util.set([
> +                            e.file for e in qcs.entries if 
> +                                c.branch in branchesto])

I would rewrite the condition into something like:

if branchesto is None:
    if not has_logentry_without_branchesto:
        has_logentry_without_branchesto = True
        ui.warn(...)
elif branchesto:
    [update seen_branches_files]

> +                                
> +                        q += 1
> +
> +                    p = changesets[q]
> +
>              # Ensure no changeset has a synthetic changeset as a parent.
>              while p.synthetic:
>                  assert len(p.parents) <= 1, \
> diff --git a/tests/test-convert-cvs-branchpoints b/tests/test-convert-cvs-branchpoints
> new file mode 100755
> --- /dev/null
> +++ b/tests/test-convert-cvs-branchpoints
> @@ -0,0 +1,166 @@

[snip]

> +
> +createvar1()
> +{
> +    echo Creating 1447 var 1
> +    createmaster
> +    cd work
> +    echo a > a.txt
> +    echo b > b.txt
> +    cvscall -Q add *.txt
> +    cvsci -m "initial"
> +    echo "b fix" >> b.txt
> +    cvsci -m "b fix"
> +    echo "a fix" >> a.txt
> +    cvsci -m "a fix"
> +    cvscall -q tag FOO
> +    echo "b fix 2" >> b.txt
> +    cvsci -m "b fix 2"
> +    cvscall -q up -r 1.2 *.txt

Why use -r 1.2 instead of the FOO tag? Stickyness issue?

> +    cvscall -q rtag -b -r FOO BRANCH work
> +    cvscall -q up -r BRANCH
> +    echo "b branch fix" >> b.txt
> +    cvsci -m "b branch fix"
> +    hgcvsps
> +    cd ..
> +}

--
Patrick Mézard