[PATCH] convert: make convert from cvs recognise branch points correctly [revised, 3]

Mon Apr 20 03:57:16 CDT 2009

Patrick Mézard wrote:
> Given my little understanding of CVS, I cannot come up with correctness issues with this. Nothing prevents it from being pushed, except given the amount of CVS fixes going in these days, I'd like to understand as much as possible. Following annotations are of 3 types:
> - Performance questions which can be addressed later
> - Coding style nitpicking I can fix myself
> - Pure CVS questions for my personal enlightment
> 
> At this point, I do not expect another version of this patch, just a couple of answers if possible.

Of course. I've snipped most of the coding style nitpicks.

>> diff --git a/hgext/convert/cvsps.py b/hgext/convert/cvsps.py
>> --- a/hgext/convert/cvsps.py
>> +++ b/hgext/convert/cvsps.py
>> @@ -35,6 +35,7 @@
>>          .tags      - list of tags on the file
>>          .synthetic - is this a synthetic "file ... added on ..." revision?
>>          .mergepoint- the branch that has been merged from (if present in rlog output)
>> +        .branchesto- list of branches that occur at this revision
>>      '''
>>      def __init__(self, **entries):
>>          self.__dict__.update(entries)
>> @@ -378,6 +379,13 @@
>>              # clean up the results and save in the log.
>>              store = False
>>              e.tags = util.sort([scache(x) for x in tags.get(e.revision, [])])
>> +            e.branchesto = []
>> +            for tag in branchmap:
>> +                tag_parts = branchmap[tag].split('.')
>> +                if len(tag_parts) > 2 and tag_parts[-2] == '0' and (
>> +                    e.revision == tuple([int(x) for x in tag_parts[:-2]])):
>> +                    e.branchesto.append(tag)
>> +
> 
> * How can a file revision have more than one branchesto ?

If several branches start at the same revision. I.e. imagine a.txt,v is
tagged with tag FOO and you then do:

cvs rtag -b -r FOO BRANCH1
cvs rtag -b -r FOO BRANCH2
cvs rtag -b -r FOO BRANCH3

then branchesto will contain BRANCH1,BRANCH2,BRANCH3.

> * I don't really know how large branchmap can become with large repositories, I am a bit worried about looping and parsing tags again and again for every file revision, but I have no numbers on this. Could it be worth keeping the parsed branch revisions somewhere (in branchmap or anything else)?

Each branchesto will contain the branches that occur in the CVS
repository as well. There is really no more compact way to store this as
the branch points are per file. I.e. a.txt may branch to BRANCH1 at
version 1.2, but b.txt may branch to BRANCH1 at version 1.32.

>>              e.comment = scache('\n'.join(e.comment))
>>  
>>              revn = len(e.revision)
>> @@ -582,6 +590,8 @@
>>  
>>      versions = {}    # changeset index where we saw any particular file version
>>      branches = {}    # changeset index where we saw a branch
>> +    # variable to only output warning for missing branchesto once
>> +    has_logentry_without_branchesto = False
> 
> We do not like underscores and long variable names very much, but you already know that.

Yes, I was made aware of that fact after I submitted this patch. :o)

>> +                    # necessary as the first commit on a branch can be
>> +                    # broken into multiple bites
>> +                    earliest_commit = min([l.date for l in log if l.branch == c.branch])
> 
> I don't really get this part. Can you elaborate? I would expect the changeset generation process to handle this already.

A branch creation can be broken into multiple commits in CVS, if e.g. a
lot of files are touched and someone happens to have a lock on a file
that you're trying to branch. In this case, the branch creation will be
partially succeeded and you can resume to tag the rest of the revisions.
Hence we need the earliest possible branch creation timestamp to
determine what other revisions it is relevant to see whether we have
branched from them.

>> +
>> +createvar1()
>> +{
>> +    echo Creating 1447 var 1
>> +    createmaster
>> +    cd work
>> +    echo a > a.txt
>> +    echo b > b.txt
>> +    cvscall -Q add *.txt
>> +    cvsci -m "initial"
>> +    echo "b fix" >> b.txt
>> +    cvsci -m "b fix"
>> +    echo "a fix" >> a.txt
>> +    cvsci -m "a fix"
>> +    cvscall -q tag FOO
>> +    echo "b fix 2" >> b.txt
>> +    cvsci -m "b fix 2"
>> +    cvscall -q up -r 1.2 *.txt
> 
> Why use -r 1.2 instead of the FOO tag? Stickyness issue?

To be honest, I don't recall. Would seem that it should be possible to
use FOO here just as well - probably it was due to some timing issues I
was experiencing with CVS, where Greg and I figured out the solution to
this was to insert timely "sleep 1" calls.

-- 
Kind regards,
  Henrik