Migrating from Clearcase to Mercurial

Martin Geisler mg at aragost.com
Fri Dec 16 07:23:57 CST 2011


Simon King <simon at simonking.org.uk> writes:

Hi Simon,

> For the branching behaviour, I am heeding the warning on
> http://mercurial.selenic.com/wiki/StandardBranching and not using
> named branches for every change.

That's good -- while we also have caches for these, many tools will
assume that they can present all branches in a single drop-down menu.

> Instead, I am going to create a new server-side clone whenever a
> developer wants to start a new piece of work. He will push his changes
> to that clone where they can be reviewed. Once the review is complete
> (and as long as the clone is fully merged up with the main
> repository), the server will log the outgoing changesets, then push
> them from the branch repo to the main repo. The branch repo will
> probably then be deleted to save space on the server.

Depending on how busy the main repo is, the feature branch won't stay
fully merged up for long, and once it falls behind, the server cannot
push it to the main repo without creating a new head there. Therefore,
it's normally done the other way around: somebody takes care of
integrating branches that have been reviewed. This person then does the
final (tiny) merge before pushing to the main repo.
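A minimal sketch of that integrator's routine, written as the sequence
of hg commands run in the integrator's own clone of the main repo. It
assumes the main repo has moved on since the feature clone was last
merged, so pulling the feature clone leaves exactly two heads on
default; if nothing moved, 'hg merge' aborts because there is nothing
to merge, and a plain update and push is enough. The URLs and feature
name used below are made up.

```python
import subprocess
from typing import List


def integration_commands(main_url: str, feature_url: str, feature: str) -> List[List[str]]:
    """hg invocations, in order, for landing one reviewed feature clone."""
    return [
        ["hg", "pull", main_url],          # catch up with the main repo
        ["hg", "update", "default"],       # sit on main's current head
        ["hg", "pull", feature_url],       # feature head becomes a second head
        ["hg", "merge"],                   # the final (tiny) merge
        ["hg", "commit", "-m", f"Merge feature {feature}"],
        ["hg", "push", main_url],          # publish the merged result
    ]


def run_all(commands: List[List[str]], cwd: str) -> None:
    """Run each command in the integrator's clone, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)  # raises CalledProcessError on failure
```

For example, integration_commands("https://hg.example.com/main",
"https://hg.example.com/features/login", "login") yields six commands
ending in a push back to main; run_all() executes them and stops early
if, say, the merge needs conflicts resolved by hand.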

> I'm less confident about how to deal with the tagging. From a
> technical point of view, it's not nearly as important to tag every
> merged branch, because the changeset ID is a perfectly good unique
> identifier. But socially, I don't think we can do without those
> incrementing build IDs; people are too used to referring to builds by
> their number, and understanding that build A is more recent than build
> B simply because it has a higher number. (We could store build IDs
> outside of mercurial, but then developers can't use them with commands
> like 'hg merge' and 'hg update')

They could use the build IDs if an extension "injected" them into
the lookup chain. Extensions like mq already add extra, temporary
identifiers to existing changesets.

The extension would query a database for the build IDs. This could be
convenient if you need to attach more information to the ID and you
already maintain that information in another system.
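A minimal sketch of the lookup order such an extension would add. It is
deliberately not written against Mercurial's extension API; it only
models the chain, with an in-memory SQLite table standing in for the
external database and a fallback function standing in for Mercurial's
normal resolution of hashes, revision numbers, tags and branch names.
The table layout, the build-<n> names and the fallback are assumptions.

```python
import sqlite3
from typing import Callable


def make_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the (hypothetical) build database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE builds (build_id TEXT PRIMARY KEY, node TEXT NOT NULL)")
    return conn


def resolve(name: str, conn: sqlite3.Connection, fallback: Callable[[str], str]) -> str:
    """Map a build ID to a changeset hash, else defer to the normal lookup."""
    row = conn.execute("SELECT node FROM builds WHERE build_id = ?", (name,)).fetchone()
    if row is not None:
        return row[0]          # found in the build database
    return fallback(name)      # not a build ID: let normal lookup handle it
```

Consulting the database first means a build ID shadows any other name,
so a prefix such as build- (rather than bare numbers) keeps build IDs
from colliding with local revision numbers like 1234.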

> Firstly, are we going to start seeing performance problems if we have
> a few thousand tags in a repository? If so, are the performance
> problems only caused by having thousands of tags at a head? As I
> understand it, Mercurial examines the .hgtags file for each head in
> the repo, so if we purge tags that are no longer interesting from
> every head, will the performance be the same as if they had never
> existed?

We have a cache in place for the .hgtags files. This means that we
normally don't have to consult all the heads.

Also, the .hgtags file is only read from topological heads, not branch
heads. The number of topological heads is normally quite small compared
to the number of named branches in total.
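To make the per-head work concrete: a .hgtags file is plain text with
one "<40-hex-node> <tag name>" pair per line. The sketch below reads one
such file, assuming that a later line overrides an earlier one for the
same name and that a line pointing at the all-zero null node (which is
what 'hg tag --remove' writes) deletes the tag. It is a simplification,
not Mercurial's own parser, which also combines the results from every
topological head; roughly this parsing is what the cache lets Mercurial
avoid repeating.

```python
from typing import Dict, Iterable

NULL_NODE = "0" * 40


def parse_hgtags(lines: Iterable[str]) -> Dict[str, str]:
    """Return {tag name: node} for one head's .hgtags contents."""
    tags: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        node, sep, name = line.partition(" ")  # tag names may contain spaces
        if not sep or len(node) != 40:
            continue  # skip malformed lines rather than aborting
        name = name.strip()
        if node == NULL_NODE:
            tags.pop(name, None)  # a null entry removes the tag
        else:
            tags[name] = node  # later lines override earlier ones
    return tags
```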

> Secondly, we will actually be creating these tags through our web
> interface, which means it'll be the server running "hg tag". I think
> this means that I need a working copy on the server. I could keep it
> updated to default/tip on every push, but this seems a little wasteful
> of disk space, requires an extra "update" step on each push, and so
> on. I was wondering if instead I could have a named branch called
> "tags" which exists solely for tagging.

You could do that, or you could write a custom extension for this. It's
quite easy to add a changeset from an extension. One example is this
extension I wrote to replay history with better rename information:

  https://bitbucket.org/aragost/fixrenames/src/ff8429d9bf4f/fixrenames.py#cl-142
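If you take the named-branch route instead, a minimal sketch of the
server side could look like this. It assumes the server keeps one
working copy updated to a branch named tags (created once with
'hg branch tags' and a commit) and that builds are tagged with a
hypothetical build-<n> scheme. next_build_tag picks the next number from
the existing tag names, for instance as listed by 'hg tags';
tag_commands uses 'hg tag -r' so the tagged changeset can be anywhere in
the repo while the .hgtags change is committed on the tags branch.

```python
import re
from typing import Iterable, List

BUILD_TAG = re.compile(r"^build-(\d+)$")  # hypothetical naming scheme


def next_build_tag(existing: Iterable[str]) -> str:
    """Return the next build-<n> tag, one past the highest existing number."""
    numbers = [int(m.group(1)) for name in existing if (m := BUILD_TAG.match(name))]
    return f"build-{max(numbers, default=0) + 1}"


def tag_commands(rev: str, tag: str) -> List[List[str]]:
    """hg invocations that record `tag` for `rev` on the tags branch."""
    return [
        ["hg", "update", "tags"],        # move to the head of the tags branch
        ["hg", "tag", "-r", rev, tag],   # commits .hgtags on the current branch
    ]
```

Two web requests tagging at the same moment could both pick the same
number, so the server would need to serialize tagging, for example by
handling one tag request at a time.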

> So, apologies for the rambling email, but I wanted to give some
> background about why I'm doing things this way. I'm really looking for
> feedback on the tagging questions; will we have performance problems
> with thousands of tags, and is there anything wrong with having a
> named branch just for .hgtags?

Greg Ward wrote the tag caching logic because he ended up with ~108,000
tags after a CVS conversion and 'hg tags' took 7 seconds to run:

  http://markmail.org/message/ngu4wzp25mgxryy3

I did not find a mail or commit message where he gives the times after
the patches. Another user reports in Issue548 that 'hg parents' went
from 4 sec to 0.3 sec. Based on that, it seems that we can handle lots
of tags now.

If you want to know this for sure, then you could make a script that
generates a repository with, say, 50,000 tags.
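A minimal sketch of such a script, assuming hg is on the PATH. Instead
of running 'hg tag' 50,000 times, which would make 50,000 commits, it
makes a few commits, writes a .hgtags file that spreads the tags over
them, commits that file once, and then times 'hg tags' twice, since the
first run may still have to fill the cache. This measures a large
.hgtags file rather than a long history of tagging commits; the
temporary directory, the commit count and the bench-<n> tag names are
arbitrary choices.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path
from typing import List


def hgtags_text(nodes: List[str], count: int) -> str:
    """Build .hgtags contents: `count` tags spread round-robin over `nodes`."""
    return "".join(f"{nodes[i % len(nodes)]} bench-{i}\n" for i in range(count))


def hg(repo: Path, *args: str) -> str:
    """Run one hg command inside `repo` and return its stdout."""
    result = subprocess.run(["hg", *args], cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout


def main(tag_count: int, commits: int = 100) -> None:
    repo = Path(tempfile.mkdtemp(prefix="hg-many-tags-"))
    hg(repo, "init")
    for i in range(commits):
        (repo / "file.txt").write_text(f"revision {i}\n")  # change content each time
        if i == 0:
            hg(repo, "add", "file.txt")
        hg(repo, "commit", "-m", f"commit {i}", "-u", "bench")
    nodes = hg(repo, "log", "--template", "{node}\n").split()
    (repo / ".hgtags").write_text(hgtags_text(nodes, tag_count))
    hg(repo, "add", ".hgtags")
    hg(repo, "commit", "-m", f"add {tag_count} tags", "-u", "bench")
    for label in ("first", "second"):
        start = time.perf_counter()
        hg(repo, "tags")
        print(f"{label} 'hg tags': {time.perf_counter() - start:.2f}s ({repo})")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        main(int(sys.argv[1]))
    else:
        print(f"usage: {sys.argv[0]} TAG_COUNT  (e.g. 50000)")
```

If saved as many_tags.py, run it as 'python many_tags.py 50000';
without an argument it only prints a usage line.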

-- 
Martin Geisler

aragost Trifork
Professional Mercurial support
http://mercurial.aragost.com/kick-start/

