speed up relink script

Fri Mar 23 15:52:46 CDT 2007

On Friday, 23 March 2007 at 15:28, Matt Mackall wrote:
> On Fri, Mar 23, 2007 at 12:04:14PM -0700, Brendan Cully wrote:
> > It's better if collect skips .d files, and you don't do any reading at
> > all (eg the md5) until after the prune phase. I doubt the md5 is
> > needed - directly comparing the indexes is probably enough. I don't
> > think the head-comparing stuff is about speed - it's about possibly
> > relinking even more files.
> 
> For comparing two (or more!) repos, you really don't want to seek back
> and forth between them comparing single files at a time. Hence, storing
> hashes of all the indices in one pass.

The prune phase eliminates any files that are already linked (and
currently also those where the size doesn't match). For repos that
have much in common (which is the domain of this script), I would
expect this pruning to cut the candidate list substantially.

The current script also does the stat on all the .i files in the
source directory, then stats the corresponding files in the target, so
there shouldn't be too much cross-repo seeking in this phase.
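
To make the ordering concrete, here is a rough sketch of the two passes as
described above. It is not the actual relink code: the names collect and
prune and the src_root/dst_root arguments are made up for illustration, and
it assumes both repos sit on a filesystem where st_dev/st_ino identify
hardlinks.

```python
import os

def collect(src_root):
    """Pass 1: stat every .i file under src_root, touching only the source."""
    found = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        for name in filenames:
            if name.endswith('.i'):  # .d files are skipped here
                path = os.path.join(dirpath, name)
                found.append((os.path.relpath(path, src_root), os.stat(path)))
    return found

def prune(found, dst_root):
    """Pass 2: stat the corresponding targets and keep only relink candidates."""
    keep = []
    for rel, st in found:
        try:
            ts = os.stat(os.path.join(dst_root, rel))
        except OSError:
            continue  # no counterpart in the target
        if (st.st_dev, st.st_ino) == (ts.st_dev, ts.st_ino):
            continue  # already hardlinked
        if st.st_size != ts.st_size:
            continue  # size mismatch
        keep.append(rel)
    return keep
```

Nothing is read from any file in either pass; only stat results are compared.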

Doing an md5 collect on the .i files _after_ the prune might be worth it, though.
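
Something like the following is what I have in mind, as a sketch only. The
names md5_file and matching_indexes are hypothetical, and survivors is
assumed to be the list of relative .i paths left after the prune. All of the
source side is hashed before the target side, so the reads don't alternate
between repos.

```python
import hashlib
import os

def md5_file(path, bufsize=1 << 16):
    """Return the hex md5 of a file, read in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(bufsize), b''):
            h.update(chunk)
    return h.hexdigest()

def matching_indexes(survivors, src_root, dst_root):
    """Hash the surviving .i files one repo at a time, then compare digests."""
    src = {rel: md5_file(os.path.join(src_root, rel)) for rel in survivors}
    dst = {rel: md5_file(os.path.join(dst_root, rel)) for rel in survivors}
    return [rel for rel in survivors if src[rel] == dst[rel]]
```

Only the paths returned would be candidates for relinking.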