New discovery prototype

Peter Arrenbrecht peter.arrenbrecht at gmail.com
Sat Jun 12 03:20:27 CDT 2010


Benoit,

When you pulled from me yesterday, old discovery took 20 queries:

  $ hg debugdiscovery -R up-crew dev-crew --verbose --pull --old --debug
  ...
  20 total queries

With the new prototype, we need only 2:

  $ hg debugdiscovery -R up-crew dev-crew --verbose --pull --debug
  taking quick initial sample
  query 1
  still undecided: 11339, sample size was: 30
  sampling from both directions
  query 2
  still undecided: 3, sample size was: 3
  needed 2 roundtrips

And it seems to me old discovery actually returns more nodes than
necessary in "common".

I have added code that uses internal access to remote if remote is
actually a localrepo. The --pull flag above disables this.

The "quick initial sample" is an attempt at tuning the case where
either party is a subset of the other, or there are just a few changes
near the heads. The quick sample just lets your sampler walk down from
the heads until the sample is full, or it reaches the bottom. It does
not necessarily span the entire graph, and it does not fill the sample
up by a random sample. This is either much less CPU intensive on the
client, or, for smaller repos where it hits the bottom, it leads to a
smaller initial sample. So here we have only 30 nodes instead of the
configured initial sample size of 100. This leads me to wonder if we
might not want your "factor" to increase a little more slowly than by
doubling.
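
Roughly, the idea is something like the following standalone sketch
(not the actual patch; parents() and the size parameter are just
stand-ins for however the local DAG is accessed):

  # Walk down from the heads, breadth-first, and stop as soon as the
  # sample is full or we run out of ancestors.
  def quick_sample(heads, parents, desired_size):
      sample = set()
      frontier = list(heads)
      seen = set(frontier)
      while frontier and len(sample) < desired_size:
          node = frontier.pop(0)
          sample.add(node)
          for p in parents(node):
              if p is not None and p not in seen:
                  seen.add(p)
                  frontier.append(p)
      # No backfilling with random nodes: if we hit the bottom of a
      # small repo, the sample simply stays small (30 nodes here
      # instead of the configured 100).
      return sample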

Also, I dropped my attempt at optimizing the end-game in the server,
as that, too, can be a bit too CPU intensive (at least in the current
implementation; see the FIXME in checksample()).

I currently do not pass the server's heads back and forth again at the
start of the conversation, and I also do not prune the DAG the server
looks at when you "pull -r". Not passing the heads should not lead to
wrong results. If someone pushes while discovery is going on, we might
actually get a slightly better result, as some nodes might now be yes
instead of no, so we won't send them needlessly. Not reducing the
server's DAG for "pull -r" means we don't have to walk the DAG on the
server and can simply use its nodemap. On the other hand, it might
lead us to explore more areas of the DAG than necessary. I have not
fully thought this through yet.
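
To illustrate what the unpruned nodemap buys us, the server's
membership test then amounts to something like this (purely
illustrative; 'nodemap' stands for a {node: rev} mapping, not the real
repository API):

  def known(nodemap, nodes):
      # Which of these candidate nodes does the server have at all?
      return [node in nodemap for node in nodes]

whereas a pruned DAG would first require a walk from the "pull -r"
targets to decide which nodes are even eligible.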

As I said, I still have a worrying case where the new discovery seems
to return a wrong result on my old linux kernel changelog clone. Shall
investigate.

Finally, Jan, Martin's boss, came up with an idea to maybe prune the
set of sample nodes we take per cut through the graph by looking at
how much potential each of these nodes has to reduce the undecided
set for both a yes and a no answer. This is probably not realistic, as
it would be very CPU intensive, but it might be food for thought
leading to other new ideas.
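
Purely as a sketch of that thought (certainly too expensive as
written; ancestors() and descendants() are stand-ins for set-valued
lookups over the local DAG):

  def score_candidates(candidates, undecided, ancestors, descendants):
      # For each candidate, estimate how many undecided nodes a "yes"
      # answer (its ancestors become common) and a "no" answer (its
      # descendants become missing) would decide, and prefer the
      # candidates with the best worst-case reduction.
      scored = []
      for node in candidates:
          gain_if_yes = len(ancestors(node) & undecided)
          gain_if_no = len(descendants(node) & undecided)
          scored.append((min(gain_if_yes, gain_if_no), node))
      scored.sort(key=lambda item: item[0], reverse=True)
      return [node for _, node in scored]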

-parren

