Collecting data about discovery for testing new prototypes

Augie Fackler durin42 at gmail.com
Sun Jan 16 22:30:39 CST 2011


I'd *love* to provide this kind of data (I think I even offered to look into it at a sprint), but I'll need to do a couple of things:

1) Completely rework the extension's logging to go to someplace that's not a regular file. Probably a cell in bigtable or something that I can aggregate out later.
2) Check with our legal folks and make sure this is data I can collect and share.

I don't expect any hurdles on either front, but I'll talk to people on Tuesday about it (Monday's a US holiday) and try to get back later in the week. I've cc'ed my work email so I don't drop this when I get to the office.

As far as the debugindex data, we don't store revlogs[0]. Would it be sufficient to associate the negotiation data with a link to the public repo so you could clone and examine the DAG that way?

Also: It's *possible* that if we want to make some assumptions I could figure out a way to coerce some amount of historical data out of the system. Should I investigate that on the appropriate technical/political/legal fronts?

Augie

[0]: Happy to chat about that and/or provide the Google I/O talk link for the curious, although it's changed slightly since the talk.

On Jan 15, 2011, at 9:45 AM, Peter Arrenbrecht wrote:
> 
> Guys,
> 
> We (tonfa and parren) would like to gather some real-life data about
> typical push/pull scenarios from a couple of busy Mercurial servers.
> All we need is the topological information. The base for this is the
> output of:
> 
> $ hg debugindex .hg/store/00changelog.i
>   rev    offset  length   base linkrev nodeid       p1           p2
>     0         0     305      0       0 9117c6561b0b 000000000000 000000000000
>     1       305     152      1       1 273ce12ad8f1 9117c6561b0b 000000000000
>     2       457     119      2       2 ecf3fd948051 273ce12ad8f1 000000000000
> ...
> 
> So even for proprietary repos, there would be no data leak. But we're
> happy with data from only open-source projects too, of course.
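
To make concrete what "topological information" means here, the following is a minimal sketch of how the `hg debugindex` listing above could be turned into a parent map and a set of heads. The function names `parse_debugindex` and `heads` are hypothetical and not part of Mercurial; the sketch assumes the eight-column layout shown in the sample and treats the all-zero id as "no parent".

```python
NULL_ID = "000000000000"  # short form of the null revision id in the listing


def parse_debugindex(text):
    """Map each short nodeid to the list of its non-null parents.

    Assumes the eight-column layout from the sample:
    rev offset length base linkrev nodeid p1 p2
    """
    parents = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row, the "..." elision, and blank lines.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        node, p1, p2 = fields[5], fields[6], fields[7]
        parents[node] = [p for p in (p1, p2) if p != NULL_ID]
    return parents


def heads(parents):
    """Return the nodes that no other node lists as a parent."""
    referenced = {p for ps in parents.values() for p in ps}
    return sorted(n for n in parents if n not in referenced)


SAMPLE = """\
  rev    offset  length   base linkrev nodeid       p1           p2
    0         0     305      0       0 9117c6561b0b 000000000000 000000000000
    1       305     152      1       1 273ce12ad8f1 9117c6561b0b 000000000000
    2       457     119      2       2 ecf3fd948051 273ce12ad8f1 000000000000
"""

dag = parse_debugindex(SAMPLE)
sample_heads = heads(dag)
```

Only node ids and parent links survive the parsing step, which matches the point that no file contents or commit metadata are needed.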
> 
> And then we need the constellation of heads before/after pulls and
> pushes. This we would get from a small extension that logs this
> information. It is attached and I ask you to run it on your servers
> for a while (after review!).
> 
> It logs the data to `".hg/discovery-%s.log" % os.getpid()`, one file
> per process, so we should not be introducing concurrency problems. A
> sample of the data it writes is (from the test; the first column is
> `str(datetime.now())`):
> 
>  $ cat test/.hg/discovery-*.log
>  [^;]*;cgss;177f92b773850b59254aa5e923436f921b55483b;055a42cdd88768532f9cf79daa407fc8d138de9b (re)
>  [^;]*;unb;055a42cdd88768532f9cf79daa407fc8d138de9b;75cbdffecadb121afac9edeea9fc6c3089cd2c4d 055a42cdd88768532f9cf79daa407fc8d138de9b (re)
>  [^;]*;cgss;177f92b773850b59254aa5e923436f921b55483b 493dc096441243d1b99ee11f7c0257f2752531b2;75cbdffecadb121afac9edeea9fc6c3089cd2c4d 055a42cdd88768532f9cf79daa407fc8d138de9b (re)
>  [^;]*;unb;75cbdffecadb121afac9edeea9fc6c3089cd2c4d 055a42cdd88768532f9cf79daa407fc8d138de9b;7523912c6e49654a8064a3cd5dbe3a7325252700 75cbdffecadb121afac9edeea9fc6c3089cd2c4d 055a42cdd88768532f9cf79daa407fc8d138de9b (re)
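
For anyone aggregating these logs later, here is a rough sketch of reading one record back. It assumes what the sample suggests: four semicolon-separated fields (timestamp, event tag, and two space-separated lists of node ids). The message does not spell out what `cgss` and `unb` stand for or what each list means, so the sketch keeps them as opaque `event`, `first`, and `second` values. `DiscoveryRecord` and `parse_record` are hypothetical names, and the timestamp in the example line is made up, since the test output masks it with `[^;]*`.

```python
from datetime import datetime
from typing import List, NamedTuple


class DiscoveryRecord(NamedTuple):
    when: datetime      # parsed from str(datetime.now())
    event: str          # tag as logged, e.g. "cgss" or "unb"
    first: List[str]    # first space-separated list of node ids
    second: List[str]   # second space-separated list of node ids


def parse_record(line):
    """Split one log line into its four semicolon-separated fields."""
    stamp, event, first, second = line.rstrip("\n").split(";")
    # datetime.fromisoformat (Python 3.7+) accepts the str(datetime) form.
    when = datetime.fromisoformat(stamp)
    return DiscoveryRecord(when, event, first.split(), second.split())


# Node ids taken from the sample; the timestamp is an invented placeholder.
line = (
    "2011-01-15 09:45:00.123456;unb;"
    "055a42cdd88768532f9cf79daa407fc8d138de9b;"
    "75cbdffecadb121afac9edeea9fc6c3089cd2c4d "
    "055a42cdd88768532f9cf79daa407fc8d138de9b\n"
)
record = parse_record(line)
```

A line with a different number of fields raises `ValueError` at the unpacking step, which seems preferable to silently accepting a malformed record.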
> 
> If you are not directly responsible for operating your respective
> servers, could you please forward this request? Also, if you know
> other interesting servers, do not hesitate to invite them, too.
> 
> Thanks!
> -parren
> <discoverylogger.py><test-discoverylogger.t>



