Collecting data about discovery for testing new prototypes

Peter Arrenbrecht peter.arrenbrecht at gmail.com
Mon Jan 17 01:11:39 CST 2011


Augie,

On Mon, Jan 17, 2011 at 5:30 AM, Augie Fackler <durin42 at gmail.com> wrote:
> I'd *love* to provide this kind of data (I think I even offered to look into it at a sprint), but I'll need to do a couple of things:
>
> 1) Completely rework the extension's logging to go to someplace that's not a regular file. Probably a cell in bigtable or something that I can aggregate out later.
> 2) Check with our legal folks and make sure this is data I can collect and share.
>
> I don't expect any hurdles on either front, but I'll talk to people on Tuesday about it (Monday's a US holiday) and try to get back later in the week. I've cc'ed my work email so I don't drop this when I get to the office.

Thanks!

> As far as the debugindex data, we don't store revlogs[0]. Would it be sufficient to associate the negotiation data with a link to the public repo so you could clone and examine the DAG that way?

Yes, of course. Any way you can provide us with sufficient data that
we can figure out the full graph and the heads on both ends at the
time of a negotiation will do.
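
To make concrete what "figure out the full graph and the heads" means
on our side, here is a minimal sketch in Python 3. It assumes the
eight-column `hg debugindex` layout shown in my original mail; the
names `parse_debugindex` and `heads` are made up for this illustration
and are not part of Mercurial or of the attached extension. It computes
the heads of the whole graph only; reconstructing the heads at the time
of a particular negotiation would additionally need the logged data.

```python
NULLID = "0" * 12  # short form of the null revision in debugindex output


def parse_debugindex(text):
    """Return (nodeid, p1, p2) short-hash tuples from debugindex output."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 8 columns and start with a numeric revision;
        # this skips the header row and any "..." elision.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        entries.append((fields[5], fields[6], fields[7]))
    return entries


def heads(entries):
    """Return the nodes that are not a parent of any other node."""
    parents = set()
    for _node, p1, p2 in entries:
        parents.update(p for p in (p1, p2) if p != NULLID)
    return [node for node, _p1, _p2 in entries if node not in parents]


sample = """\
   rev    offset  length   base linkrev nodeid       p1           p2
     0         0     305      0       0 9117c6561b0b 000000000000 000000000000
     1       305     152      1       1 273ce12ad8f1 9117c6561b0b 000000000000
     2       457     119      2       2 ecf3fd948051 273ce12ad8f1 000000000000
"""

print(heads(parse_debugindex(sample)))
```

For the three linear revisions above, the only head is the last one.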

> Also: It's *possible* that if we want to make some assumptions I could figure out a way to coerce some amount of historical data out of the system. Should I investigate that on the appropriate technical/political/legal fronts?

The more data, the better, I guess. But I'd not spend too much effort on it.

Thanks again,
-parren

>
> Augie
>
> [0]: Happy to chat about that and/or provide the Google I/O talk link for the curious, although it's changed slightly since the talk.
>
> On Jan 15, 2011, at 9:45 AM, Peter Arrenbrecht wrote:
>>
>> Guys,
>>
>> We (tonfa and parren) would like to gather some real-life data about
>> typical push/pull scenarios from a couple of busy Mercurial servers.
>> All we need is the topological information. The base for this is the
>> output of:
>>
>> $ hg debugindex .hg/store/00changelog.i
>>   rev    offset  length   base linkrev nodeid       p1           p2
>>     0         0     305      0       0 9117c6561b0b 000000000000 000000000000
>>     1       305     152      1       1 273ce12ad8f1 9117c6561b0b 000000000000
>>     2       457     119      2       2 ecf3fd948051 273ce12ad8f1 000000000000
>> ...
>>
>> So even for proprietary repos, there would be no data leak. But we're
>> happy with data from open-source projects only, too, of course.
>>
>> And then we need the constellation of heads before/after pulls and
>> pushes. This we would get from a small extension that logs this
>> information. It is attached and I ask you to run it on your servers
>> for a while (after review!).
>>
>> It logs the data to `".hg/discovery-%s.log" % os.getpid()` so we
>> should not be introducing concurrency problems. A sample of the data
>> it writes is (from the test; the first column is
>> `str(datetime.now())`):
>>
>>  $ cat test/.hg/discovery-*.log
>>  [^;]*;cgss;177f92b773850b59254aa5e923436f921b55483b;055a42cdd88768532f9cf79daa407fc8d138de9b (re)
>>  [^;]*;unb;055a42cdd88768532f9cf79daa407fc8d138de9b;75cbdffecadb121afac9edeea9fc6c3089cd2c4d 055a42cdd88768532f9cf79daa407fc8d138de9b (re)
>>  [^;]*;cgss;177f92b773850b59254aa5e923436f921b55483b 493dc096441243d1b99ee11f7c0257f2752531b2;75cbdffecadb121afac9edeea9fc6c3089cd2c4d 055a42cdd88768532f9cf79daa407fc8d138de9b (re)
>>  [^;]*;unb;75cbdffecadb121afac9edeea9fc6c3089cd2c4d 055a42cdd88768532f9cf79daa407fc8d138de9b;7523912c6e49654a8064a3cd5dbe3a7325252700 75cbdffecadb121afac9edeea9fc6c3089cd2c4d 055a42cdd88768532f9cf79daa407fc8d138de9b (re)
>>
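
For anyone who wants to read such a log back, a minimal Python 3 sketch
follows. It assumes only what the sample shows: one record per line,
fields separated by `;`, a timestamp first, an event tag second (`cgss`
or `unb` in the sample), and then groups of space-separated 40-digit
hex node ids. What each group means is not assumed here. The name
`parse_log_line` and the timestamp in the example are invented for this
illustration.

```python
import re

NODE_RE = re.compile(r"^[0-9a-f]{40}$")


def parse_log_line(line):
    """Split one record into (timestamp, event, list of node-id lists)."""
    # Raises ValueError if the line has fewer than two fields.
    timestamp, event, *groups = line.rstrip("\n").split(";")
    node_lists = [group.split() for group in groups]
    for nodes in node_lists:
        for node in nodes:
            if not NODE_RE.match(node):
                raise ValueError("not a 40-digit hex node id: %r" % node)
    return timestamp, event, node_lists


# Example record; the timestamp is made up, the node ids come from the sample.
example = (
    "2011-01-15 09:45:00.000000;unb;"
    "055a42cdd88768532f9cf79daa407fc8d138de9b;"
    "75cbdffecadb121afac9edeea9fc6c3089cd2c4d "
    "055a42cdd88768532f9cf79daa407fc8d138de9b"
)
timestamp, event, node_lists = parse_log_line(example)
print(event, [len(nodes) for nodes in node_lists])
```

Splitting on `;` first is safe under these assumptions because neither
`str(datetime.now())` nor a hex node id contains a semicolon.
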
>> If you are not directly responsible for operating your respective
>> servers, could you please forward this request? Also, if you know
>> other interesting servers, do not hesitate to invite them, too.
>>
>> Thanks!
>> -parren
>> <discoverylogger.py><test-discoverylogger.t>
>
>
>

