Unifying sparse and narrow "profiles"

Thu Mar 30 17:10:21 EDT 2017

On Thu, Mar 30, 2017 at 1:15 PM, Durham Goode <durham at fb.com> wrote:
> On 3/29/17 7:47 PM, Augie Fackler wrote:
>>
>> (+martinvonz, durham)
>>
>>> On Mar 29, 2017, at 1:05 PM, Gregory Szorc <gregory.szorc at gmail.com>
>>> wrote:
>>>
>>> Mozilla has added tens of thousands of files to the Firefox repo in
>>> the past few months and we have plans to add tens of thousands more
>>> shortly. Working directory update times (especially in automation,
>>> which has to do fresh checkouts with a somewhat high frequency
>>> since we rely on ephemeral compute instances) were borderline
>>> tolerable before. With the addition of tens of thousands of new
>>> files, working directory updates are starting to put noticeable
>>> strain on systems.
>>>
>>> Mozilla can make do with sparse checkouts - we don't yet have so
>>> much repo data that we need narrow clone, although narrow would be
>>> useful.
>>>
>>> Facebook's sparse checkout extension has existed for years and my
>>> understanding is it gets the job done. When I asked at the sprint
>>> why sparse isn't part of the core distribution, someone mentioned
>>> it is because sparse and narrow have different, competing concepts
>>> for defining a sparse/narrow profile and these will need to be
>>> reconciled before either can be accepted into core.
>>
>>
>> I think we’ve mostly stopped caring about profiles for our users -
>> Martin, can you confirm that?
>
>
> While Google might not need sparse profiles, I think the feature is critical
> for making sparse usable in a multi-user environment. I think the lack of it
> is one of the reasons Git's sparse implementation is not more widely used
> (along with its unpolished UI).
>
>>
>>> Is there a timeline for unifying the profiles and adding sparse to
>>> the core distribution?
>>
>>
>> This is mostly on me, I think, to clean up narrow and make it so that
>> it can satisfy the varying modes of operation:
>> 0) narrow working directory only
>> 1) narrow working directory and history, but no eliding of irrelevant
>> changes
>> 2) “full narrow”, including elision of irrelevant changes
>>
>> What we’re using at Google is mode 2, but that’s also the most
>> server-expensive and the most likely to have bugs. It shouldn’t be
>> /too/ much work to reconcile our implementation with sparse, add
>> profiles support, and default to mode 0 or 1.

I agree with this.

I'd also like to point out that modes 1 and 2 with treemanifests place
unusual requirements on matching, namely that the client and server
must agree on which directories are needed (the match.visitdir()
method), given a set of patterns. For example, if I include
"glob:foo/*/bar", the client will not know which subdirectories of
foo/ have a bar (sub)subdirectory, so the server will need to send
all of them (i.e. the server cannot skip subdirectories of foo/ that
don't have any 'bar's). It also means that match.visitdir() has to
behave identically on client and server, which places
backward-compatibility (BC) constraints on adding optimizations to
match.visitdir(). So for modes 1 and 2, we should probably make the
server restrictive about what kinds of patterns it allows, at least
when using treemanifests.
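
To make the foo/*/bar example concrete, here is a minimal Python
sketch of a visitdir-style check that looks only at the pattern and
never at the tree contents, so client and server necessarily compute
the same answer. The names literal_prefix() and visitdir() below are
hypothetical and standalone; this is not Mercurial's actual
match.visitdir() implementation, and it handles only a single plain
glob with "/" separators.

```python
def literal_prefix(pattern):
    """Return the leading path components of a glob that hold no wildcard."""
    parts = []
    for comp in pattern.split("/"):
        if any(ch in comp for ch in "*?["):
            break
        parts.append(comp)
    return parts


def visitdir(pattern, directory):
    """Conservatively decide whether `directory` may contain matches.

    Only the pattern is consulted, not the repository contents, so the
    result is the same wherever it is computed. "" is the repo root.
    """
    prefix = literal_prefix(pattern)
    dparts = directory.split("/") if directory else []
    # Compare the overlapping leading components: a directory is needed
    # if it is an ancestor of the literal prefix or lies beneath it.
    n = min(len(prefix), len(dparts))
    return prefix[:n] == dparts[:n]


# Hypothetical tree: foo/a has a bar/ subdirectory, foo/b does not.
# Both are reported as needed, because the pattern alone cannot tell
# them apart; only a directory outside foo/ can be skipped.
needed = [d for d in ["foo/a", "foo/b", "other"]
          if visitdir("foo/*/bar", d)]
```

With this kind of check the server sends every subdirectory of foo/,
which is the cost described above. A fully literal pattern such as
"foo/bar" lets it skip siblings like foo/baz.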

>>
>> Durham, do you have opinions on this? Is it a fair assumption on my
>> part that you’d rather we maintain this horror than you?
>
>
> If we make some basic assumptions about the UI (i.e. 'hg sparse' is a good
> command name, and it will have 'include', 'exclude', 'enable-profile', and
> 'disable-profile' flags; and .hg/sparse is a good spot to store the client
> config), I'd almost say let's ship the current sparse extension in hgext/
> and satisfy mode 0. Then once the narrow-sparse unification has occurred
> (assuming it implements the same hg sparse UI for mode 0 operations), we can
> delete the old sparse code.

I'm not ready to make those assumptions (and I'm not sure you were
saying we should). I think we need to discuss UI for the three modes
Augie mentioned and make sure that they feel consistent.

>
> Heck, it may even be beneficial to have the 'hg sparse' command control the
> working copy oriented sparseness, and a separate command to control the
> storage sparseness. So introducing them separately may make even more sense.

Possibly.

>
> That way we can have sparse in 4.3, and there's no pressure to prioritize
> the narrow refactor until we feel it's necessary. That opens room for
> potential future refactors in narrowhg as Facebook moves towards a lazy
> changelog strategy which could influence the public facing narrowness
> strategy.

For sparse and/or narrow in core, I think we'll need to pass matchers
around a bit more. We probably will also need to work on matcher
composition, so we can create a matcher with the user's patterns ANDed
together with the sparse/narrow matcher. Do you have that problem
already in the sparse extension? Once we have both sparse and narrow,
it seems likely we will need to compose those two matchers too.
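
As a rough illustration of what I mean by composition, here is a
small standalone Python sketch. PrefixMatcher and AndMatcher are
hypothetical names invented for this example, not existing Mercurial
classes, and the directory names are made up. The idea is that the
composed matcher accepts a file only if every component matcher does,
and visits a directory only if every component would.

```python
class PrefixMatcher:
    """Matches files at or under any of the given directory prefixes."""

    def __init__(self, prefixes):
        self._prefixes = [p.rstrip("/") for p in prefixes]

    def __call__(self, path):
        return any(path == p or path.startswith(p + "/")
                   for p in self._prefixes)

    def visitdir(self, directory):
        if directory == "":
            return True  # always visit the repo root
        return any(directory == p
                   or directory.startswith(p + "/")   # below a prefix
                   or p.startswith(directory + "/")   # ancestor of one
                   for p in self._prefixes)


class AndMatcher:
    """Intersection of several matchers (user patterns AND sparse/narrow)."""

    def __init__(self, *matchers):
        self._matchers = matchers

    def __call__(self, path):
        return all(m(path) for m in self._matchers)

    def visitdir(self, directory):
        # Conservative: this can return True for a directory in which
        # no file ends up matching every component.
        return all(m.visitdir(directory) for m in self._matchers)


sparse = PrefixMatcher(["browser", "toolkit"])   # sparse/narrow profile
user = PrefixMatcher(["browser/base", "docs"])   # patterns on the command line
both = AndMatcher(user, sparse)
```

Here both("browser/base/content.js") is accepted, while
both("docs/index.rst") is rejected because docs/ is outside the
sparse profile. Composing sparse with narrow would just mean passing
a third matcher to AndMatcher.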

>
>>
>> (Also, are there any docs I should read about your sparse stuff and
>> profiles?)
>
>
> `hg sparse --help` is about all we have right now, but it's pretty good.
> Just pay attention to the last sentence in each argument description, since
> that tells you when the sparse change is applied (i.e. applied when the
> command is run, or applied when the commit is made).