Unifying sparse and narrow "profiles"

Durham Goode durham at fb.com
Thu Mar 30 16:15:28 EDT 2017


On 3/29/17 7:47 PM, Augie Fackler wrote:
> (+martinvonz, durham)
>
>> On Mar 29, 2017, at 1:05 PM, Gregory Szorc <gregory.szorc at gmail.com> wrote:
>>
>> Mozilla has added tens of thousands of files to the Firefox repo in
>> the past few months and we have plans to add tens of thousands more
>> shortly. Working directory update times (especially in automation,
>> which has to do fresh checkouts with a somewhat high frequency
>> since we rely on ephemeral compute instances) were borderline
>> tolerable before. With the addition of tens of thousands of new
>> files, working directory updates are starting to put noticeable
>> strain on systems.
>>
>> Mozilla can make do with sparse checkouts - we don't yet have so
>> much repo data that we need narrow clone, although narrow would be
>> useful.
>>
>> Facebook's sparse checkout extension has existed for years and my
>> understanding is it gets the job done. When I asked at the sprint
>> why sparse isn't part of the core distribution, someone mentioned
>> it is because sparse and narrow have different, competing concepts
>> for defining a sparse/narrow profile and these will need to be
>> reconciled before either can be accepted into core.
>
> I think we’ve mostly stopped caring about profiles for our users -
> Martin, can you confirm that?

While Google might not need sparse profiles, I think the feature is 
critical for making sparse usable in a multi-user environment. I think 
the lack of it is one of the reasons Git's sparse implementation is not 
more widely used (along with its unpolished UI).

>
>> Is there a timeline for unifying the profiles and adding sparse to
>> the core distribution?
>
> This is mostly on me, I think, to clean up narrow and make it so that
> it can satisfy the varying modes of operation:
> 0) narrow working directory only
> 1) narrow working directory and history, but no eliding of irrelevant
> changes
> 2) “full narrow”, including elision of irrelevant changes
>
> What we’re using at Google is mode 2, but that’s also the most
> server-expensive and the most likely to have bugs. It shouldn’t be
> /too/ much work to reconcile our implementation with sparse, add
> profiles support, and default to mode 0 or 1.
>
> Durham, do you have opinions on this? Is it a fair assumption on my
> part that you’d rather we maintain this horror than you?

If we make some basic assumptions about the UI (i.e. 'hg sparse' is a 
good command name, and it will have 'include', 'exclude', 
'enable-profile', and 'disable-profile' flags; and .hg/sparse is a good 
spot to store the client config), I'd almost say let's ship the current 
sparse extension in hgext/ and satisfy mode 0. Then once the 
narrow-sparse unification has occurred (assuming it implements the same 
hg sparse UI for mode 0 operations), we can delete the old sparse code.
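
To make the UI assumptions above a bit more concrete, here is a rough
Python sketch of how a client config holding include rules, exclude rules,
and enabled profiles could decide which files end up in the working copy.
This is not the sparse extension's code. The SparseConfig class, the
PROFILES dictionary, the glob-style patterns, and the "exclude wins, no
includes means everything" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase

# Hypothetical shared profiles: name -> (include globs, exclude globs).
# A real profile would be shared between users somehow; a dict stands in here.
PROFILES = {
    "frontend": (["browser/*", "toolkit/*"], ["toolkit/tests/*"]),
}


@dataclass
class SparseConfig:
    """Illustrative stand-in for the client config (not the real .hg/sparse format)."""
    includes: set = field(default_factory=set)
    excludes: set = field(default_factory=set)
    profiles: set = field(default_factory=set)

    def _rules(self):
        # Merge the user's own rules with the rules of every enabled profile.
        inc, exc = set(self.includes), set(self.excludes)
        for name in self.profiles:
            p_inc, p_exc = PROFILES[name]
            inc.update(p_inc)
            exc.update(p_exc)
        return inc, exc

    def matches(self, path):
        inc, exc = self._rules()
        # Assumption: an exclude rule always beats an include rule.
        if any(fnmatchcase(path, pat) for pat in exc):
            return False
        # Assumption: with no include rules at all, the checkout is not sparse.
        return not inc or any(fnmatchcase(path, pat) for pat in inc)


def sparse_checkout(config, manifest):
    """Return the subset of manifest paths kept in the working copy (mode 0)."""
    return [path for path in manifest if config.matches(path)]


config = SparseConfig()
config.profiles.add("frontend")             # like enabling a profile
config.excludes.add("browser/locales/*")    # like adding an exclude rule
manifest = [
    "browser/app/main.cpp",
    "browser/locales/en-US/a.ftl",
    "toolkit/tests/test_x.js",
    "toolkit/moz.build",
    "js/src/jit.cpp",
]
print(sparse_checkout(config, manifest))
```

Only the working copy is filtered in this sketch; history and storage are
untouched, which is what mode 0 amounts to.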

Heck, it may even be beneficial to have the 'hg sparse' command control 
the working-copy-oriented sparseness and a separate command control the 
storage sparseness. So introducing them separately may make even more 
sense.

That way we can have sparse in 4.3, and there's no pressure to 
prioritize the narrow refactor until we feel it's necessary. That leaves 
room for future refactors in narrowhg as Facebook moves towards a lazy 
changelog strategy, which could influence the public-facing narrowness 
strategy.

>
> (Also, are there any docs I should read about your sparse stuff and
> profiles?)

`hg sparse --help` is about all we have right now, but it's pretty good. 
Just pay attention to the last sentence in each argument description, 
since that tells you when the sparse change is applied (i.e. applied 
when the command is run, or applied when the commit is made).


More information about the Mercurial-devel mailing list