/!\ This page is primarily intended for Mercurial's developers.

4.0 Sprint

/!\ Subscribe to this page so you don't miss updates!

1. Date and location

The sprint will be held at the Mozilla office in Paris, Oct 7-9. More info about the office can be found at https://wiki.mozilla.org/Paris.

Location point of contact: Gregory Szorc (indygreg) - gps@mozilla.com

If you need a formal invitation for visa purposes, contact the person above.

2. Arrival Logistics

The Mozilla Paris office is located at 16 Boulevard Montmartre. There is no signage to indicate it is a Mozilla office. Look for solid brown wooden doors.

A security guard will be posted in the morning to grant entrance. During designated arrival times, the outside brown doors will be open and the security guard will be downstairs to guide people in. If the doors are closed/locked, use the keypad next to the doors to dial Mozilla and someone should buzz you in.

Designated arrival time on Friday is between 0900 and 1000. If you arrive after this time, you will likely need to call Mozilla from the keypad next to the door to be buzzed in.

Designated arrival time on Saturday and Sunday will be determined Friday.

3. Attendance

Everyone is welcome, from core developers to aspiring contributors. Attending a Mercurial sprint is usually a good way to kickstart your contributions, and you'll have a large amount of help available for 3 days.

|| Name || Coming from || Need funding || Hotel || In Town Dates || Notes ||
|| Aaron Kushner || SF || no || Ibis Paris Grands Boulevards Opera || Oct 7-10 || ||
|| Alain leufroy || Paris || no || local || Oct 7-9 || ||
|| Andras Belokosztolszki || London || no || Best Western Hotel Opera Drouot || Oct 6-10 || ||
|| Augie Fackler || PIT || no || The Chess Hotel || Oct 6-10 || ||
|| Aurélien Campéas || Paris || no || local || Oct 7-9 || ||
|| Christophe de Vienne || Paris || no || local || Oct 7 || ||
|| Cotizo Sima || London || no || Best Western Hotel Ronceray Opera || Oct 7-9 || ||
|| Denis Laxalde || France || no || TBD || Oct 7-8 || ||
|| Durham Goode || California || no || Westminster || Oct 6-9 || ||
|| Erik van Zijst || California || No || Best Western Hotel Ronceray Opera || Oct 4-11 || ||
|| Florent Aide || Paris || no || local || Oct 7 || ||
|| Gábor Stefanik || Budapest || no || Best Western Anjou Opera || Oct 6-10 || ||
|| Gregory Szorc || SF || no || The Chess Hotel || Oct 5-10 || ||
|| Jeremy Fitzhardinge || SF || no || Ibis Paris Grands Boulevards Opera || Oct 7-10 || ||
|| Jun Wu || London || no || Westminster Hotel || Oct 6-9 || ||
|| Katsunori Fujiwara || Japan || yes || Best Western Diva Opera || Oct 6-10 || ||
|| Kevin Bullock || Minnesota || yes || Best Western Hotel Opera Drouot || Oct 5-10 || ||
|| Kostia Balytskyi || London || no || Westminster || Oct 6-9 || ||
|| Kyle Lippincott || California || no || Villathena || Oct 3-11 || ||
|| Mads Kiilerich || Denmark, Unity || no || Best Western Hotel Ronceray Opéra || || ||
|| Martijn Pieters || London || no || The Westin Paris - Vendome || Oct 6-9 || ||
|| Mateusz Kwapich || London || no || Westminster Hotel || Oct 6-9 || ||
|| Mathias De Maré || Belgium || no || Kyriad Paris Gare du Nord || Oct 6-9 || ||
|| Philippe Pepiot || France || no || TBD || Oct 6-8/9 || ||
|| Pierre-Yves David || Paris || no || local || local || ||
|| Piotr Listkiewicz || Cracow || yes || TBD || Oct 6-9 || ||
|| Pulkit Goyal || New Delhi || yes || TBD || Oct 6-10 || ||
|| Remi Chaintron || London || no || TBD || TBD || ||
|| Rodrigo Damazio Bovendorp || California || no || Best Western Hotel Ronceray Opera || Oct 6-10 || ||
|| Ryan McElroy || London || No || Westminster Hotel || Oct 6-10 || ||
|| Sean Farley || California || No || Best Western Hotel Ronceray Opera || Oct 4-11 || ||
|| Siddharth Agarwal || SF || no || Westminster Hotel || Oct 6-9 || ||
|| Simon Farnsworth || London || No || Westminster Hotel || Oct 6-9 || ||
|| Stanislau Hlebik || London || no || Westminster Hotel || Oct 6-9 || ||
|| Wez Furlong || California || No || Westminster Hotel || Oct 6-9 || ||
|| Yuya Nishihara || Japan || maybe || Richmond Opéra || Oct 6-10 || ||

4. Travel tips

* If you need a smartphone to use while in Paris, http://www.insidr.paris/ is an option.

* Unless you spend your day in the subway, the "day ticket" is usually not worth it.

* There is a single ticket type, "Ticket T", for all travel in the downtown area. Buying them in a "book" of 10 is much cheaper (25% cheaper).

* From the airport you need a "Paris - CDG" ticket. They are quite expensive (around 10€) because of airport tax.

5. Sponsors

We have funds to pay for flights and hotels for a few independent contributors.

Sponsoring Company:

Sponsor point of contact: Kevin Bullock < kbullock@ringworld.org >

6. Meals

Having food delivered for lunch is usually preferred as it helps keep the timing under control. Dinner is usually taken outside to help people cool off after a day of work.

(Don't forget vegetarian and vegan options.)

We'll have lunch on site and dinner outside. Given the group size, we usually break up into smaller groups for dinner.

Place ideas:

7. Possible Topics

It worked well at the last sprint to devote some of the time to "unconference"-style sessions, proposed by attendees and scheduled into a grid of timeslots and rooms. Do we want to plan on that again? —KevinBullock

Important things we want to discuss:

7.1. Project infrastructure (Kevin Bullock)

I'll give a report on the current state of the project's infrastructure (server, buildbot, bugzilla, mailing lists, &c.), and walk people thru the config automation we've done via Ansible and Docker. Hoping to get some more people familiar with it so they can solve problems when they arise.

7.2. Transition

Did you know Mercurial now has a steering committee? (And other questions related to mpm/transition.)

7.3. Topics (Sean Farley, Erik van Zijst, Pierre-Yves David, anyone else)

Flesh out topics. Designing a natural and easy-to-use workflow: when should a changeset become public in an average user's workflow?

7.4. MergeDriver internals (Siddharth Agarwal, ...)

Talk (~1h) about internals of MergeDriverPlan + whatever work we end up doing next.

7.5. StreamClone and Batch wireproto enhancements (durin42, indygreg)

Seem related. Streamable and reorderable batch.

7.6. Evolution and relation topic (Kyle Lippincott, Pierre-Yves David)

It is probably worth chatting about evolution in general.

Kyle had some specific questions too:

Q: Is hg evolve --all discouraged? Should it be? :)

A: Yes, it is, for various temporary and fundamental reasons:

Q: Frequently users at Google just run hg evolve --all instead of more focused evolves, and dislike that after hg evolve --all they are on the tip-most revision instead of the revision they started on (or potentially the successor of that revision). I'd like to discuss changing this movement behavior (via a flag and/or config option)

A: This is definitely something we would like to change, especially now that 'hg next' has some evolution capability. However, more work on the evolution state file is needed. Anyone is welcome to send patches to make progress here.

Q: The subject is based on my (possibly flawed) recollection that the movement is happening because hg evolve --all; hg co .^^; hg amend; hg evolve --all; hg co .^; hg amend; ... is really not recommended.

A: Yes, there is a potential marker explosion issue that we need a proper answer to.

7.7. Performance tracking suite (Philippe Pepiot)

Talk and demo about PerformanceTrackingSuitePlan

7.8. Virtual filesystems for source code (Kyle Lippincott, Durham Goode?)

Google has had this (for non-Mercurial repos) for a while and Facebook has just started working on something called Eden. It'd be great to discuss how this fits in Mercurial's overall ecosystem, whether anything should go into core and other plans going forward.

7.9. Reliability and performance in large Mercurial deployments (Rodrigo Damazio)

Many companies now have large Mercurial repositories, often hosted in a distributed server environment. I'd like to discuss implications and a few specifics (such as having exponentially-backed-off retries, request batch load-balancing and out-of-order batch replies on the client).
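As a rough illustration of the exponentially-backed-off retries mentioned above, here is a minimal sketch. It assumes a hypothetical `request` callable that raises `IOError` on a transient failure; the function name, its parameters, and the use of random jitter are illustrative choices and are not taken from Mercurial.

```python
import random
import time


def retry_with_backoff(request, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call request() until it succeeds or max_attempts is reached.

    The upper bound of the wait doubles after each failure
    (base_delay, 2 * base_delay, 4 * base_delay, ...), capped at max_delay.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except IOError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            upper = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, upper))
```

The `sleep` parameter exists only so the waiting can be replaced in tests; a real client would leave it at the default.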

7.10. Code Review Tooling (Ryan McElroy)

A persistent issue for people wishing to provide a little bit of help to the community through code review (which might lead to a lot of help over time!) is the need for each person to develop their own tooling around an email-based workflow. What tools can we bring to bear to make "wading in" to the community more accessible (rather than being doused with a firehose)?

Jun has a fork of the sup email client which has some nice features like Patchwork integration.

7.12. Bitmap Index for obsolete, phase, hidden changesets (Durham, Jun, Pierre-Yves)

Mainly to address the start-up time spent reading the obsstore and calculating various sets of obsoleted commits. It aims for O(1) testing of whether a rev is hidden, with O(1) start-up time. Ideally the bitmap index is the source of truth, so that other code paths only need to apply incremental updates to it, which is better for performance.
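To make the idea concrete, here is a minimal in-memory sketch of a bitmap keyed by revision number, with O(1) membership tests and incremental updates. The class name and layout are hypothetical and are not the proposed on-disk format; the O(1) start-up part would presumably come from mapping such a bitmap straight from disk, which is not shown here.

```python
class RevBitmap:
    """One bit per revision number; bit set means 'rev is in the set'."""

    def __init__(self, nrevs):
        # Round up to a whole number of bytes.
        self._bits = bytearray((nrevs + 7) // 8)

    def add(self, rev):
        # Incremental update: flip a single bit, no recomputation.
        self._bits[rev >> 3] |= 1 << (rev & 7)

    def discard(self, rev):
        self._bits[rev >> 3] &= ~(1 << (rev & 7)) & 0xFF

    def __contains__(self, rev):
        # O(1): one byte lookup and one mask.
        return bool(self._bits[rev >> 3] & (1 << (rev & 7)))


hidden = RevBitmap(1000)
hidden.add(42)
print(42 in hidden, 43 in hidden)
```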

7.13. Mercurial book progress (Mathias)

Anyone interested in the book is welcome to discuss progress and future plans

7.14. Revisiting Scale (Jeremy F)

Quick sketch of some server-side work we're planning to cope with Facebook scale.

7.15. Making clone+checkout as fast as possible (indygreg)

Things we need to do to go from fresh machine to having a working copy as fast as possible.

7.16. zstandard (indygreg)

I want to have a discussion about vendoring it and making it the default to replace zlib.

7.17. Fake parent / changeset splicing (indygreg)

Anyone interested in implementing FakeParentPlan? Is that proposal compatible with shallow clone and other core changes in the works?

7.18. Forward Blame / Find Next Changeset to touch/remove this line (indygreg)

Anyone want to hack on this at sprint?

7.19. Random experimental ideas

Some random ideas without POCs.

7.20. Mercurial on Pypy (Maciej Fijalkowski, Pierre-Yves David)

A look at what has been achieved with PyPy so far and a discussion about what we could do next.

Requires a video call with Maciej Fijalkowski; according to him, we could do that on Sunday.

7.21. Mercurial hosting (Sean, Erik, Mads, Mathias)

Let's chat about the state of Kallithea, Bitbucket etc.

8. Sprint Notes

Notes taken on an etherpad: (original https://public.etherpad-mozilla.org/p/sprint-hg4.0-NOSPAMREMOVETHATLASTPART (drop the anti spam part))

FRIDAY: 2016-10-07
Updates/quick questions:
All flags are now negatable
  ([default] update=update --check, then use `hg update --no-check` to turn off the default)
No current plans to make it possible to identify whether the user typed `hg revert -a` vs. `hg revert --all`, so that one could be disabled but not the other.
More commands will be templatizable / accept -T (including output from rebase, etc.)
  https://www.mercurial-scm.org/wiki/GenericTemplatingPlan
Possibility: relative paths as a template component (also full absolute paths?)
  Yes https://bz.mercurial-scm.org/show_bug.cgi?id=5394
Terminal-sized output in templating system? Also requires more state into the templater, but possible
  yes https://bz.mercurial-scm.org/show_bug.cgi?id=5395
Most scaffolding for fixing record to not rewrite the working dir should be done (https://bz.mercurial-scm.org/show_bug.cgi?id=3591)
Discussion about documenting config options. Should extension config options be documented in `hg help config`? If so, man hgrc would differ. `hg help -e <ext>` is not an obvious place to look for extension config options.
Likewise, discussions about documentation of keys and labels for the output.
  For labels, --color=debug is useful.

Session 1A: Shelve

Shelve discussion results:
- do not remove current shelve implementation
- add a new implementation for repos with obsolescence support, hidden behind the experimental config flag
- new shelve should work like this:
    - record which files are untracked/missing
    - addremove those files
    - run commit to preserve these changes
    - obsolete this commit (possibly indicate in obsmarker metadata that this is a shelve-obsoletion)
    - update to commit's parent
- new unshelve should work like this:
    - if there are untracked changes in working copy:
        - record untracked/missing files
        - run addremove
        - commit
    - rebase (or just merge) original shelved commit on top of current working copy parent
    - if working copy changes were there, reset 2 commits, otherwise reset 1 commit
    
    Feature ideas:
        --missing (or missing by default like git stash?)
        --all (including ignored files)
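The shelve and unshelve flows above can be written down as two small functions that return the ordered steps. This is only a sketch of the control flow from the discussion: the function names and the step strings are hypothetical, and nothing here calls into Mercurial.

```python
def shelve_steps():
    """Ordered steps for the proposed obsolescence-based shelve."""
    return [
        "record which files are untracked/missing",
        "addremove those files",
        "commit to preserve the changes",
        "obsolete the commit (optionally marking it as a shelve in the marker metadata)",
        "update to the commit's parent",
    ]


def unshelve_steps(working_copy_has_changes):
    """Ordered steps for the proposed unshelve.

    If the working copy has changes of its own, they are committed first,
    so two commits have to be reset at the end instead of one.
    """
    steps = []
    if working_copy_has_changes:
        steps += [
            "record untracked/missing files",
            "run addremove",
            "commit",
        ]
    steps.append("rebase (or merge) the shelved commit onto the working copy parent")
    resets = 2 if working_copy_has_changes else 1
    steps.append("reset %d commit(s)" % resets)
    return steps


for step in unshelve_steps(working_copy_has_changes=True):
    print(step)
```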
        
        

Session 1B: FUSE
Overview from FB: Up in the air about whether to expose virtual .hg or not, mutations in overlay, 
Overview from Google: CitC/SrcFS/hg: 
Python fuse filesystem proof-of-concept from Mads: everything provided on demand, no performance numbers yet - available on bitbucket: https://bitbucket.org/kiilerix/hgfs
Discussion around "special" clients like Eclipse which will try to index the entire repo: fake responses depending on the process requesting the files?
(FB currently has GVFS: Git virtual filesystem, read-only, used for third-party dependencies)
FB is interested in hgfs if it performs well
Changes in core hg or important extensions that would help:
    - Google:
        - split out change logs and manifest revlogs (tree manifest)
        - dirstate is concerning because of size and frequency of changes (can it be just the edited files?)
        - narrowhg: narrowspec management
        - mercurial talks through a layer which abstracts access to the revlog on disk (revlog.py)
    - FB: separate serialization from user storage
    - Google vs FB manifest implementations are separate, but the format is identical
    - Google: hg update won't rewrite the unchanged files in the working dir (underlying FS handles it)

   
   We should merge these 2? Yup. Yours look more complete, so I'll move bits of mine in.
Ok

FUSE
----
- Facebook has 'Eden', a FUSE-based system (https://github.com/facebookexperimental/eden)
        - available as open source, but doesn't currently build
        - virtualize user checkouts (working copy)
                - only need to access/update the files you're working on (faster for update, ...)
                - variation: copy your workspace to a remote location automatically (magic :-))
                - currently under development
                - communication: Thrift interface to the server/direct communication to local Mercurial (depending on overhead)
        - other available system: GVFS (Git Virtual FileSystem) to access all of the revisions (loaded through NFS)
                - Would be interesting to replace this by a Mercurial FUSE filesystem
- Google
        - Originally 'sourcefs': readonly view of any point in time
        - Writable layer on top of sourcefs and piper: CitC (Coding in the Cloud?)
                - Available on any machine
                - Integrated with other systems like code review
        - Some users use a git interface, but limited in functionality
        - Mercurial
                - Currently used by dozens of users
                - Investigating use in CitC
                        - Some items are very expensive (for example revlog) due to appending
                        - Currently quite agnostic about Mercurial (Git support is also available, to some degree)
                        - Very early days
                        - Moving somewhat from narrowhg to 'the entire repo is visible and downloaded on demand'
        - Third party code: managed with 1 branch for thirdparty version, 1 branch for own customizations
- Unity
        - Proof of concept: Python FUSE filesystem for Mercurial
        - On-demand requests:
                - can be quite fast for some operations
                - much faster to switch between branches
        - Python is slow
        - Open source: on Bitbucket
- Do we need things in core Mercurial to support virtualized filesystems?
        - CitC:
                - change remotefilelog
                - use treemanifest
                - dirstate: if kept down to only the edited files, this should be doable
                - most concerning: how do you manage the narrow spec (with narrowhg)?
                        - what if you jump to different points in history?
                - changes in Mercurial core are not strictly necessary (extension points are doable)
                        - Mercurial could talk to a 'generic storage'
                        - replace .d by a directory with a file per revision
                - Better abstract API needed on revlog
                        - Facebook uses a blob store, would be interesting to use that API
        - Facebook
                - Wireproto changes could be shared
                - Mostly working now
                - Difficult to do operations in batch in core Mercurial (quite a few changes needed to go from sequential to batch operations)
                - Caching layer:
                        - Send requests both to local client and speculative requests to a server

- (Didn't see this in the two sections above: would be nice to split up any for loop that identifies work to do + does it into two for loops: (1) identify work to do, (2) do that batch of work.  This makes it much nicer to handle things that want to, say, prefetch, for filesystems that have high latency for some operations).  Oh, it's three lines above this one, someone should merge that in.  I'll try to remember to do that when the topics topic isn't topicking.  --spectral
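The loop-splitting suggestion in the note above can be sketched as follows. All names (`needs_update`, `fetch`, `fetch_many`, `write`) are hypothetical callables supplied by the caller, not Mercurial APIs; the point is only the shape of the two versions.

```python
def update_files_interleaved(paths, needs_update, fetch, write):
    """One loop that both identifies the work and does it.

    Every fetch is a separate round trip, which hurts on a
    high-latency filesystem or network store.
    """
    for path in paths:
        if needs_update(path):
            write(path, fetch(path))


def update_files_batched(paths, needs_update, fetch_many, write):
    """Two loops: (1) identify the work, (2) do that batch of work."""
    # (1) Identify work to do, without touching the slow store.
    todo = [path for path in paths if needs_update(path)]
    # The whole batch is known up front, so it can be prefetched at once.
    contents = fetch_many(todo)
    # (2) Do the work.
    for path in todo:
        write(path, contents[path])
    return todo
```

With the batched form, a store that charges a fixed latency per request pays it once per batch rather than once per file.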

Session 2: Topics
--------------------
Presentation:
There is a tutorial in the topics repo similar to the demo that pierre-yves presented
"hg stack" shows changes in a topic, `hg topics --verbose` will show information about all of the topics, including the number of changesets, how far behind they are, etc.
hg update can use a topic name
hg rebase will use the topic's "tracking" branch as the default destination
when a change is pushed to a publishing repo (i.e. the changeset gets marked public) the topic is not shown in log.
topics appear to activate/deactivate similarly to bookmarks
topics are "light" branches - `hg update default` will go to the tip-most change on 'default' that is *not* on a topic
topics will push just fine to non-publishing server, even if they would create a "new head".  For publishing servers, it will still warn about creating multiple heads.
`hg stack` and `hg topics --verbose` will provide some warnings if you end up with a topic that has multiple heads.
global namespace for topic names
concern about topics / anonymous heads / named branches / bookmarks - we had three, now we have four concepts.  Probably will work to make topics a good replacement for bookmarks and then look at deprecating/replacing bookmarks, and probably warning on anonymous heads?
from Erik @ bitbucket: anonymous heads are confusing to a lot of their users
discussion about when to publish vs not publish -- settled on semi-publishing repos as a "good idea": they publish named branches but still allow topic pushes without publishing (need to discuss more about how to build this and if it can become a default)
discussion about local-branch concept -- can there be a way to name sets of commits without exchanging that information?
^^^ This is what I had originally thought topics *was* --spectral ; I'm not sure that introducing a "light-weight named branch" concept for multiple users to collaborate on makes sense - just use named branches? :/ <-- the idea that Jun has is to allow people to name things locally -- maybe we can play with this in an extension before discussing more broadly  My understanding was that Jun was proposing topics-without-exchange, *not* mapping remote and local names for the same topic.


Session 3A: Network Protocol Enhancements
Desired changes:
    - large batches should be splittable / in parallel / out of order
    - in-stream errors
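A minimal sketch of what these two properties could look like on the client side: a large batch is split into chunks, the chunks run in parallel and may finish in any order, and a failing request becomes an error entry in the result instead of aborting the whole batch. The function names and the `("ok", value)` / `("error", message)` result shape are assumptions made for illustration; this is not the batch wire protocol.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def split(items, size):
    """Cut a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_batch(requests, handler, chunk_size=2, workers=4):
    """Run requests in parallel chunks and return results in request order."""
    results = [None] * len(requests)

    def run_chunk(offset, chunk):
        out = []
        for i, req in enumerate(chunk):
            try:
                out.append((offset + i, ("ok", handler(req))))
            except Exception as exc:
                # In-stream error: reported for this request only.
                out.append((offset + i, ("error", str(exc))))
        return out

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        offset = 0
        for chunk in split(requests, chunk_size):
            futures.append(pool.submit(run_chunk, offset, chunk))
            offset += len(chunk)
        # Chunks complete out of order; the index puts each reply back in place.
        for future in as_completed(futures):
            for index, result in future.result():
                results[index] = result
    return results
```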
Transport needs to support streaming:
    - current SSH implementation doesn't support it
    - http effectively doesn't either, considering the minimum sizes used on the buffers:
        - 8KiB in Python read() method - use hg11 for "no-I/O"?
        - 64KB in chunkbuffer, would need to be flushed
        - 256KB buffer somewhere in Python (I think, I can't remember where it was though)
Motivation: large repos with distributed servers could parallelize requests to different servers
Customize just remotefilelog vs all batch requests?
    - Mozilla would benefit from having it for all batch requests
Would also support resuming after an error (retries)
For in-stream errors, bundle2 allows size==-1 to send something different mid-stream
Possibly create a batch2 command that would support these features?
Compression:
    - currently forced/implicit at transport level
    - works differently in ssh vs http
    - client should advertise methods it supports, server should reply with the best one
    - Standard vs custom HTTP headers to advertise compression:
         - Can standard ones handle weird compression formats like lz4?
    - should server advertise compressions as capabilities, or should client advertise?
    - bundle3 may support per-chunk compression method
    - HTTP proxy caching of responses: we could send cache control headers
       - some args make our responses not very cacheable (e.g. "common" in getbundle)
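The "client advertises, server picks the best one" idea from the compression notes can be sketched in a few lines. The function name, the engine names, and the `"none"` fallback are assumptions for illustration only; the notes leave open how the advertisement is actually carried (standard vs custom headers, capabilities).

```python
def choose_compression(client_supported, server_preference):
    """Return the server's most preferred engine that the client also supports.

    client_supported: engine names advertised by the client.
    server_preference: engine names the server can produce, best first.
    Falls back to "none" (uncompressed) when there is no overlap.
    """
    supported = set(client_supported)
    for engine in server_preference:
        if engine in supported:
            return engine
    return "none"


# A client without zstd support talking to a server that prefers zstd.
print(choose_compression(["zlib", "lz4"], ["zstd", "lz4", "zlib"]))
```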
Continuing/retries:
    - currently, we abort the entire batch if one of the requests fails
    - our http client doesn't do pipelining (but does support keepalive)
    - if server sends bundles of 100 changes, we can retry individual getbundle requests

        but that might not really be necessary... if we can re-do discovery, that's fine.  It's 200k getfile batches that are concerning, and maybe not a big issue since if we get 150k of them, they end up in remotefilelog's cache.

    - multiple discovery steps? can be one discovery step that gives you the chunks
Change calculation of file nodeid to not use file content  but instead file content hash?
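For context, a revlog nodeid is the SHA-1 of the two parent nodeids (sorted) followed by the revision text. The sketch below shows that calculation next to one possible reading of the question above, where the text is replaced by a hash of the content. The second function is purely hypothetical: its name and the choice of SHA-1 for the content hash are assumptions, and filelog details such as copy metadata are not modeled.

```python
import hashlib

NULLID = b"\0" * 20


def nodeid(text, p1=NULLID, p2=NULLID):
    """Nodeid as sha1(min(p1, p2) + max(p1, p2) + text)."""
    a, b = sorted([p1, p2])
    s = hashlib.sha1(a)
    s.update(b)
    s.update(text)
    return s.digest()


def nodeid_from_content_hash(text, p1=NULLID, p2=NULLID):
    """Hypothetical variant: hash the parents plus a hash of the content.

    The nodeid could then be computed or checked by anything that knows
    the content hash, without having the full content at hand.
    """
    content_hash = hashlib.sha1(text).digest()
    return nodeid(content_hash, p1, p2)
```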


To be done:
    - augie will implement batch2
    - compression: check why http doesn't support discovery





Session 3B: Hosting & Reviews for Mercurial Users

- How to have hosted Mercurial environment with all the latest features?

    - Multiple options? Only a single option (Bitbucket) does not seem healthy.

    - Problematic: many tools have better/only Git (or Github) integration

    - Available hosting solutions: Bitbucket, Kallithea (self-hosted), Kiln (still exists?)

    - Bitbucket also has a CI solution integrated (pipelines)

    - Difficulty: hosting providers with latest Mercurial features

    - Not easy for Bitbucket: some of these features are not stable

    - It would be very useful to have features like evolve built-in and enabled by default -- otherwise, providers like Atlassian will not build on these features

    - More possible with Kallithea (but self-hosting)

- Evolve: when can it be integrated in core?

    - Discovery is not yet in place

    - Scalability is unclear right now (not enough data)

    - Some issues exist (related to locking)




Session 4: Large Files

* Demo: hg-lfs - a large file concept modeled after git-lfs
* Code: refactor of checkhash method (and _checkhash) in core hg
* Same method could be used for censor, but people want "sensible" message in core to make people aware of censored nodes instead of seeing generic integrity check errors
* hg-lfs uses a local cache for each client that is populated on-demand
* client and server both need to be able to push and pull from the same blob store
* indygreg suggested a method to be able to have server be the only thing capable of writing to blob store, but having client be able to read from blob store advertised by the server
* discussion about adherence to git-lfs protocol (sticking to it will make it easier for, eg,  bitbucket to use it since they just implemented git-lfs support)
* mads suggested making largefiles extension more git-lfs compatible (sean worried about two extensions that try to do the same thing)
* discussion about streaming: largefiles currently does a lot of work to be able to stream large files directly into the working copy; the current hg-lfs implementation will have full text of files in memory at various times, turning this into streaming is work that still needs to be done
* current largefiles extension is a hack, but most of the bugs have been found by now, so what problem is this trying to solve?
** largefiles changes hashes, so once you opt-in, you can't back out
** this new format doesn't change the hash, so it doesn't lock us in -- it lets us change implementation, back out of bad decisions, etc
* mads: hg-lfs feels too low-level; largefiles feels too high-level; perhaps the vfs level is the right place to implement this (when a large file is found, return a reference to it on vfs.read; when a large file reference is found during a write, replace those contents with the actual file contents)
* GSOC project on largefiles was to implement a redirection feature -- basic functionality is written, supports http protocol, needs more testing and then to be merged
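A minimal sketch of the reference-substitution idea discussed above: large contents are swapped for a small pointer in the git-lfs pointer format, and pointers are swapped back for contents on the way out, with a per-client cache that is filled on demand. The class and method names are hypothetical, plain dicts stand in for the shared blob store and the local cache, and this is not the hg-lfs implementation.

```python
import hashlib

POINTER_HEADER = b"version https://git-lfs.github.com/spec/v1\n"


def make_pointer(data):
    """Build a git-lfs style pointer (version, oid, size) for the bytes."""
    oid = hashlib.sha256(data).hexdigest()
    body = "oid sha256:%s\nsize %d\n" % (oid, len(data))
    return POINTER_HEADER + body.encode("ascii")


def parse_pointer(blob):
    """Return the sha256 hex digest if blob is a pointer, else None."""
    if not blob.startswith(POINTER_HEADER):
        return None
    fields = dict(line.split(b" ", 1) for line in blob.splitlines())
    return fields[b"oid"].decode("ascii").split(":", 1)[1]


class LargeFileFilter:
    def __init__(self, remote_store, threshold):
        self.remote = remote_store  # shared blob store (dict stand-in)
        self.cache = {}             # per-client cache, populated on demand
        self.threshold = threshold  # size in bytes from which a file is "large"

    def to_reference(self, data):
        """Large contents become a pointer; small contents pass through."""
        if len(data) < self.threshold:
            return data
        pointer = make_pointer(data)
        oid = parse_pointer(pointer)
        self.cache[oid] = data
        self.remote[oid] = data
        return pointer

    def to_contents(self, stored):
        """Pointers become the real contents; anything else passes through."""
        oid = parse_pointer(stored)
        if oid is None:
            return stored
        if oid not in self.cache:
            # On-demand population of the local cache from the blob store.
            self.cache[oid] = self.remote[oid]
        return self.cache[oid]
```

Because the pointer is derived only from the content, two clients sharing a blob store agree on the reference without coordinating.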




----Saturday----

Lightning talk: version detection
Client reports its version on push, refuses incompatible versions
Suggestions from others: could also be done for other operations

Lightning talk: performance tracking with asv (Philippe Pepiot)
Uses perf extension to generate numbers to feed to asv which shows us pretty web graphs
Atom feed of performance regressions
Eventually plan to include this in Buildbot

Lightning talk: diffs tracking line ranges (Denis Laxalde, Logilab)

Lightning talk: fastannotate and absorb (junw)
Interleaved deltas exist to make annotate faster
Normally for annotate we have to check line-by-line thru history for what revision introduced a line, examining every revision; we cannot skip any single revision
Nested deltas would let us see what revision introduced an entire block. since they are nested, we can skip them efficiently
Interleaved deltas add _deletion_ information for entire blocks—but for that we have to have linear history
fastannotate does this, with special handling for merge commits to use normal annotate
Demoed `hg annotate mercurial/commands.py` vs `hg fa mercurial/commands.py` — very fast (applause)
Can also show what revision deleted a line with --deleted
hg absorb: use annotate information to make revising a patch series easier
Make changes in response to review feedback (e.g. fix typos), `hg absorb` lands those changes in the right patch in the series
available at: https://bitbucket.org/facebook/hg-experimental
Marp source: https://bpaste.net/show/f749eb674f09
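To illustrate the baseline that fastannotate improves on, here is a minimal sketch of annotate over a linear history that examines every revision in turn. It uses Python's `difflib` as a stand-in for the real diff algorithm and takes each revision as a plain list of lines; the function name and data layout are assumptions, and neither interleaved deltas nor merges are modeled.

```python
import difflib


def annotate_linear(revisions):
    """Attribute each line of the last revision to the revision that introduced it.

    revisions: list of revisions, oldest first; each is a list of lines.
    Returns a list of (revision_index, line) pairs for the last revision.
    """
    annotated = [(0, line) for line in revisions[0]]
    # Every revision has to be visited: none can be skipped.
    for rev, new in enumerate(revisions[1:], start=1):
        old = [line for _, line in annotated]
        matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
        result = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep their earlier attribution.
                result.extend(annotated[i1:i2])
            elif tag in ("replace", "insert"):
                # New or rewritten lines belong to this revision.
                result.extend((rev, line) for line in new[j1:j2])
            # "delete": the lines are gone, nothing to carry over.
        annotated = result
    return annotated


history = [
    ["a", "b"],
    ["a", "x", "b"],
    ["a", "x", "c"],
]
for rev, line in annotate_linear(history):
    print(rev, line)
```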

Lightning talk: narrowhg (durin42)
Narrow clones include only certain directories. Clone and pull both take an --include flag to specify what to pull.
Uses ellipsis nodes from censor infrastructure to rewrite parent revs of commits whose actual parents don't touch included files.
Heavy computation on the server side when widening your local copy.
Keeps full manifests revisions even for ellipsis nodes so that you can still see what files an ellipsis revision changed.
To keep revision numbers in topo order, inserting a node in the past requires a strip and re-add of nodes

Lightning talk: infinitepush (Stanislau Hlebik)
Enables pushing scratch commits to a server; others won't get them unless they explicitly ask for them
Enables things like: backups, reviewing a team's work in progress, interaction w/tools like Phabricator
Doesn't show up in log on server—stored separately from actual repo
Client sends separate bundle2 changegroup, which server routes to external storage
Requires both client-side and server-side extension

Discussion: Why is clone slow? (indygreg)
Depends on the platform.
Mozilla applies a bunch of optimization layers to make it faster incl. streambundles in S3
Condensing number of files in store would help because creating files is a bottleneck (at least on Windows)

Discussion: bitmap index
need the bitmap pattern for various places: hidden, etc.
first step: fully rebuild, next step: do incremental changes


Hacking Session: Making Mercurial compatible with Python 3
Martijn, Greg, Augie, Yuya, Mateusz and Pulkit hacked on and discussed related issues.
The porting effort has now progressed to actually running hg and fixing issues there, rather than just making importing work.

-----Sunday-----

Mercurial book
- some contents generated from .t unit tests
- building possible through docker now: 'make docker-html'
- repo: https://bitbucket.org/hgbook/hgbook
- website: hgbook.org or https://book.mercurial-scm.org/

State of hg at facebook:
    - pushvars
    - motivation: merging repositories together (upcoming blog posts about it)
    - dirsync: make changes to multiple directory structures
    - users learned hg and forgot git
    - users like anonymous heads with smartlog

State of hg at Bitbucket:
    - running 3.6
    - topics experiments
    - clone bundles work in progress

PyPy benchmarks:
    - very high variation in bundle performance
    - hg up is not CPU bound
    - hg diff/status is bound to sys time on listing directories, at least on macbook SSD
    - hg serve is 51 req/s on CPython vs 121 req/s on pypy for listing /
    - hg serve with /graph/tip?revcount=5000 payload is 330ms on pypy vs 2300ms on cpython
    https://paste.pound-python.org/show/2v6YWfroIUiZ6bH7D4cx/

State of hg at Google:
    - 2-digit number of users
    - currently all users are using local disk clients (not yet VFS)
    - most users coming from Piper workflows, most on Linux (some on Mac)
    - Mac support not there yet, coming soon
    - some IDE integration going on

        - IntelliJ plugin assumes .hg structure - currently works, won't on VFS

        - SourceTree bundles its own hg, ignores Google config

    - Expect patches for configuration management
    - Google is the main user of narrowhg
    - using tree manifests
    - weirdness in the combination of treemanifest, narrowhg and remotefilelog: amend can take over 5 min
    - All clones are "depth 0" (only an ellipsis for head) - upstream narrowhg only supports depth >=1

State of hg at Mozilla:
    - hg is now the VCS of choice for Firefox, other projects still use git
    - about half of employees choose hg (instead of git with plugin to speak hg wire protocol)
    - in the process of transitioning Firefox automation
    - custom replication system built with Kafka - mercurial extension pushes events to Kafka
    - many mq users
    - default vanilla hg config doesn't give a good experience, they have a wizard to set better options
        - with the default config, history rewriting uses strip, which doesn't scale well -> using evolve

    - bundles written to a CDN for distribution
    - client-side extension to push for code review, auto-discovers repos

State of hg at Nokia:
    - used in fixed networks division - ~500 developers
    - repository size comparable to Mozilla
    - modified blackbox extension to upload data to servers using UDP
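As a rough picture of the "upload data using UDP" idea, the sketch below sends one event as a single UDP datagram. The function name, the JSON payload, and the field names are assumptions for illustration; the notes do not describe the actual format used, and this is not the blackbox extension's code.

```python
import json
import socket


def send_event(event, host, port):
    """Serialize an event dict as JSON and send it as one UDP datagram.

    UDP is fire-and-forget: there is no connection setup and no
    acknowledgement, so a slow or missing collector cannot stall the client.
    """
    payload = json.dumps(event).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return len(payload)
```

The trade-off is that datagrams can be lost silently, which is usually acceptable for usage statistics but not for anything that must be recorded.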

New HG server at Facebook
 - Can't do some things we want to do with scm due to scaling
 - stateless edge servers cache data in useful form
 - blob stores to contain each revision of each useful bit of information (packfile style)
 - mutable state in Zookeeper (heads etc) - conflicts handled in Zookeeper (who wins - pushrebase etc to mitigate failures)
   - Failure mitigation not much changed from today.
 - Fold in all the stuff we're doing as server-side extensions as part of normal operation for the server (multi-gigabyte blobs, frequent WIP pushes from clients etc)
 - Goal is to open source the server, once it's more than just slideware.
 - Implementation currently in Rust.
 - Plans to support narrowhg? Not yet, but plan is to support anything useful; primary goal is to support Facebook, but will accept changes to make it more globally useful.
     - Hence Apache Zookeeper. Interfaces to blob store straightforward so that can be replaced trivially (e.g. Amazon S3 instead of FB blob store).
 - Why doesn't hg scale? Number of heads, push contention when large numbers of commits per second. Distribution mechanism needs a full copy of each repo on every server (limits on amount of SSD per server). Also nice to offer non-Mercurial interfaces to SCM (e.g. a service able to edit the commit graph without a working copy).
  - Maybe things like GraphQL queries of the server state?
  - Maybe ability to talk git and hg wireproto from same servers? Migration path from git for people heavily bought in.
    - Killer feature here for Mozilla would be hg monorepo with git clones of subsets that acted like independent repos.
  - Lets us build a nice FUSE client that doesn't have to deal with the bits of wireproto that aren't relevant to an FS
  
Review Process
Mostly about Augie's hgbot
- Explicit syntax for reviewers to change statuses
- "V1", "V2" status tracking already in bot
- Web dashboard

    - View previous series, and comments

    - (optional) comment, send email to list

- hgbot APIs: read-only, write APIs - use emails instead
- opensourced at https://hg.durin42.com
- date range query to support incremental sync from clients like sup/patchwork database
- recommended email / extensions for contributors to have a side-by-side view about patch states: sup (maybe?)

evolve at fb/google
- fb: no training evolve experience. rebase --evolve? next --evolve?
  evolve ~= rebase with auto -s and -d
  amend & fix - most of the cases
  "amended as NEWHASH" / "rebased as" in "hg log -G" or smartlog/wip output
     - pyd: useful metadata that may go core, concerned about obstore size
  fbamend - predates evolve, ".preamend" bookmark
  "needs rebase" vs "unstable"
  stack in core without topics concept?
  pdiff, odiff - useful 
  better ui without "evolve" name?
    next --rebase: address the amend in the middle of a stack
    rebase --fixup ~= next --rebase recursively
    unamend
  goal: have the feature without evolve concepts
- pyd: --all may disappear
  need explicit --bump flag, default is to only deal with unstable
  --list is to explain what needs to be done, and why
  no exchange: already have ui to prevent user from going to a bad state (less-known concepts)
  rebase needs more obsoleted info, like rebaseskipobsoleted
  "stack" in core with bookmarks support makes sense
  hg up "t5" does the rebase thing
  breaking evolve down into smaller commands/flags that each handle a subset of the issues evolve can handle makes sense
- google: use "evolve" all the time
  unstable are shown in "hg log -G"
  a little doc about them
  users may want to see the history of amends
- possible actionable items at upstream
  "stack" as an upstream command that deals with bookmarks and topics (also wip/smartlog discuss)
  "tX" as ref names
  default "rebase" without flags behavior ?

- enterprise release support
    announce LTS?
    backport security/serious bug fixes?
    AI(durin42,marmoute,TheMystic,yuya,martinvonz): figure out a policy for longer-supported releases
  pip windows compat? -> setuptools should be used by hg's setup.py
    we should just use setuptools, setuptools no longer is awful
    AI(anyone): send a patch to use setuptools after 4.0 is released
  recommend that users run tests on RCs
  RCs need to not become the default version on the cheeseshop when uploaded
    AI(anyone): file a bug to figure out how we need to update our tarball naming so we stop confusing the cheeseshop
  Building TortoiseHG on Windows is not robust for Gabor
    AI(Gabor): engage with TortoiseHG devs to figure out why it's not working

Non-recursive globs (Rodrigo, spectral, Durham, :
    Issue is that * is sometimes recursive
    matcher API is a mess
    Should we re-write match.py or just add fileglob?
    Suggestion: add fileglob via a new, cleaner API, then migrate others over time
    Possible FB use case: pick parts of a tree to include and exclude (would add ordering dependency instead of excludes always trumping includes?)
    matcher API should be extensible
    matcher composition: anyof, allof, negate, per-file-type, etc.
    Inconsistencies in pattern behavior between hgignore, --include/--exclude, etc.
    FB: conversion between matchers and watchman expressions
    Proposal: wiki page, first group to have a use case proposes the initial API
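
A minimal sketch of what a composable matcher API with a non-recursive glob could look like. This is not the match.py API. The names `fileglob`, `anyof`, `allof` and `negate` are borrowed from the notes above, but their signatures and the use of `**` as the explicit recursive form are assumptions made for illustration.

```python
import re


def fileglob(pattern):
    """Return a matcher in which '*' never crosses a '/' (non-recursive)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")       # explicit recursive wildcard (assumed syntax)
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")    # stays within one path component
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")     # one character, never a separator
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    regex = re.compile("".join(parts) + r"\Z")
    return lambda path: regex.match(path) is not None


def anyof(*matchers):
    """Match if at least one of the given matchers matches."""
    return lambda path: any(m(path) for m in matchers)


def allof(*matchers):
    """Match only if every given matcher matches."""
    return lambda path: all(m(path) for m in matchers)


def negate(matcher):
    """Invert a matcher."""
    return lambda path: not matcher(path)


# Python files directly under src/, excluding tests.
sources = allof(fileglob("src/*.py"), negate(fileglob("src/test_*.py")))
```

Because every matcher here is just a callable from path to bool, composition needs no shared base class. A real API would also need to expose enough structure for things like the watchman conversion mentioned above, which plain closures do not give.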

Interrupted Clone:
    Most common problem in network interruption
    Partial states with checkpoints that hg pull can recover from (conceptually: sub-transactions)
    Can we do narrow slices (maybe only if we have tree manifest)?
    This has implications of server and client file access patterns -- say, 1000 or 10k revisions at a time
    A different approach would be recoverable streaming clones - put index upfront and then client can request things it doesn't have (would need stream clone recovery)
    Today streaming clone API is just "give me a streaming clone" -- if we were to make it recoverable, we would need to expose new APIs
    Mads: if we reused deltas from revlogs during normal clones so streaming clones were not too much more efficient, we could just do normal clones and they could be fast
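
The checkpoint idea can be sketched as follows, assuming a hypothetical `fetch(start, end)` call that returns a batch of revisions or raises `ConnectionError`. Neither `pull_in_batches` nor `FlakyServer` exists in Mercurial. They only illustrate that committing each batch as its own sub-transaction lets a later pull resume from the last checkpoint instead of starting over.

```python
def pull_in_batches(fetch, store, total, batch=1000):
    """Fetch revisions [len(store), total) in batches, one checkpoint per batch.

    Only complete batches are appended, so `store` always ends on a checkpoint.
    """
    while len(store) < total:
        start = len(store)
        end = min(start + batch, total)
        revs = fetch(start, end)   # may raise; store is untouched if it does
        store.extend(revs)         # "commit" this sub-transaction
    return len(store)


class FlakyServer:
    """Stand-in for a server whose connection drops on one specific request."""

    def __init__(self, fail_on_call):
        self.calls = 0
        self.fail_on_call = fail_on_call

    def fetch(self, start, end):
        self.calls += 1
        if self.calls == self.fail_on_call:
            raise ConnectionError("network interrupted")
        return list(range(start, end))


server = FlakyServer(fail_on_call=3)
store = []
try:
    pull_in_batches(server.fetch, store, total=3500)
except ConnectionError:
    pass  # the first two batches are already safely stored
checkpoint = len(store)
pull_in_batches(server.fetch, store, total=3500)  # resumes at the checkpoint
```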


SHA-256 and tree manifests:
    Tree manifest storage format sub-optimal (too many inodes)
    Performance issues with tree manifests
    Full manifest sorts differently from tree manifests
    Planned support to switch repo from flat manifests to tree manifests
    Talk of hybrid flat and tree where root dirs are flat and N levels deep are tree
    Talk of using tree manifest but building a flat manifest incrementally and hashing to obtain "legacy" hash
    
    SHA-1 is meaningfully weakened
    Google wants more bits in hash
    Ellipsis nodes and impact on hashing/security
    Lots of places "20" is hard coded (bytes in SHA-1)
    Add support for reading old hashes in one release, later release switch to writing new hashes by default in new repos, much later warn about writing old hashes
    SHA-2 is flag day (or at least flag "changeset")
    Is multi-hashing realistic?
    Are node ids correct place to put security? Maybe we should use actual signing cryptography e.g. https://www.mercurial-scm.org/wiki/CommitSigningPlan
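
A small sketch of the hard-coded "20" problem, assuming the node id is the hash of the two sorted parent ids followed by the revision text. The helper names `nodeid` and `nullid` and the `algo` parameter are hypothetical. The point is only that the node length can be derived from the hash object instead of being written as a literal.

```python
import hashlib


def nullid(algo):
    """All-zero node id whose length follows the chosen hash, not a literal 20."""
    return b"\0" * hashlib.new(algo).digest_size


def nodeid(text, p1, p2, algo="sha1"):
    """Hash the sorted parent ids followed by the revision text."""
    first, second = sorted([p1, p2])
    h = hashlib.new(algo)
    h.update(first)
    h.update(second)
    h.update(text)
    return h.digest()


old = nodeid(b"hello\n", nullid("sha1"), nullid("sha1"), algo="sha1")
new = nodeid(b"hello\n", nullid("sha256"), nullid("sha256"), algo="sha256")
```

Here `len(old)` is 20 and `len(new)` is 32, so any code that slices or packs node ids with a fixed width of 20 would need to take the width from the repository's hash instead.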
    

Incrementally modifying dirstate
Mercurial ends up reading multimegabyte dirstates on many operations. How do we get it under control? Note that modifications are done as "read all, modify in memory, write all"
Mateusz has experimented with replacing dirstate file with sqlite database (see https://bitbucket.org/facebook/hg-experimental/src/54e21a26d78aa6f29c45df99af9904e17fe9512b/sqldirstate/?at=default).
Lots of things generated from dirstate (dirs, fold map etc). Around 4-5 dictionaries created from the dirstate lazily.
How can we avoid reading the whole dirstate on operations? Mateusz persisted all the lazily created dictionaries in sqlite.
sqldirstate is larger than dirstate, but operations like hg status are fast (turns expensive operations into SELECT statements). Slower for large updates (thousands of files changing in the dirstate) due to DB index, plus lots of small INSERT/UPDATE operations.
Transactions require a full copy of dirstate for rollback - sqldirstate converts to a sqlite transaction for faster operations.
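
As an illustration only, the two points above (status-like lookups become SELECT statements, and rollback becomes a database rollback rather than a full file copy) can be sketched with Python's sqlite3 module. The table layout and the helper names are assumptions and do not reflect the actual sqldirstate schema. The single-letter states follow the dirstate convention ('n' normal, 'a' added, 'r' removed).

```python
import sqlite3


def open_dirstate(path=":memory:"):
    """Open (or create) a toy dirstate database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY, state TEXT NOT NULL,"
        " mode INTEGER, size INTEGER, mtime INTEGER)"
    )
    # The index makes per-state lookups cheap but adds cost to bulk updates.
    db.execute("CREATE INDEX IF NOT EXISTS files_state ON files (state)")
    return db


def set_entry(db, path, state, mode=0o644, size=0, mtime=0):
    """Insert or overwrite one entry; sqlite3 opens a transaction implicitly."""
    db.execute(
        "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?)",
        (path, state, mode, size, mtime),
    )


def paths_in_state(db, state):
    """Return the paths in a given state without loading every entry."""
    rows = db.execute(
        "SELECT path FROM files WHERE state = ? ORDER BY path", (state,)
    )
    return [row[0] for row in rows]


db = open_dirstate()
set_entry(db, "a.txt", "n")
set_entry(db, "b.txt", "a")
db.commit()                      # end of one Mercurial-level transaction

set_entry(db, "c.txt", "a")      # start of another one
during = paths_in_state(db, "a")
db.rollback()                    # abort: no backup copy of the dirstate needed
after = paths_in_state(db, "a")
```

The many small INSERT OR REPLACE calls in this sketch are the same pattern that the notes identify as slow for updates touching thousands of files.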
Real-world test at Facebook shows that hg update is slower, other operations that use dirstate are faster. Not rolling out at FB until hg update is at least as fast as "traditional" dirstate.
How do we modify dirstate incrementally? Better datastructures welcomed.
Persisting datastructures helped with finding bugs - converting hidden bugs into visible bugs.
Perhaps replace the current dirstate with revlog-style delta increments, but when we flush a block, replace the dirstate file instead of checkpointing to a new block?
sqldirstate has read-time advantages - need an answer to both fast reads and fast writes. Most information in dirstate is ignored during operations - can we speed up by checking to see if manifest changes etc prove that we've not yet changed? However, dirstate is used to avoid checking manifest. Would force us to keep a full manifest around in some form - maybe as a tree to confirm files unchanged since a timestamp?
sqldirstate breaks hooks that access dirstate - the transaction in sqlite is not visible to hooks. Could we make the pending data visible to hooks somehow?
Incremental dirstate would fix today's scaling problem where all operations end up O(num of files in repo), not O(num of files touched in this operation) - e.g. a hg commit ends up checking dirstate => O(dirstate) is speed limiter.
treedirstate? Needs a good sharding algorithm to make it fast, as we don't want one inode per directory, but want directory-based locality. What if we don't store in the filesystem? Some sort of mappable tree, like a database uses internally? roots at end of file of a tree, with occasional garbage collection?
Perhaps simply allowing read uncommitted or sqlite snapshots (experimental) would get the hook issue sorted?
Note that dirstate is almost entirely computed, and the remainder is trivial to replace.
There are foldmaps (how do I get from lowercase form of filename to the "correct" filename - noting that we want to warn people if they have a case conflict on case-insensitive filesystem). There are dirs (what directories are empty?) Computing them from scratch requires walking the whole tree for in-memory dirstate, but can incrementally update mid-operation.
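
A rough sketch of keeping these derived maps up to date incrementally instead of recomputing them from scratch. The class `DerivedMaps` is hypothetical, `str.lower()` stands in for whatever case normalization the filesystem needs, and the directory map is modelled as a per-directory file count (a directory is empty once its count reaches zero).

```python
from collections import Counter


class DerivedMaps:
    """Fold map and directory counts, updated one path at a time."""

    def __init__(self):
        self.foldmap = {}        # case-folded path -> path as tracked
        self.dirs = Counter()    # directory -> number of tracked files below it

    @staticmethod
    def _parents(path):
        parts = path.split("/")[:-1]
        return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]

    def add(self, path):
        key = path.lower()
        existing = self.foldmap.get(key)
        if existing == path:
            return               # already tracked, nothing to update
        if existing is not None:
            raise ValueError("case conflict: %s vs %s" % (path, existing))
        self.foldmap[key] = path
        for d in self._parents(path):
            self.dirs[d] += 1

    def remove(self, path):
        del self.foldmap[path.lower()]
        for d in self._parents(path):
            self.dirs[d] -= 1
            if self.dirs[d] == 0:
                del self.dirs[d]  # directory no longer contains tracked files


maps = DerivedMaps()
maps.add("src/Main.py")
maps.add("src/util/io.py")
maps.remove("src/util/io.py")
```

Each update costs time proportional to the depth of the path rather than to the number of files in the repository.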
Do we want to integrate mergestate and dirstate? Maybe into one file, if doing sqldirstate - but don't want to be rewriting mergestate all the time just because of an incremental dirstate change (or vice-versa).
We tried just looking at the raw queries used by sqldirstate, and got faster operations than we did inside Mercurial - but code elsewhere in Mercurial "knows" that dirstate is in memory, so we need to cache in memory.
Does Mercurial as a whole benefit from sqldirstate? Need to solve the hooks problem first.
sqldirstate uses transactions to keep a consistent view of the dirstate - the current implementation results in dirstate changes advancing mtime (see atomictempfile for a trick to reuse). In database land, though, could just have a version field to detect changes => I noticed that the file stat ambiguity of the current sqldirstate implementation can be resolved easily by a little patch, I'll post it soon (foozy)
Note that dirstate is parsed in C.
Does big dirstate issue go away in practice with hg sparse, narrowhg, tree manifest etc?
Are there other on-disk caches like dirstate that (or, indeed should there be more on-disk caches that) would be improved by use of sqlite as the storage format?


hg smartlog/wip/underway
  It should just work out of the box
  Multiple commands for querying interesting changesets
  generic 'hg {show,view,binoculars,whatever}' that would let you inspect tags/bookmarks/whatever
  overload `hg summary`?
  add `hg summary wdir` (or something) to represent current behavior?
  `hg display` with `hg di` dispatching to `hg diff` seemed to be consensus
  New map-cmdline file to hold templates for different things being shown
  Use formatter
  
zstandard
  JUST DO IT
  Will wait until 4.1 to land experimental support
  server getbundle compression is as important as revlog for mozilla, google (much *more* important than revlog for google, imho ;))
  talk of dictionary magic


CategoryMeetings

4.0sprint (last edited 2016-10-18 13:05:29 by Pierre-YvesDavid)