hg 4.4 sprint notes

Taken from: https://public.etherpad-mozilla.org/p/sprint-hg4.4-NOSPAMREMOVETHATLASTPART (remove the anti-spam suffix from the URL)

State of the community:

    Things have not fallen apart without Matt

    Need more reviewers

    Give feedback about Phabricator

State of the tooling:

    experiment with Phabricator for a couple of months

    we have a dashboard for stacks

    buildbot now runs Python 3

    Windows builds are running as well, not green yet

    idea: letting buildbot automatically run tests on patches

State of the code: 


    >100,000 LOC Python

    ~20,000 LOC C

State of Hg at Facebook: 

    Emphasis: Perf

    treemanifest: 2x faster commits, 5x faster show; huge win for large repos

    doubled commit throughput by minimizing critical sections

    restack workflow fully deployed, everyone has obsmarkers on (restack, next --rebase, hide, unhide, split, fold, uncommit, unamend: all commands fully deployed)

    Mercurial for Windows rolled out to some users

    shipped the LFS extension (which does not change the hashes) in production

    already upstreamed extensions (thanks Pulkit!):

    copytrace, directaccess, uncommit, unamend, pushvars, commitextras, morestatus

    upcoming: want to upstream remotenames, pushrebase, infinitepush

    demos for later: "hg undo", "hg workspaces"

    big projects in progress:

    Mononoke, written in Rust (talk to Sid)

    Eden, a FUSE FS, beginning to integrate with Mercurial (talk to Adam)


    reaching the bottom of the CPU perf pile


    dirstate, lazy changelog

    in-memory operation

    last-mile perf (1s -> 0s)

    python is a bottleneck

    stateful chg

    integrations with Eden/Mononoke

    ease of use in native code

    need well defined APIs


    a lot of upstreamed stuff is not extensions, but an opt-in part of the core

State of Hg at Unity: 

    still use hg at Unity

    Mads does not work directly on hg

    main repo: the working copy has 6 GB of large files (largefiles extension) and 1 GB of normal files; .hg is also ~6 GB

    using generaldelta, aggressive deltas


    huge manifest, because of merge workflows

    discovery is very expensive because of that (>4K named branches)

    push toward using Git; a lot of things are moving into smaller repos

    evolve has been tried, but is not yet deployed

State of hg at Mozilla: 

    the repo has grown: 400,000 files


    scaling a monorepo

    number of files in a checkout

    clone size

    making monorepo feel small

    hg configs

    Mercurial default configs are not suitable for our scale

    difficult to tell people to install different extensions (fsmonitor)

    Windows performance

    recent developments

    sparse checkouts in CI for ~2 months (using core sparse)

    devs don't use sparse because the UX is not perfect

    full clones starting to take too long

    tensions between monorepo w/ Hg and microrepos w/ GitHub. Not getting GitHub contributors is a downside of a monorepo.

State of hg at Google: 

    2.5% of commits at Google are made via Mercurial

    centralized configuration

    external extensions: narrowhg, remotefilelog, full evolve

    internal extensions: trainingwheels (smoothing piper cmd -> hg cmd), bugreport (copy problems to someone's server), fix (running tools on commits before you run them), srcfs/citc (virtual FS), codereview extension

    hg xl - similar to smartlog


    linear history

    hg push starts a review, does not land a commit

    no merges

    not encouraging bookmarks/named branches/topics

    hg is run on top of a virtual FS which knows nothing about it

    automatic update of the narrowspec

Rust session:

Run some Rust in the default codebase as a weekend objective
Port the EXE wrapper to Rust
Rust has 2 compilers on Windows (we need MSVC)

Problem with Rust and the C compiler: Rust is built with modern MSVC, while Python 2.7 is built with the ancient MSVC 2008, so they wouldn't link. The EXE wrapper is therefore not a good idea.

Ensure that Mercurial can still be packaged on Debian, without requiring a specific Rust version

Rust-Cpython uses the CPython API (project is dead)

Bridging code between CFFI and Rust (rather than CPython) would be a good start

    If we use Rust like we used C before

Keep the core of the Rust code independent from the bridge

Fuzz all the Rust code

Push a Rust extension? Or translate a C part into Rust and compare the perf?

Having Rust doesn't remove the pure-Python code.

CFFI could help the Python 3 migration

The issue with CFFI is that it slows down data passing

    Zero-copy might be possible, but not on very old Python 2.7 versions

Write our own layer between Python and Rust? (instead of using pyo3 https://github.com/PyO3/pyo3)

Obs-store parser in Rust is 100x faster than the Python C implementation (but 2-3x slower once going through rust-cpython)

Rewrite dirstate in Rust?

Buildbot builders need a modern Rust compiler.

Rust could call C code if it's decoupled from CPython code

Shipping some hooks with the distribution is a good idea!

mpatch.c should be easy to rewrite in Rust; it might be a good start

Linking libhg with other tools would require a less strong license than the GPL; would the community agree?

    - Contributor License Agreement for specific directories

Maybe rewrite the ConfigParser in Rust? (Facebook has its own ConfigParser)
Also the CLI option parser might be a good candidate

bdiff.c is also a good candidate
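To give a feel for how small mpatch is, here is a pure-Python sketch of applying one delta. It assumes the bdiff/mpatch delta layout: a sequence of hunks, each a 12-byte header of three big-endian 32-bit integers (start, end, length) followed by `length` bytes that replace `text[start:end]`, with hunks sorted and non-overlapping. The real mpatch.c also folds chains of deltas together, which this sketch does not attempt; `make_hunk` is a hypothetical test helper.

```python
import struct

# Hunk header: start, end, length as big-endian 32-bit unsigned ints.
HUNK_HEADER = struct.Struct(">III")


def apply_delta(text: bytes, delta: bytes) -> bytes:
    """Apply a single bdiff-style delta to text (no delta folding)."""
    out = []
    pos = 0   # read position in the delta
    last = 0  # end offset of the previous hunk in the original text
    while pos < len(delta):
        start, end, length = HUNK_HEADER.unpack_from(delta, pos)
        pos += HUNK_HEADER.size
        if start < last or end < start or end > len(text):
            raise ValueError("invalid or overlapping hunk")
        data = delta[pos:pos + length]
        if len(data) != length:
            raise ValueError("truncated hunk data")
        pos += length
        out.append(text[last:start])  # unchanged bytes before the hunk
        out.append(data)              # replacement bytes
        last = end
    out.append(text[last:])           # unchanged tail
    return b"".join(out)


def make_hunk(start: int, end: int, data: bytes) -> bytes:
    """Hypothetical helper: encode one hunk in the layout above."""
    return HUNK_HEADER.pack(start, end, len(data)) + data
```

For example, `apply_delta(b"hello world", make_hunk(6, 11, b"there"))` gives `b"hello there"`. A Rust port would keep the same structure, with bounds checks replacing the `ValueError`s.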

=> Send example hooks

Code Review Session:




    Phabricator enjoyed by people at large companies

    Yuya: emails from Phabricator are hard to read

    Hard to keep track of unread state on Phabricator

    No context in Yadda

    Augie: Phabricator would require too many UX modifications and it wouldn't be Phabricator anymore

    Ryan: email doesn't make it obvious what next actions are; Augie: that's on reviewer/comment

    Augie is ~20% less efficient with Phabricator

    If we had more reviewers, this becomes less of a problem in Augie's opinion.

    Kevin: it would be helpful to have a list of criteria we need in a review tool; he has a list and will turn it into a wiki page

    Augie wants threaded emails and quoted context inline; comment/context goes all the way back to the original line that was commented on

    Patchwork 2.0 has support for custom keywords on reviews!

    Feed Phabricator improvement ideas to Ryan; prioritize them

Some criteria we've discussed before:

    Side-by-side comparison vs. unified diff

    Good-quality, terse e-mails

    Publicly archived

    Read state — catch up to prior state on existing discussion

    Single source of truth — clear status feedback from reviewer

    Syntax highlighting

    Word-enhanced diffs

    Context of full series vs. context of old versions of series

Other nice things:

    Automatic linting

    Automatic test runs/CI feedback

Suggestions for Phabricator upstream:

    Remove the repository name

    Maybe remove to/cc part also from the bottom

Kevin: we should commit to maintaining the bug tracker more actively
Add donotreap tag on Bugzilla
Daily report generated at https://hgbzstats.octobus.net/, could be configured to send email (weekly?)
gregory/ryan: Proposal to use Phabricator issue tracker
https://phabricator.wikimedia.org/T259 contains script to migrate from Bugzilla to Phabricator

Redundant personal and mailing list emails are annoying

=> Activate hgbzstats email weekly

Restack session:

- evolve / restack

    - hg rebase --restack == hg evolve --all, but uses a core command

- hide commits

    - strip by default is bad

    - like prune but with unhide

    - unhide with same hash

- exchange
- uncommit / unamend / split / fold / next --rebase

- understanding obs history

=> shipping obs-markers by default
=> ship hide / unhide commands
=> ship obs cycle for unhide

Dirstate session:
Well defined API:

- Ability to experiment
- Scalability / perf

- file status
- working directory parent
- current branch

narrowspec gets updated every time a file / directory is touched, which "solves" the perf issue
hg grep to search the whole codebase

durham: tree dirstate data structure, make the dirstate more incremental
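As an illustration of why a tree shape helps, here is a minimal sketch (not Durham's actual design, which these notes do not describe): entries live in nested directory nodes keyed by path component, so a query scoped to one directory visits only that subtree instead of every tracked file. `TreeDirstate` and its methods are hypothetical.

```python
from typing import Dict, Iterator, List, Optional


class _Node:
    """One directory: child directories plus file entries."""

    def __init__(self) -> None:
        self.dirs: Dict[str, "_Node"] = {}
        self.files: Dict[str, str] = {}  # file name -> state char, e.g. 'n', 'a'


class TreeDirstate:
    """Hypothetical tree-structured dirstate keyed by path component."""

    def __init__(self) -> None:
        self.root = _Node()

    def _dir(self, parts: List[str], create: bool) -> Optional[_Node]:
        node = self.root
        for part in parts:
            child = node.dirs.get(part)
            if child is None:
                if not create:
                    return None
                child = node.dirs[part] = _Node()
            node = child
        return node

    def set(self, path: str, state: str) -> None:
        *dirs, name = path.split("/")
        self._dir(dirs, create=True).files[name] = state

    def get(self, path: str) -> Optional[str]:
        *dirs, name = path.split("/")
        node = self._dir(dirs, create=False)
        return None if node is None else node.files.get(name)

    def walk(self, prefix: str = "") -> Iterator[str]:
        """Yield tracked paths under prefix, touching only that subtree."""
        parts = [p for p in prefix.split("/") if p]
        start = self._dir(parts, create=False)
        if start is None:
            return
        stack = [(start, "/".join(parts))]
        while stack:
            node, base = stack.pop()
            for name in sorted(node.files):
                yield f"{base}/{name}" if base else name
            for name in sorted(node.dirs, reverse=True):
                stack.append((node.dirs[name], f"{base}/{name}" if base else name))
```

Updating one file only rewrites the nodes on its path, which is the kind of incrementality the note is after.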

------------------ Saturday -----------------

----- Demos ------
Interactive smartlog:
    - Nuclide plugin that lets one drag/drop nodes in the Nuclide IDE's GUI to do rebases
    - shows current status (if watchman is enabled)
    - shows 'status' kind of results, lets you create a commit from them
    - shows you the changes you're going to make, lets you confirm (instead of doing it automatically)
    - Conflict resolution - uses Nuclide's conflict resolution if automerge fails
    - Pull, etc.

    Able to bisect to find regressions, has very pretty graphs :) http://perf.octobus.net/

        Slides (in French, sorry) we gave about the tool: https://octobus.net/presentations/perf_test.html#/nos-rsultats

    Can we get it into the review tooling?
    Does it make sense to merge with the existing performance/benchmark suite?
    Existing benchmarks are available at https://buildbot.mercurial-scm.org/speed/#/
------ Sessions ------

Discussion session 1:
    Company self-review:
    It would be nice if companies that are doing lots of changes can do a first-pass review (not landing!) of code from coworkers at the same company.
    durin42: "obvious bugfix" - author and pusher can be from the same company.  New features, use judgement
    Reviews are always good, it's mostly about pushing/landing. Anything can be backed out as long as not in a release.
    There have been concerns about big companies "hijacking" the mercurial community
    Steering committee has a limit of number of members from a single company (35% max)
    => Durham: The current status quo is fine.
    What about putting Facebook extensions and Google extensions in a core subdirectory?
    Concerns about adding maintenance burden to contributors - changing something in mercurial/foo.py that breaks contrib/some_extension - who is supposed to fix that?
    Backup file madness:
        Better solution for the way that hg "backs up" files during merges
        Currently appends ".orig" to the file, which can conflict with legitimate files in the tree, can overwrite the file that's already there!
        There is an option to move these files to a different directory, can we enable it by default?
        Concerns: makes the .orig files less discoverable.  Maybe print out "backed up to <foo>"
        FB's tweakdefaults extension: switch to updatecheck=noconflict?  (or was this a request to have this in some "upstream" tweakdefaults?)
        - does it make sense to base clobbering .orig on something like this?
        Action item: should move experimental.updatecheck -> commands.updatecheck
        Action item: Change the Mercurial default from 'merge' to 'noconflict' or 'abort' (abort is maximally safe, but might be more annoying; noconflict is closest to git?)
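A small sketch of the backup-location idea discussed above. Mercurial already has a setting for a backup directory (to our knowledge `ui.origbackuppath`); the helpers below are hypothetical and assume that backups mirror the repo-relative layout under that directory, and that hg would print where the backup went to keep it discoverable.

```python
import os


def backup_path(repo_root: str, relpath: str, backup_dir: str = "") -> str:
    """Where a file clobbered during merge/update would be backed up.

    Without backup_dir: today's behaviour, append ".orig" next to the file
    (which can collide with a real file of that name). With backup_dir
    (assumed layout): mirror the repo-relative path under that directory.
    """
    if not backup_dir:
        return os.path.join(repo_root, relpath + ".orig")
    return os.path.join(repo_root, backup_dir, relpath)


def backup_message(repo_root: str, relpath: str, backup_dir: str = "") -> str:
    """Message addressing the discoverability concern."""
    dest = backup_path(repo_root, relpath, backup_dir)
    return f"backed up {relpath} to {os.path.relpath(dest, repo_root)}"
```

With a backup directory of `.hg/origbackups` (a hypothetical value), `backup_message("/repo", "a.c", ".hg/origbackups")` gives `"backed up a.c to .hg/origbackups/a.c"` on POSIX.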
    Skip blame/annotate:
        Experimental flag to "skip" certain revisions, takes a revset.
        Already in hg today
        At Mozilla, some people are really negative about code-reformatting-only commits because they make blame less useful/accurate

            They think reformatting code is hostile to code archeology

        Idea came from Chromium/git hyperblame (https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/git-hyper-blame.html)
        Perhaps should allow a file that has a list of revsets that describe revisions that should be skipped, instead of requiring it to be specified on the commandline by all users - maybe not automatic, but using some syntax to specify a file (see below)

            Possible security concerns - could put a tag in the description, and hide who actually put a security flaw in there

      Could put an indicator in the blame output that the author/node shown is not the *real* author/node

        hg annotate <path> --skip 'file(.hgskipblame)'
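A sketch of how a checked-in skip file could be turned into the single revset that `hg annotate --skip` takes. The `.hgskipblame` name comes from the note above; its format here (one revset per line, `#` comments) and the `skip_revset` helper are assumptions.

```python
from typing import Iterable


def skip_revset(lines: Iterable[str]) -> str:
    """Combine revsets listed in a hypothetical .hgskipblame file.

    Assumed format: one revset per line; blank lines and lines starting
    with '#' are ignored. Each entry is parenthesised so operators inside
    one entry cannot bind to its neighbours.
    """
    exprs = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        exprs.append(f"({line})")
    return " or ".join(exprs)


# Usage sketch:
#   with open(".hgskipblame") as f:
#       revset = skip_revset(f)
#   then run: hg annotate --skip "<revset>" <path>
```

Keeping the list in a versioned file means every user gets the same skips, but anyone who can edit it can hide a change from blame, which is the security concern raised above.
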
Mononoke presentation:

    infinitepush automatically pushes every commit


    - Local commit backups

    - API for services, give me revisions between X and Y

    - Support for Eden (Virtual file-system)

    - Geographic redundancy, automatic failover

    Mononoke is the response to these needs

    It is not:

    - A replacement for Mercurial

    - A reimplementation of Mercurial

    - A reimplementation of the Mercurial client

    It is a reimplementation of Mercurial Server


    - Immutable data => stored in a blob store

    - Append-only data (metadata) => MySQL

    - Mutable data (bookmarks) => MySQL

    Mononoke servers can pull from the central source of truth and cache a lot of things

    Mononoke has both a Mercurial front and an API frontend

    Backend is pluggable

    Mononoke source code is available on github https://github.com/facebookexperimental/mononoke

    Mercurial-bundles/src, mercurial-types, mercurial/src could be imported into core

    Tested with fuzzing (quickcheck)

    Mononoke has code to convert repo to mononoke format (blobimport.rs) which is multithreaded

    Mononoke would not allow serving revlog-based repos

    Only SSH for now; Facebook wants to provide an HTTP interface

    Initial API first; maybe a GraphQL API later

    No server-side extensions written in Python would work, but Mononoke would provide server-side hooks written in Lua

    Pub-Sub extension for Mononoke

    Abstract the server (hg serve) run in tests so Mononoke could be tested against the core test suite

In-repo config break-out session
[Greg] Problem space:

    hg out of the box is not very useful to power users

    Generally per-repo best practices you want people to follow

    Large companies can deploy configs to client machines, but doesn't work in open source. Problem at Mozilla for example.

    Ideally: define config requirements in the repo itself

No programmatic way to upgrade user's .hgrc file
Matt Mackall OK with e.g. .hg/hgrc-auto that was under control of the repo itself
There's a server config extension that does some of this; was demoed at the sprint in Paris 1-2 sprints ago
We think the extension might still be around, but not part of core : https://www.mercurial-scm.org/wiki/ConfigExpressExtension

Key deliverable of this session: way for repo to define desired/required user client config
[Sid] Trust and security issues: current stance is, mercurial itself shouldn't be able to run arbitrary code without user's permission. Maybe a bit too cautious? Hg kind-of has this permission anyway - at least, it has permission to dump files on your disk which you may then execute (e.g. ./configure) without first inspecting the files. You already trust Mercurial.
[Mark] So if pulling over a secure connection, UX could be: "this server has suggested config, would you like to apply it? [y/N]"

Revset aliases: today they can override built-ins (all, parents, etc). Something to be aware of.
    client should reject built-in revsets if provided from server/repo
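One way a client could apply that rule to `[revsetalias]` entries received from a server or repo, as a sketch. `BUILTIN_REVSETS` is a hypothetical hand-written subset; a real client would consult Mercurial's own table of registered revset predicates instead.

```python
import re
from typing import Dict, Tuple

# Hypothetical subset of built-in revset names, for illustration only.
BUILTIN_REVSETS = frozenset({
    "all", "parents", "heads", "draft", "public",
    "ancestors", "descendants", "author", "user",
})

# Alias declarations look like "name" or "name($1, $2)".
_ALIAS_NAME = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*)")


def split_repo_aliases(aliases: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split repo/server-provided aliases into (accepted, rejected).

    Any alias that would shadow a built-in name is rejected, so a repo
    cannot silently redefine e.g. all() or parents().
    """
    accepted: Dict[str, str] = {}
    rejected: Dict[str, str] = {}
    for decl, definition in aliases.items():
        m = _ALIAS_NAME.match(decl)
        name = m.group(1) if m else decl
        target = rejected if name in BUILTIN_REVSETS else accepted
        target[decl] = definition
    return accepted, rejected
```

User-level aliases in the user's own hgrc would not go through this filter; only configuration that arrives from outside would.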
[Kyle] Do you want configs to be versioned? e.g. do you want to lose aliases every time you update to an old revision? Maybe not. .hg/hgrc not versioned, so if we're talking about something like .hg/hgrc-server it would be weird/unexpected if that was versioned.

[Mark] We were talking last night about having a separate meta-repo. Doesn't have the same commit history as the working copy.
[Sid] Maybe a separate branch
[Greg] if crazytown: have it in the repo, have a sparse checkout of it in .hg (?)
[Mark] Have a special file in .hg that is essentially its own filelog, not referenced in the manifest, and maintains its own history.
[Greg] Generally lack a mechanism for associating random metadata with a repo
[Mads] Unity does this with a global config store (repo?), can nag if someone's not on a recent-enough version
[Boris] Did you look into what the ConfigExpress extension does? It already handles several of these problems
[Boris] https://www.mercurial-scm.org/wiki/ConfigExpressExtension
[Boris] Used at Nokia for 9 months, Unity is testing it.
[Sid] Smallest thing we could do: have a set of safe revset aliases shipped with the repo, composed of safe primitives (where safe is TBD)
[Sid] Shouldn't be sent only at clone time, should be sent also at pull time
Having a meta-repo that stores all this makes sense. Probably don't want it versioned as such (don't want to go back to earlier config on updating to earlier commit) - [Greg] but probably do want minimum version (prompt to update the meta-repo if current version is older than current low water mark)
Inside meta-repo we have a single head (@), gets cloned/pulled automatically whenever you interact with the canonical repo.
What's in the actual repo? i.e. in its working copy?
[Mads] A single hgrc file?
[Greg] Should probably be one config per feature.
[Mads] single file makes it easier to extend / add new features
[Sid] Should we also have a path for users to customise these configs? Should a user be able to tweak these, or should they just override in their own hgrc?
[Kyle] I imagine this as basically being implemented as a %include as the first line of .hg/hgrc, so anything later in the .hg/hgrc would override. [Sid] that's how FB does things today
[Sid] So how do users make changes?
[Greg] It's just a plain old repo, so people can just cd into it and change stuff
[Greg] What if people don't want certain bits of the metarepo config? 
[Greg] should we load up configs in the order global -> metaserver -> user -> repo?
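A sketch of the layering Greg proposes (global -> metaserver -> user -> repo), which is also what Kyle's `%include` idea amounts to: later layers override earlier ones. The layer names and the `layer_configs` helper are hypothetical; recording where each value came from is similar in spirit to what `hg config --debug` shows.

```python
from typing import Dict, List, Tuple

Config = Dict[Tuple[str, str], str]  # (section, key) -> value


def layer_configs(layers: List[Tuple[str, Config]]) -> Dict[Tuple[str, str], Tuple[str, str]]:
    """Merge config layers in order; later layers win.

    Returns (section, key) -> (value, name of the layer that set it), so a
    user can tell whether a setting came from the metaserver or their own
    hgrc.
    """
    merged: Dict[Tuple[str, str], Tuple[str, str]] = {}
    for layer_name, config in layers:
        for item, value in config.items():
            merged[item] = (value, layer_name)
    return merged
```

For example, if the metaserver layer enables `diff.git` and the user layer sets `ui.editor`, the user's value wins for `ui.editor` while `diff.git` is still attributed to the metaserver. A user who doesn't want a metaserver setting can override it in their own hgrc.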
[Mads] Should ensure we figure out the state of prior art first. 
[Sid] vim modelines have had security vulns; need to make sure we don't reproduce this problem
[Mads] Worth pointing out: all the good/most useful use cases we can think of are also the most scary ones.
[Sid] Need a security review / input from people who understand how to find security vulns
[Kyle] Could detect if proposed config varies from existing config, prompt the user to accept
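Kyle's suggestion could be as simple as diffing the proposed config against the effective one and listing the changes before asking. This is a sketch; `config_changes` is hypothetical, and it only reports additions and changes because a proposed config adds to the user's config rather than removing from it.

```python
from typing import Dict, List, Tuple

Config = Dict[Tuple[str, str], str]  # (section, key) -> value


def config_changes(current: Config, proposed: Config) -> List[str]:
    """Describe how a proposed (server/repo-supplied) config differs.

    Keys present only in `current` are not reported: the proposal is
    layered on top, so it never removes user settings.
    """
    changes = []
    for item in sorted(proposed):
        section, key = item
        new = proposed[item]
        if item not in current:
            changes.append(f"add {section}.{key} = {new}")
        elif current[item] != new:
            changes.append(f"change {section}.{key}: {current[item]} -> {new}")
    return changes
```

An empty list means nothing to ask about. Otherwise the client would print the lines and prompt with a default of "no", in the spirit of Mark's "[y/N]" suggestion. As Sid points out, this still relies on the user understanding what they are accepting.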
[Sid] People generally bad at understanding security stuff. This doesn't sound like the right defence.
[Rodrigo] Is there any kind of sandboxing that would make sense and still make this useful?
[Sid] Probably this belongs in the build tool. Maybe we can produce a standard tool that people can incorporate into their builds.
[?] How does this make it more secure? Instead of having hg incorporate the config automatically you get the user to run make and *that* updates the config. Not more secure.
[Sid] But running ./configure you fully understand that it could own your system. It makes the security risk more explicit.
[Rodrigo] Simpler suggestion: Having a way for the server to send an output part at the beginning of clone to give the user instructions/suggestions. Maybe from a .hgwelcome file?
[Greg] Will try to hack that together.

Remote name session:
    Augie: Bookmark exchange is awful: in the whole history, 50 repositories have at least one bookmark; on Bitbucket, 0.5% have at least one bookmark

        They are useful locally but painful when exchanged

Durham: remotenames is now slightly too big, it's not reasonable to get everything into the core.
Proposal for upstreaming:
    - remote bookmarks, but not branches
    - disabling traditional bookmark exchange
    - a bunch of revsets for accessing remotenames
    - remote name hoisting (type master and have it resolved to "default/master")
    - a bunch of 'hg push' improvements [controversial, Augie does not agree with push upstreaming, it has a low priority]
    -- hg push --to: only push to one remote bookmark name
    -- hg push --create, --non-forward-move, --delete - all useful instead of generic --force
    - it's not just UX, it also has perf wins, because it allows speeding up heads discovery
  High level plan:
      - clients keep track of remotenames
      - clients allow users to opt in to disabling normal bookmark syncing
      - servers allow opting out of serving local bookmarks
 Augie: we should also at some time think about tracking, but it's probably worth delaying it.
 Kevin: conceptually bookmarks should be: interesting heads from the server and we can hint server that some interesting head should be advanced. Local and remote bookmarks namespaces should be separate, so that if one wants to push a local bookmark foo to a remote bookmark foo, they do something like "hg push -r foo -B foo"
    - it is uncontroversial to upstream the storing of information, expose revsets/log templates
    - breaking the local bookmark syncing: adding a new behavior behind a flag, announce that the default is to be flipped in 1 release. Potentially print warning if bookmarks are exchanged and neither opt-in, nor opt-out config flag is set.
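A sketch of the "remote name hoisting" item from the proposal: typing `master` resolves to `default/master`. `resolve_name` is hypothetical, and giving local bookmarks precedence over hoisted remote names is an assumption made here, not something decided in the session.

```python
from typing import Dict, Optional


def resolve_name(
    name: str,
    local_bookmarks: Dict[str, str],
    remote_names: Dict[str, str],
    hoist: str = "default",
) -> Optional[str]:
    """Resolve a user-typed name to a node, with remote name hoisting.

    Order (assumed): local bookmark, then a fully qualified remote name
    such as 'default/master', then the hoisted form '<hoist>/<name>'.
    Returns None if nothing matches.
    """
    if name in local_bookmarks:
        return local_bookmarks[name]
    if name in remote_names:
        return remote_names[name]
    return remote_names.get(f"{hoist}/{name}")
```

Keeping local and remote names in separate namespaces, as Kevin suggests, is what lets hoisting work without the two colliding.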

The upstream() revset is roughly ' or '.join('::%n' % node for node in itertools.chain.from_iterable(allheads(path) for path in paths))

    hg push wip#foo
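A runnable version of that upstream() expression, as a sketch. Here a plain dict of path name to hex head nodes stands in for the `allheads(path)` helper, and nodes are formatted directly instead of through Mercurial's `%n` formatting. `none()` is returned when no remote heads are known, so the result is always a valid revset.

```python
import itertools
from typing import Dict, List


def upstream_revset(heads_by_path: Dict[str, List[str]]) -> str:
    """Revset for upstream(): ancestors of every known remote head.

    heads_by_path maps a remote path name (e.g. 'default') to the hex
    nodes of the heads last seen there.
    """
    all_heads = itertools.chain.from_iterable(
        heads_by_path[path] for path in sorted(heads_by_path)
    )
    exprs = ["::%s" % node for node in all_heads]
    return " or ".join(exprs) if exprs else "none()"
```

Since the prefix `::` binds tighter than `or`, the joined string needs no extra parentheses.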

Durham: argued for deprecating "hg push --bookmark / -B", and using --onto instead of -B.  Kevin/Augie largely okay with --onto; don't really like it but prefer it over --to
  discussion over whether -d/--destination would be better than --onto, and whether command-line flags should always be nouns rather than prepositions

Upstreaming NarrowHg, Sparse, Lazy changelog session:

Sparse: Durham: Sparse is in core; how do we rename debug_sparse?
* Greg: Dirstate needs to be sorted out before. He moved most over, monkeypatched dirstate is outstanding. May also need a new status flag for sparse files (hg purge does not work correctly), would need help from someone who knows this part of the code better
hg purge --all should remove what is not in the sparse
Durham: these files should be deleted during "sparsification"
Kyle: Mercurial would be sitting on Piper; the new state should not list millions of files

Durham: maybe new flag: materialized outside sparse
Ryan: terse may help, 

Kyle: Narrow is implemented by overriding dirstate.walk so that it does not even go in there
Adam: had cases when dirs with hundreds of thousands files were still there, but should not have been because of sparse. These were also stale because they were not updated

Augie: Tombstone in dirstate: either unknown or ignored, or do not look

Agreement: we would need to have a new status. This is not part of the dirstate, only part of the output

In the short term there are cases where we need to walk outside sparse: e.g. purge
Google restricts what can be walked
Durham: in the Eden world no need for sparse. We would disable things like hg files. IDEs may be an issue, IDEs need too much. Given the IDEs do not support mercurial much, we would need to make changes to them anyway
Kyle: Eclipse, VS Code
Back to sparse, UX: The problems: Greg: the monorepo at Mozilla wants an end-to-end workflow. Clone, indicating you are working on a subset of the repo; it would be nice if hgweb knew which sparse profiles are available, and showed sparse profiles as if they were their own individual repos. Fine to see the full history.
Augie: parts of narrow should not land yet (ellipsis holepunching, which has issues). Would like to support a sparse checkout of a full clone. Could have more filelogs than the working copy
Ryan: discoverability, we have been looking at this. A few thoughts: either a structured field or a comment (our sparse profiles support comments) containing a name or description, and sparse -i??? to find all the sparse profiles. This is a better experience than learning about the sparse profiles through word of mouth
Durham: hg clone -i could discover the profiles and show a list of them. This could address Greg's UX concern.
Greg: how would you switch between profiles
Augie: wants one way to narrow and widen the files and history
Durham: primitives: repository + working copy, for these include/exclude
A better alternative: adding adds to both; removing removes only from sparse
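A sketch of that asymmetry, modelling the two primitives Durham names (repository and working copy) as two pattern sets. The `Spec` class is hypothetical; real narrow and sparse specs use Mercurial patterns and also carry excludes.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Spec:
    """Hypothetical model: `narrow` is which history the repository stores,
    `sparse` is which files the working copy materializes."""
    narrow: Set[str] = field(default_factory=set)
    sparse: Set[str] = field(default_factory=set)

    def include(self, pattern: str) -> None:
        # Adding widens both the stored history and the checkout.
        self.narrow.add(pattern)
        self.sparse.add(pattern)

    def remove(self, pattern: str) -> None:
        # Removing only shrinks the checkout; history already fetched is
        # kept, so nothing has to be stripped from the repository.
        self.sparse.discard(pattern)
```

Because `remove` never touches `narrow`, the stored history only grows, which avoids the strips that cause trouble with share.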

Durham proposal: Hg repository as a command: with include and exclude commands 

Adam: how would this work with share (Google disabled it), at FB people use it.
At Google, share causes corruption (narrow strips)

Kyle: does not like narrow
Andras: what about hg view
We want a new command (bit of bikeshedding) name. 

As a start: we only grow

Ryan: going from 0 rules to 1 rule (or the other way) is an issue: files disappear. Proposal: once a repo is tainted with sparse, it stays that way.
Augie: an empty sparse config means no files; reset deletes the sparse config
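The 0-rule/1-rule problem goes away if "never sparse" and "sparse with an empty config" are different states, as in this sketch. `make_matcher` is hypothetical and supports only directory-prefix includes.

```python
from typing import Callable, List, Optional


def make_matcher(includes: Optional[List[str]]) -> Callable[[str], bool]:
    """Build a file matcher from a sparse config (prefix rules only).

    includes=None -> repo has never been made sparse: every file matches.
    includes=[]   -> sparse config exists but is empty: no file matches.
    otherwise     -> a file matches if it lives under one of the dirs.
    """
    if includes is None:
        return lambda path: True
    prefixes = [p.rstrip("/") + "/" for p in includes]
    return lambda path: any(path.startswith(p) for p in prefixes)
```

Once a repo is "tainted" with sparse, removing its last rule gives `[]`, so the checkout becomes empty instead of suddenly materializing every file. Only an explicit reset would go back to `None`.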

Additional problems with subincludes and hgignore

Shell Prompts Session: 
    FB has scm-prompt.sh in fb-hgext/scripts/ (demoed by Ryan)
    Commonly requested feature: status. Too slow to include though
    Possible solutions: async zsh prompts; watchman-push updated prompt simple dict file in .hg (could also be cronned or only updated when hg commands are run)
    Kevin likes the template language of "hgprompt" (author: sjl on Bitbucket). Idea: have an extension print out some shell code that could be evaled. Since it's owned by Mercurial, it can do things the "best" way given the current Mercurial release (e.g., in a future with minimal startup time it could be an hg command, but for now it could be something more similar to scm-prompt.sh)
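A sketch combining the two ideas above: read a cached status file from `.hg` (kept fresh elsewhere, e.g. by a watchman trigger or after each hg command), then print shell assignments meant to be `eval`ed by the prompt. The file name, its JSON format and the `HG_PROMPT_` variable prefix are all hypothetical.

```python
import json
import os
import re
import shlex
from typing import Dict


def read_prompt_state(repo_root: str, filename: str = "prompt-state.json") -> Dict[str, str]:
    """Read a hypothetical cached status file from .hg.

    Reading a small file avoids running a slow `hg status` on every prompt.
    """
    path = os.path.join(repo_root, ".hg", filename)
    try:
        with open(path) as f:
            data = json.load(f)
    except (OSError, ValueError):
        return {}  # no cache yet or unreadable: show a minimal prompt
    return data if isinstance(data, dict) else {}


def prompt_shell_code(state: Dict[str, str]) -> str:
    """Emit `VAR=value` lines, safely quoted, for the shell to eval."""
    lines = []
    for key in sorted(state):
        var = "HG_PROMPT_" + re.sub(r"\W", "_", key.upper())
        lines.append(f"{var}={shlex.quote(str(state[key]))}")
    return "\n".join(lines)
```

The shell side would run something like `eval "$(python prompt.py)"` (hypothetical script name) and build the prompt from `$HG_PROMPT_BOOKMARK` and similar variables. Quoting with `shlex.quote` keeps a malicious bookmark name from being executed.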

Gui for enterprise session:
    Google needs good UI for both Linux and Mac

        Needs to build it


║ SourceTree  ║ TortoiseHg ║ Nuclide ║ Need                             ║
║             ║ yes        ║ yes     ║ Cross-platform                   ║
║             ║ yes        ║ yes     ║ Graph visualization              ║
║             ║            ║ yes     ║ Amend / commits                  ║
║             ║            ║ yes     ║ Annotate                         ║
║             ║            ║         ║ obslog                           ║
║             ║            ║ yes     ║ wd state / diffs                 ║
║             ║            ║ ~       ║ interactive commit               ║
║             ║            ║         ║ custom graph revsets             ║
║             ║            ║         ║ templates                        ║
║             ║            ║         ║ custom commands                  ║
║ no          ║ yes        ║ yes     ║ custom extension / configuration ║
║             ║            ║ yes     ║ drag/drop rebase                 ║
║             ║            ║ yes     ║ disabling emails / features      ║

    A proper distribution for TortoiseHg would help

    What about curses interfaces?

    https://bitbucket.org/aleufroy/lairucrem is one example

    => Try TortoiseHg or write their own web-based interface

Workflow Demos

Suggested Local Branching workflow @ Facebook

Setup: remotenames extension, almost all enabled
Start: hg update master -B NAME (arc feature NAME)
View: hg sl (smartlog) or hg ssl (super smartlog - hits code review server for extra status etc), or Interactive Smartlog GUI
Push: 'hg push --to REMOTEBOOKMARK' (or click a button in the GUI)

Suggested Local Branching workflow @ Google

hg citc newworkspace # create a new workspace which is a virtual filesystem

Updating a file gets its directory added to the narrowspec
hg uploadchain # upload the current stack and attach code review url to each commit
tags don't get moved when evolving

    uploadchain moves the tags

Want some notes on the obs-markers
Submit is a click on the web-interface
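A sketch of the automatic narrowspec widening described in this workflow: touching a file outside the spec adds its parent directory. The helpers are hypothetical, the real Google mechanism is internal and not described in these notes, and top-level files are added individually here so the spec never widens to the whole repo.

```python
import posixpath
from typing import Set


def is_included(path: str, includes: Set[str]) -> bool:
    """True if path is, or lives under, one of the included entries."""
    return any(path == d or path.startswith(d + "/") for d in includes)


def touch(path: str, includes: Set[str]) -> bool:
    """Widen the narrowspec to cover a file the user just modified.

    Returns True if the spec changed. Adds the file's parent directory;
    for a top-level file, adds the file itself.
    """
    if is_included(path, includes):
        return False
    includes.add(posixpath.dirname(path) or path)
    return True
```

Checking whole path components (`d + "/"`) keeps `ads/serving` from also matching `ads/servingx`.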

Suggested Local Branching workflow @ Unity

Use named branches
Merge-based workflow
Branches stay open

Suggested Local Branching workflow @ Mozilla

No standard workflow
Half of developers use Git with git-cinnabar to contribute
hg show work

    Not subcommands yet

    hg swork

hg show stack

    Include the number of commits ahead and rebase command


Bookmarks are used for two things:
    - Identify individual commits
    - Context switching, have a name to come back to a specific context
Workspaces limit the scope of log and advanced log commands (hg sl, hg xl)
Associate metadata to workspaces (like open tabs)
Pending changes could be associated to a workspace
Build artifacts could also be stored in workspaces


Hoisting topic idea comes from remotenames
    * set names automatically when unspecified (something like a UUID) -- some people don't want to think about names, though they like to have things named
    * generate a random name (topic) upon creation of a new draft head
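A sketch of the automatic-name idea: use the user's name if one was given, otherwise generate a short random one from a UUID4 that does not collide with existing topics. The `topic-` prefix and the 8-character length are arbitrary choices, and `auto_topic` is hypothetical. The notes do not say how the topic extension would actually do this.

```python
import uuid
from typing import Optional, Set


def auto_topic(existing: Set[str], requested: Optional[str] = None, prefix: str = "topic-") -> str:
    """Pick a topic name for a new draft head.

    Uses the requested name if given; otherwise generates a short random
    name (8 hex chars of a UUID4) not already in `existing`.
    """
    if requested:
        return requested
    while True:
        name = prefix + uuid.uuid4().hex[:8]
        if name not in existing:
            return name
```

Users who don't want to think about names still get something stable to refer to, and can rename it later.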

=> Unifying stack definition in core in an extensible API

4.4sprint/Notes (last edited 2018-11-05 22:25:07 by GregorySzorc)