hg 4.6 sprint notes

Taken from: https://public.etherpad-mozilla.org/p/sprint-hg4.6-NOSPAMREMOVETHATLASTPART (drop the anti spam part)
Mercurial 4.6 Sprint

Topics:
    
    * Partial Clones
    * hg fix
    * summer of code
    * infinite push & workspaces
    * hiddenness
    * oxidation
    * better log aliases / templates
    * templating all the things
    * perf tracking
    * stacks
    * funding
    * show in core
    * rebase in core
    * obsmarker exchange
    * kill import transformer
    * project infrastructure
    * py3
    * stock hooks
    * updates from large-scale installs
    * multi threading
    * bundle 2 spec
    * "review graph"
    * remotebookmark push
    * configs in repo/subdirs (hgignore, hgrc, etc)
    
    
Partial clones:
    * narrow/remotefilelog good prior art, needs store refactoring to better handle it
        * revlog (append only) not really great
        * remotefilelog has packs, gc, etc.; it's more robust, but probably needs work
    * one revlog per file doesn't really work on many filesystems
        * need something like pack files
        * new store (storev2)
    * indygreg working on wireproto changes
    * cloning the entire changelog and narrowlog currently OK for Mozilla
        * not going to work for Google though (for the changelogs, at least)
        * Mozilla OK with:
            * Standalone/full changelog
            * Standalone/full manifest
                * Not ideal, but tolerable for most
            * One pack file for all file contents
    * Possible store fixes that aren't partial clone:
        * lack of reader locks
        * transactions not atomic
        * <I missed what this one was, transactions something or other>
        * use of sha1
        * file count overhead
        * different compression
        * indexing things by content hash instead of nodeid
        * time travel mode
            * make everything append only, every transaction records offsets
    * Wireproto changes:
        * make it the same between ssh and http
        * based on frames/streams, allow interleaving
            * spectral: sounds like http2.... ;)
            * http2 vs custom framing mostly a win for flow control/qos/proxies
        * allow for redirecting to a CDN, to offload from the server and to get it closer to users
        * new wireproto that lets it scale to more cores
        * compression in the wireproto
            * ssh wireproto doesn't do compression (relies on ssh -C)
            * uses zlib currently, want to move to something faster
        * interleaving (instead of batch), so requests/responses can be in any order
    * TheMystic: concerns that this is basically just "creating a new version control system" "well, half of one"
    * let's use interfaces for things - where to define the boundary?
        * durham: "filelog" not "store"?
    * separate read/write - helps with caching, etc.
    * store separation:
        * content store
        * history store (p1/p2)
        * linknode/linkrev *separate* from everything else, because right now you need changelog to be written before knowing what the linknode/linkrev is.
            * linknodes are basically just a cache, store them like one
            * allow multiple linknodes
    * wireproto:
        * change changegroup so we can send changelog, manifest, filelogs separately (instead of requiring them to be grouped up)
        * using "parts" instead of "bundle2"
        * grpc/thrift/whatever? (as transport layer, not data format)
            * durin42: no grpc.
                * fundamentally wedded to the C extension
                * grpc requires http2 (TCP/etc)
                * has http2 in the API surface area, so it gets difficult
                * IDL compilation step which we might want to avoid?
        * data format: protobuf, capn proto, thrift, whatever
            * durin42: looking at CBOR - binary json that has some stuff that's nice
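
A minimal sketch of the frame-based interleaving idea from the wireproto notes above, where responses to several requests share one connection and arrive in any order. The frame layout (4-byte request id, 1-byte flags, 4-byte payload length) and every name below are assumptions for illustration, not the format that was under discussion.

```python
import struct

# Hypothetical frame header: request id (u32), flags (u8), payload length (u32).
HEADER = struct.Struct(">IBI")
FLAG_EOS = 0x01  # assumed "end of stream" marker for a request


def encode_frame(request_id, payload, last=False):
    """Wrap a payload chunk in a frame tagged with its request id."""
    flags = FLAG_EOS if last else 0
    return HEADER.pack(request_id, flags, len(payload)) + payload


def decode_frames(data):
    """Yield (request_id, flags, payload) for each frame in a byte string."""
    offset = 0
    while offset < len(data):
        request_id, flags, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        yield request_id, flags, data[offset:offset + length]
        offset += length


def reassemble(data):
    """Group interleaved frames back into one complete response per request."""
    pending, done = {}, {}
    for request_id, flags, payload in decode_frames(data):
        pending.setdefault(request_id, []).append(payload)
        if flags & FLAG_EOS:
            done[request_id] = b"".join(pending.pop(request_id))
    return done


# Two responses interleaved on a single connection.
wire = b"".join([
    encode_frame(1, b"change"),
    encode_frame(2, b"mani"),
    encode_frame(1, b"log", last=True),
    encode_frame(2, b"fest", last=True),
])
responses = reassemble(wire)
```

Because each frame carries its own request id, the same framing could in principle run unchanged over ssh and http, which is the property the notes ask for.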
        
        
        
Tree manifests:
    * any formal plan for migrating flat -> tree in an existing repo?
        * not really.  servers have a flag day, flat manifests can have only flat parents; trees are "viral"
        * facebook currently uses the 'flat' hash and uses the tree manifests as a cache, of sorts.
        * having a single hash in the changeset makes this difficult..
        * durham: maybe we store the tree hash in the 'extras' field on push?  It's then validatable...
        
        
11:30 break for a few minutes

`hg fix`:
    * Google has a command, `hg fix`, which runs code formatters (such as clang-format) on your working directory or on one or more commits
    * Define an LSP (Language Server Protocol) interface to various formatters
        * Configure formatters in Mercurial, or elsewhere?  how to find the formatters?
        * People want in-repo configuration so that, for example, foo/** is formatted differently from bar/** - how to pass?  Make part of the LSP API definition
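
A rough sketch of what per-path formatter selection could look like, so that foo/** and bar/** get different treatment. The rule table, the first-match-wins policy, and the function name are all hypothetical. The notes leave open where this configuration lives and how it is passed to the formatters.

```python
import fnmatch

# Hypothetical in-repo configuration: first matching pattern wins.
# The formatter commands are examples only.
FORMATTERS = [
    ("foo/**.cpp", ["clang-format", "--style=Google"]),
    ("bar/**.cpp", ["clang-format", "--style=LLVM"]),
    ("**.py", ["yapf"]),
]


def formatter_for(path, rules=FORMATTERS):
    """Return the formatter command for a repo-relative path, or None."""
    for pattern, command in rules:
        # fnmatchcase lets "*" cross "/" boundaries, so "**" acts as "any depth".
        if fnmatch.fnmatchcase(path, pattern):
            return command
    return None
```

For example, `formatter_for("foo/a/b.cpp")` picks the first rule, while a file that matches no pattern is left alone.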
        
        
Lunch

1:40: hiddenness:
    * Purpose: clean view, prevent commands like push (diff is totally fine, amend is probably warning, and push is probably prohibit)
    * Implementation wise, repofilters add complexity.
    * UX-wise it's confusing/difficult.
    * Proposal: (not really a concrete proposal, but...): remove repofilter, but add hidden()-revset (or similar) to the places that *want* the filtering - doing it more of a whitelist than a blacklist?
    * Problem: what are we trying to accomplish?
        * what's wrong with repofilters, and why would a set be easier to reason about?
        * the repoview stuff is complex: multiple layers of caches, possibly don't have the same repo in one part of the code that we do in the other
            * e.g. repo.__dict__: to get to things like the file cache you have to remember to go to the unfiltered view
    * Action Item: capture notes on the idea for handling hiddenness/visibility when deciding which "heads" get printed by something like log.
    
Oxidation:
    - Add the possibility to have Rust extension
    - Not sure how to do the binding yet
    - CPython-rust crate for binding for now (actively supported again)
    - Layer into pure rust component with thin cpython-rust wrapper
    - Greg: Might even be able to autogenerate the bindings
    
Perf Tracking:
    * octo added performance benchmarks
    * how do we handle reports of performance regressions this notices - do we back it out first and retry after discussion, or raise a bug and triage that..?
        * Raise a bug (currently being done)
        * discuss on bugzilla
        * include as much information as possible: information about performance difference, how to reproduce, what the preconditions are for hitting this
            * do we need a reproduction in a separate repo?
    * when adding a new "performance improvement", do we want to have them include a benchmark (in contrib/perf)?  Seems excessive, at least for now.
    => Start a wiki page with triage process, bugzilla template, thresholds and timing for big (define big) regressions


Community Projects and Funding
  - how do we get funding for projects that are broadly useful but not currently funded? How do we help make them more broadly known?
    - things like hg-git?  Most other things that are "critical" for the community should be in core.
    - keyring extension for example
  
  Survey for users/companies:
      hg version
          if this version is old, why haven't you upgraded?
      python version
          if this isn't 2.7.9 or later, why are you on an old version?
      OSes in use
      Where do you get Mercurial binary?
      Do you use a GUI (TortoiseHG etc) for Mercurial?
      Is your Mercurial patched locally?
      Do you have custom extensions developed in-house?
      What third party extensions are required for use at your company?
      What third party extensions do you use?
      Is your configuration centrally managed by your company?
      How do you manage feature branches? (named branches, bookmarks, anonymous heads, we don't do feature branches)
      Do you do code review, and if so how?
      How are you training new users?
      What documentation do you use for Mercurial? (hgbook, hg help on CLI, hg help on the web, man pages, other)
      Pain points?
      Nicest feature?
      What kind of feature addition to hg would be a huge improvement for you?
      Do you have any custom Mercurial-related tooling that is open-source that we should know about?
      How do you host your repositories (self-hosted hgweb/ssh, kallithea, rhodecode, bitbucket, other?)
      Do you use editor integration? If so, what editor(s)?
      
      => Should we add a help command for community support (ML/stackoverflow/...)?
      
      
  
Subcommands:
    * just include a space 
    
Review Graphs:
    This sounds like mostly a Google specific problem. Teach users that 1 commit does one thing.
    * Tell spectral to stuff it next time he brings it up (I'm spectral and I approve this message. -spectral).
    
View Obsolescence
   `hg obslog` is hard to read, not templatable
   obsolescence isn't shown in `hg log`
   Move prev and next to core?
     but as very basic commands, not as aliases?
   
Named Commits / Killing MQ
   Add `hg commit --name X` to give commits name; store in commit extras
   `hg show X` to display commits with names
   Need cache of names to make names lookups faster
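
A minimal sketch of the name cache mentioned above: scan commit extras once, then answer name lookups from a dictionary. The "name" extras key and both function names are assumptions. The notes only say that names would be stored in commit extras.

```python
def build_name_cache(commits):
    """Map each commit name to the nodes carrying it.

    `commits` is an iterable of (node, extras) pairs; the "name" extras key
    is a hypothetical choice for this sketch.
    """
    cache = {}
    for node, extras in commits:
        name = extras.get("name")
        if name is not None:
            cache.setdefault(name, []).append(node)
    return cache


def lookup(cache, name):
    """Return the nodes recorded under `name` without scanning history."""
    return cache.get(name, [])


history = [
    ("a1b2", {"name": "feature-x"}),
    ("c3d4", {}),
    ("e5f6", {"name": "feature-x"}),
    ("0789", {"name": "bugfix"}),
]
cache = build_name_cache(history)
```

Keeping a list per name allows several commits to share a name, which is one way a named series (as with MQ) could be represented.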
   
Google Summer of Code
   
   We need mentors: Kevin is happy to mentor; Sean and JordiGh have expressed interest
   Look at WeShouldUseThat and pick ideas
   hg shellprompt looks like a good idea
   Durham: we need --dry flags for every command
   Durham: curses interface to see what's in a revset, tell people about sucking revsets
   Durham: interactive blame is a very good idea
   Durham: side by side diffs for external tools
   indygreg: want ^ it after mutating changesets to see what I changed
   
Better log alias and templates

Templating ALL the Things Exit Criteria
Way to discover keywords for a command
Standardized keywords for all commands
Default output expressible as a template?
Talk about template keywords.
https://www.mercurial-scm.org/wiki/GenericTemplatingPlan

Rebase in core
* Enable obs w/o exchange
* Enable rebase in core, keeping existing no-orphan checks
* Add global config knob to disable certain commands
---
Issue: you can still end up with orphans: make commit, exchange it, someone else works on top of it, you amend your local changeset, and then pull from other person
All: Big discussion about how to deal with this
Greg: maybe we can just strip obsmarkers when we pull (or unbundle) commits that are or would be obsolete. This is a plan to enable rewrites without strip, which is a massive perf win for repos like mozilla.
Ryan: I like the sound of this -- are there any big holes?
Another big rabbit hole discussion about marker cycles, visible heads, splitting visibility from obsolescence, etc
Boris: Deleting obsmarkers loses data. Is there any way to avoid that?
Greg: We could add an "undo obsmarker" obsmarker or other data structure that means we don't delete data...
Ryan: local-only file that completely ignores obsmarkers seems like it doesn't delete info and might still solve problem?
Martin: Why not just reuse Facebook/Jun's obsmarker cycles?
Kevin & others: That has a lot more implications that I'm not comfortable with
Greg: No matter what we do if we stop doing strip on rewrites, it makes hg way faster
Ryan: Concrete proposal
Additional discussion over lunch: just deleting seems to be easier and backwards-compatible to old versions, we will probably implement that.
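
A toy model of the orphan scenario and of the "just delete the markers" approach, written as one possible reading of the discussion. The data model (a parents map plus a precursor-to-successors map) and the rule "drop markers for incoming nodes and their ancestors" are assumptions for illustration, not a settled design.

```python
def orphans(parents, markers):
    """Return non-obsolete nodes that have an obsolete ancestor."""
    obsolete = set(markers)
    result = set()
    for node in parents:
        if node in obsolete:
            continue
        stack, seen = list(parents[node]), set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            if ancestor in obsolete:
                result.add(node)
                break
            stack.extend(parents.get(ancestor, ()))
    return result


def drop_markers_on_pull(parents, markers, incoming):
    """Delete markers whose precursor is an incoming node or one of its ancestors."""
    kept = dict(markers)
    stack, seen = list(incoming), set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        kept.pop(node, None)  # assumed rule: pulled history becomes visible again
        stack.extend(parents.get(node, ()))
    return kept


# A was amended locally into A2; someone else built B on top of A.
parents = {"root": [], "A": ["root"], "A2": ["root"], "B": ["A"]}
markers = {"A": ["A2"]}

before = orphans(parents, markers)
after = orphans(parents, drop_markers_on_pull(parents, markers, ["B"]))
```

In this model, pulling B leaves it an orphan while the marker on A exists, and dropping that marker makes A an ordinary visible commit again. That is roughly what strip-based rewrites produce today when A is pulled back in.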

show in core
  Yuya has some concerns about templating which need to be fixed
  having sub-commands will be nice
  Nobody is against moving it in core
  Greg says we can iterate on it in core
  Someone should step up and do that
  
Project Infrastructure
  Matt always said we want the wiki to look like the rest of the design and to have it as the homepage
  We should move the wikis somewhere else; someone should step up and do that
  Google is donating to speed up buildbots
  Community would like to test extensions on buildbot, so more funding/donations are welcome
  
Killing the source transformer
  Augie is planning to unwind things step by step
  We have skip-blame and we add b'' prefixes at places
  the code is in mercurial/__init__.py
  
Update from large-scale Google

    More users than before
    Growth on support < growth of users
    Same level of use as Git
    People happy with 1 commit == 1 code review
    Stack median size is 2, average 1.75
    Good feedback on managing stacks
    Top feature requests:
      - Patch, pulling patches pushed for review
      - Support for a linter
      - Evolve moves the working directory and should not
      - Move a chunk from a changeset to another
      - Proper GUI


Update from large-scale Facebook
  - Ship Rust dirstate, very good performance
    - Benchmark on small repositories?
    - Q2 we will see patches for Rust Rewriting
      - New store to replace them all in Rust
    - Mononoke still in dev
    - Add progress bar to more commands to find what takes time
    - In-Memory merge rollout
    
Update from large-scale Mozilla
  - Firefox repo still growing
  - Move the Mozilla to AWS to the same EC2 to use partial-clones
  
=> For bug-reporting, add a bug_report extension that includes old Mercurial version, repository characteristics, debuginstall, debugextension, verify call, config, remote exchange characteristics, journal, blackbox (maybe customizable by flag or config option)

    => Add a bug report link at the end (maybe open the browser automatically)? that is customizable for businesses.


Stacks
   histedit and `show stack` have stack concepts
   rebase, email, phabsend could be stack aware
   next and prev might use stack definition
   topics extension for showing linear stack
   Want clean "stack" definition in core
   getstack(rev) -> list of revs
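
One possible reading of `getstack(rev) -> list of revs`, sketched on a plain parents map. The definition used here (the linear run of draft revisions ending at `rev`, stopping at the first public revision, root, or merge) is an assumption. The notes ask for a clean definition in core but do not fix one, and the extra `parents` and `is_draft` arguments exist only to keep the sketch self-contained.

```python
def getstack(rev, parents, is_draft):
    """Return the linear run of draft revisions ending at `rev`, oldest first."""
    stack = []
    current = rev
    while current is not None and is_draft(current):
        stack.append(current)
        ps = parents.get(current, [])
        # Stop at merges and roots: a stack is assumed to be linear here.
        current = ps[0] if len(ps) == 1 else None
    stack.reverse()
    return stack


# Revisions 0 and 1 are public, 2 and 3 are local drafts.
parents = {0: [], 1: [0], 2: [1], 3: [2]}
public = {0, 1}
stack = getstack(3, parents, lambda r: r not in public)
```

With a single shared helper like this, histedit, `show stack`, rebase, and next/prev could all agree on what "the current stack" means.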
   
Partial clone profiles
   We want named profiles for partial clones
   Let's rename global --profile and/or not recognize it after command name...
   So we can use `hg clone --profile X` to clone a named partial clone "profile"
   `hg clone --list-profiles` will show available profiles on the remote
   Expanding set of includes and excludes on each revision being transferred is expensive
   If narrow spec shrinks and spec is evaluated on the server, client may not have all data
   Limit matcher types to rootfilesin:, path:   - don't allow glob:, re:, include:, subinclude:, etc
      Require prefix (foo:) in files to avoid ambiguity
   If you do a local commit that widens the profile, what happens?
      Should client go to server automatically? Wait until next pull?
  What is remotefilelog:
      Download file data (wire protocol commands)
      Storage of file data
      Local commits go to local storage that isn't revlogs
  Shipping narrow in core without profile support
     Takes a list of includes/excludes and sends filelog data for those
     Add profiles later as something that always expands
     Precomputation/caching/offloading for sets of includes/excludes
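
A small sketch of the pattern restriction described above: accept only `path:` and `rootfilesin:` patterns and insist on an explicit prefix. The function name and error message are hypothetical.

```python
# Only the matcher kinds the notes propose allowing.
ALLOWED_PREFIXES = ("path:", "rootfilesin:")


def validate_narrow_patterns(patterns):
    """Return the patterns as a list if all use an allowed prefix.

    Raises ValueError for glob:, re:, include:, subinclude:, and for
    patterns with no prefix at all, since those would be ambiguous.
    """
    for pattern in patterns:
        if not pattern.startswith(ALLOWED_PREFIXES):
            raise ValueError("unsupported narrow pattern: %r" % pattern)
    return list(patterns)


spec = validate_narrow_patterns(["path:browser/components", "rootfilesin:docs"])
```

Restricting patterns to directory-shaped matchers keeps server-side evaluation cheap compared with expanding arbitrary globs or regexes on every revision.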
     


Update from large-scale Atlassian
  - Diff-annotated based code review

# Pushing to remote bookmarks
1. We need a way to push to a remote bookmark without having a local bookmark
- This is the last piece of ux needed to allow a no-local-bookmark workflow involving remote bookmarks (facebook uses 'hg push --to' to accomplish this)
- Conclusion:  Have a config option that makes 'hg push -B book_name' push to the given bookmark name (versus pushing the given local bookmark name). We also need to advertise the config option in the failure text if a user runs `hg push -B foo` when foo doesn't exist.

2. `hg push --force` is bad and should be more granular
- Conclusion:  allow `hg push --allow ENUM` where you can pass allow-anon, create, non-forward-move, etc as overrides to bypass safety checks. We need to make sure the available enum values are visible in the help text.
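
A hedged sketch of how granular overrides could replace a single `--force` switch. The three values come from the notes. The class, function names, and error text are assumptions, and the proposed `--allow` flag is not modeled as a real command-line option here.

```python
import enum


class PushOverride(enum.Enum):
    ALLOW_ANON = "allow-anon"
    CREATE = "create"
    NON_FORWARD_MOVE = "non-forward-move"


def parse_allow(values):
    """Turn repeated --allow values into a set of overrides.

    Unknown names are rejected with a message listing the valid ones,
    so the available values stay discoverable.
    """
    overrides = set()
    for value in values:
        try:
            overrides.add(PushOverride(value))
        except ValueError:
            valid = ", ".join(o.value for o in PushOverride)
            raise ValueError("unknown --allow value %r (valid: %s)" % (value, valid))
    return overrides


def check_bookmark_move(is_forward, overrides):
    """Permit a non-forward bookmark move only when explicitly allowed."""
    return is_forward or PushOverride.NON_FORWARD_MOVE in overrides


overrides = parse_allow(["create"])
```

Each safety check consults only its own override, so allowing bookmark creation does not silently permit a non-forward move the way a blanket `--force` would.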

Linting

https://firefox-source-docs.mozilla.org/tools/lint/index.html
https://hg.mozilla.org/mozilla-central/file/tip/python/mozlint
https://hg.mozilla.org/mozilla-central/file/tip/tools/lint

Templating
   Maybe we need to register template keywords?
   Maybe we should standardize keywords outside of `hg log`?
   
Things to make non-experimental from mercurial/configitems.py:
    1) bundle-phases -> do it
    2) bundle2.advertise -> delete this
    3) bundle2-output-capture -> delete this
    4) bundle2.stream -> soon
    5) copytrace.* -> document them and move out of experimental
    6) crecordtest -> move to devel option
    7) editortmphg -> make it non experimental, should ponder turning it on by default
    8) worddiff -> move it to diffopts
    9) maxdeltachainspan -> move it to format (don't move it to format, the option is a hack that should die)
    10) nonnormalparanoidcheck -> delete it and contrib/dirstatenormalcheck thing
    11) extendheader* -> move it to diffopts
    12) graphshorten -> move it to ui
    13) graphstyle* -> move it to ui
    14) hook-track-tags -> follow up with octobus (performance impact is still a bit too high to enable by default. need smarter implementation (easier now))
    15) mergedriver -> it is complicated, we need to document it before doing anything
    16) obsmarker-exchange-debug -> move to devel
    17) single-head-per-branch: figure out how to interact with incore hooks and move it there
    18) spacemovesdown -> should be ui.curses.spacemovedown