rust hg status
vgatien-baron at janestreet.com
Tue Feb 19 14:43:10 EST 2019
On Tue, Feb 19, 2019 at 10:46 AM Augie Fackler <raf at durin42.com> wrote:
> On Fri, Feb 15, 2019 at 02:39:44PM -0500, Valentin Gatien-Baron wrote:
> > Hello,
> > I wrote a fraction of hg status in rust, just the minimum needed to
> > compare the current revision with the working copy, with only a few
> > of the flags and config settings supported. As you can imagine, the
> > goal was better performance. Before trying to upstream bits of this,
> > I figured I'd check whether there's interest in this change in
> > particular, or in this kind of change in general (I suspect rust
> > would bring significant improvements to hg cat or hg files). The rest
> > of this mail has more details.
> This sounds _very_ promising and I'd love to see what you've got!
Reading my mail again, I realize it may not be clear that what I have, and
what I timed below, is a fully rust exe that implements a fraction of hg
status, not a change to python hg that uses big chunks of rust some
fraction of the time. It does seem that upstreaming would take the
latter approach, at least to start with.
> > While the implementation doesn't handle every uncommon situation right
> > and could use some serious cleanup, it's an interesting performance
> > improvement. In a repository with 100k tracked files and 500k ignored
> > files, in the best case and measuring on a good machine:
> > - hg-rs st takes ~50ms
> > - hg-rs st -mard takes ~14ms
> > - hg-rs st -u takes ~39ms
> > By contrast, hg+chg+fsmonitor's best case is 110ms regardless of
> > flags. Without fsmonitor, we're talking about 2.4s for hg st or hg st
> > -u, and 400ms for hg st -mard. As a baseline, hg st --syntax-error
> > takes 12ms.
> Fascinating! Are you using re2 or Python's built-in re?
Definitely using re2. If I disable re2, the full status goes from 2.4s to
5.7s. I didn't say how the rust implementation differs from the python
version, but using rust+re2 is not enough to get to 40ms for finding
unknown files. On top of that, there are:
- optimizations to the hgignore handling (mostly special treatment of
globs that can match exactly one file)
- parallelism
- not pointlessly lstat'ing untracked files on filesystems that provide
the filetype in readdir
- a cache that holds a list of "this directory is known to have no
untracked files, assuming it has this timestamp, the hgignore is bla and
the dirstate is bla", which usually shortcuts the listing of untracked
files in most directories, and thus avoids applying the hgignore to
such files
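
To make the cache idea concrete, here is a minimal sketch of what such a
lookup could look like. This is not the code from my implementation: the
names (Validity, CleanDirCache, mark_clean, can_skip) are made up for
illustration, and I'm assuming the hgignore and dirstate are each summarized
by a 64-bit hash. A directory can be skipped only if every input the cached
verdict depended on is unchanged.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Everything a cached "no untracked files here" verdict depends on.
/// (Hypothetical type; the hashes stand in for "the hgignore is bla and
/// the dirstate is bla".)
#[derive(Clone, Debug, PartialEq, Eq)]
struct Validity {
    dir_mtime: SystemTime,
    hgignore_hash: u64,
    dirstate_hash: u64,
}

/// Directories known to contain no untracked files, keyed by path.
#[derive(Default)]
struct CleanDirCache {
    entries: HashMap<PathBuf, Validity>,
}

impl CleanDirCache {
    /// Record that `dir` had no untracked files under the conditions `v`.
    fn mark_clean(&mut self, dir: &Path, v: Validity) {
        self.entries.insert(dir.to_path_buf(), v);
    }

    /// True only if `dir` was recorded as clean under exactly the current
    /// conditions. Any mismatch (or no entry) means the directory has to
    /// be listed and filtered through the hgignore again.
    fn can_skip(&self, dir: &Path, current: &Validity) -> bool {
        self.entries.get(dir) == Some(current)
    }
}
```

With something like this, changing the hgignore changes hgignore_hash and
invalidates every entry at once, which matches the slow case mentioned
below. A real version would also have to deal with timestamp granularity
(a directory modified twice within the same mtime tick); the sketch
ignores that.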
Though even when the cache fails to help, like when the hgignore changes,
rust status takes 300ms (and it's quite plausible there's room for
improvement here, I stopped optimizing when it felt like a good enough
> > A 2x speedup over fsmonitor+chg is nice, but neither best case is
> > what you get all the time, and fsmonitor degrades pretty badly,
> > often in hard-to-understand ways, making for an unpredictable
> > experience that is frequently bad.
> > Say you change the hgignore, the rust version will take 300ms, the
> > fsmonitor version will take 4.4s (I think 2s timeout + 2.4s regular
> > status).
> > Say you remove a directory at the root of the repository, 50ms rust
> > vs 4.4s fsmonitor.
> > Say you haven't used a particular share in some time, you may well see
> > 1s rust vs 4.4s fsmonitor.
> > So I think there's a lot of value in making status without fsmonitor
> > much faster:
> > - significantly increase the scale at which fsmonitor is needed
> > - improve the bad cases of fsmonitor (or even the fast path, depending
> > on how things are made to work together)
> > Regards,
> > Valentin Gatien-Baron
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel at mercurial-scm.org
> > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel