Microsoft on scaling Git

Gregory Szorc gregory.szorc at gmail.com
Sat Feb 4 05:55:58 UTC 2017


https://blogs.msdn.microsoft.com/bharry/2017/02/03/scaling-git-and-some-back-story/
is a good read about Microsoft's history with version control.

tl;dr Microsoft has created Git Virtual File System (GVFS) to allow Git to
scale to repositories with millions of files and tens of gigabytes of data,
using the Windows equivalent of a FUSE filesystem.

The underpinnings of GVFS appear to be a File System Filter Driver [1], an
HTTP server exposing Git data, and a userland daemon for communicating with
the file system filter and the HTTP server.  Basically, the .git directory
and working directory are "virtualized" by the file system filter driver
(think FUSE file system). When a Git client interacts with a "virtualized"
directory, the I/O request is initially handled by the file system filter
driver, which appears to pass on the request to the userland daemon.
Instead of a fully distributed clone, a client lazily downloads data from a
server upon initial access. Subsequent accesses are serviced locally.
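
To make the lazy-download idea concrete, here is a minimal sketch in Python.
It is not how GVFS is implemented (GVFS is a kernel filter driver plus a C#
daemon talking HTTP); it only illustrates the "fetch on first access, serve
locally afterwards" pattern described above. The names LazyObjectStore and
fake_remote are hypothetical, and an in-memory dict stands in for local
storage.

```python
from typing import Callable, Dict


class LazyObjectStore:
    """Fetch an object from a remote on first access, then serve it locally."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        # Hypothetical stand-in for the HTTP server exposing Git data.
        self._fetch_remote = fetch_remote
        # Stand-in for local storage; a real system would persist to disk.
        self._cache: Dict[str, bytes] = {}
        # Counts round trips to the remote, for illustration only.
        self.remote_fetches = 0

    def read(self, object_id: str) -> bytes:
        # Only the first access to a given object goes to the remote.
        if object_id not in self._cache:
            self.remote_fetches += 1
            self._cache[object_id] = self._fetch_remote(object_id)
        return self._cache[object_id]


def fake_remote(object_id: str) -> bytes:
    # Assumed placeholder for a network request; returns dummy content.
    return ("contents of " + object_id).encode("utf-8")


store = LazyObjectStore(fake_remote)
first = store.read("README.md")   # triggers one remote fetch
second = store.read("README.md")  # served from the local cache
```

In this sketch the second read never touches the remote, which is the
property that makes subsequent accesses cheap.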

Microsoft open sourced the "middleware" userland daemon, GVFS [2], which is
written mostly in C#. The low-level file system filter driver is still
closed source and has a restrictive EULA that says "solely for use with
Microsoft Git Virtual File System (GVFS) and otherwise for your internal
business purposes. You may not use the software in a live operating
environment unless Microsoft permits you to do so under another agreement."
[3] However, someone at Microsoft has indicated the file system filter
driver may be open sourced [4]. The server-side components are part of TFS
Git Server, which is closed source. However, the protocol is documented [5]
and building your own server should be possible.

The custom file system filter driver (currently) requires changing
low-level OS configs to enable unsigned drivers. That's a bit scary, but
then again you are inserting a driver into the kernel. Presumably Microsoft
will distribute a signed version of the driver eventually.

I'm still trying to wrap my head around what Git client modifications are
necessary. Microsoft has a GVFS-compatible fork of Git at [6]. In theory,
you shouldn't need too many modifications of the Git client because .git is
"virtualized." But I'm sure there are certain operations that needed
tweaking to better handle the drastically different behavior profile of
GVFS.

So, basically Microsoft extended the concepts of Git LFS (remote storage)
to normal repository storage. Put in Mercurial terms, it is like
remotefilelog (or a fully-implemented narrow+shallow clone) plus a virtual
file system (like Google's Piper/CitC). The novel work here is Windows
support (AFAIK nobody has really done a virtual file system for version
control on Windows - only on Linux and Linux-like platforms).

GVFS is an impressive piece of work. While it currently only supports
Windows with a TFS server, my guess is someone will eventually hack up a
FUSE filesystem for use with the GVFS server protocol. A non-TFS server
implementation also seems achievable.

I'll be much more excited about this if/when Microsoft open sources the
file system filter driver. I've toyed around with file system filter
drivers in the past and I quickly got overwhelmed because of the complexity
and lack of a good reference implementation. If Microsoft's "gvflt" is open
sourced, one could imagine using it as a base for writing a file system
filter driver for Mercurial with an architecture similar to that of GVFS.

[1] https://msdn.microsoft.com/en-us/windows/hardware/drivers/ifs/introduction-to-file-system-filter-drivers
[2] https://github.com/Microsoft/GVFS
[3] https://www.nuget.org/packages/Microsoft.GVFS.GvFlt/0.17131.2-preview
[4] https://github.com/Microsoft/GVFS/issues/5
[5] https://github.com/Microsoft/GVFS/blob/master/Protocol.md
[6] https://github.com/Microsoft/git