[PATCH 12 of 14] vfs: add the possibility to have a "ward" to check vfs usage

Fri Jul 7 12:48:44 EDT 2017

On Fri, 2017-07-07 at 21:48 +0900, Yuya Nishihara wrote:
> On Thu, 06 Jul 2017 20:53:50 +0200, Boris Feld wrote:
> > On Wed, 2017-07-05 at 23:55 +0900, Yuya Nishihara wrote:
> > > On Sun, 02 Jul 2017 04:56:37 +0200, Pierre-Yves David wrote:
> > > > # HG changeset patch
> > > > # User Pierre-Yves David <pierre-yves.david at ens-lyon.org>
> > > > # Date 1470323266 -7200
> > > > #      Thu Aug 04 17:07:46 2016 +0200
> > > > # Node ID ebf81efdd6d7ff15c64683933635616c00475688
> > > > # Parent  34b8be7f0420b07d0f7b71379c6055e5b26223d5
> > > > # EXP-Topic vfs.ward
> > > > # Available At https://www.mercurial-scm.org/repo/users/marmout
> > > > e/me
> > > > rcurial/
> > > > #              hg pull https://www.mercurial-scm.org/repo/users
> > > > /mar
> > > > moute/mercurial/ -r ebf81efdd6d7
> > > > vfs: add the possibility to have a "ward" to check vfs usage
> > > > The feature has some similarity with the 'pathauditor', but
> > > > further
> > > > digging
> > > > show it make more sense to keep them separated. The path
> > > > auditor is
> > > > fairly
> > > > autonomous (create from vfs.__init__ without any extra
> > > > information), and check
> > > > very generic details. On the other hand, the wards will have
> > > > deeper
> > > > knownledge
> > > > of the Mercurial logic and is created at the local repository
> > > > level. Mixing the
> > > > two would add much complexity for no real gains.
> > > 
> > > My gut feeling is that vfs isn't the right place if they require
> > > deeper
> > > knowledge of repository/dirstate logic. And the whole weakref
> > > business
> > > floating around wards, hooks and transaction seems like just
> > > getting
> > > around
> > > layering violation.
> > 
> > We pondered on this choice a long time and couldn't find any other
> > layer that is used by both Mercurial core and the extensions to
> > access
> > the file system. It seems to be the highest abstraction layer we
> > could
> > hook into without introducing a new one and updating all callers.
> > 
> > Our reflexion is, as the vfs classes comes from the scmutil file,
> > it
> > seems okay to have scm related logic in that layer.
> 
> IIRC, one possible alternative was to move lock itself to a vfs as
> the
> whole vfs is theoretically covered by the lock.

The practice divert a bit from the theory. For example, the
".hg/bookmark" file is in theory covered by the 'wlock'. In practice it
is now covered by the transaction and therefor the 'lock'. On the other
hand there are multiple files handled by the vfs ('hgrc', 'dirstate',
'store/', etc) that are not covered by the wlock.

Right now, all these exceptions are stored on the repository, and
extension can update them. To us, the repository layers seems the right
layer for logic regarding individual file semantic. We also want to add
some logic regarding open transaction to the ward, something unrelated
to the vfs.

The initial motivation for this series (beside catching bug) is to make
it safer to introduce new vfs. For example, the share extensions
suffers from lack of cache sharing, each share has its own '.hg/cache/'
directory. Introducing a 'repo.cvfs' dedicated to cache could fix that,
but we wants to warn people still accessing 'cache/' through
'repo.vfs'.

To push that logic further, "share" actually needs a better distinction
between the working copy specific piece and the more generic pieces. An
extra split of "repo.vfs" could take care of that. However, some of the
file, like bookmark, can be shared or not depending on some config and
share option. So the ward needs access to that logic (already living on
the repository) too.

We need to eventually rework the repository format to a more race free
data structure. This will likely involve more vfs to access data
dispatched in various directory (eg: append-only vs full update). The
ward should help a lot here but it need to be aware of many high level
logic regarding file semantic.

We plan to add more vfs-s and the current locking design seems better
than having one lock per vfs, as multiple vfs will be covered by the
same lock. If we move the locks on the vfs layer, we either would need
to update it again later or introduce much more complex code to manage
the interaction between vfs-s locks.

> 
> BTW, the ward itself doesn't require deeper knowledge of repo object
> right
> now. Neither does the auditor. How will the ward API be evolved to
> depend
> on repository-level data?

In this series, the vfs ward use data from "repo._wlockfreeprefix" to
read its exception. Given the type of data and the need for extension
to wrap it. The repo seems a good place to hold this data.

> 
> > > Do you have any idea how to expand this ward() callback to the
> > > other
> > > vfs
> > > operations, which have different set of parameters?
> > 
> > All other operations (move, delete, copy, etc) could be
> > "translated" to
> > 'read' or 'write' mode. An alternative would be to abstract the
> > mode
> > with a couple of symbolic constant "create, update, append, delete,
> > …".
> > What do you think?
> 
> So it will be quite similar to audit(path) if it had a mode flag?

In the end likely, but updating vfs code to call audit for each
operation and the audit functions source code right now would requires
too much works for this series.

We plan to send a new series after this one that will refactor this
part and it's likely that pathauditor will call the ward with the same
flag as we converge toward something usable everywhere.