Storage format for remotenames.

Fri Nov 10 17:46:58 EST 2017

On Fri, Nov 10, 2017 at 2:24 PM, Augie Fackler <raf at durin42.com> wrote:

> On Mon, Nov 06, 2017 at 09:46:18AM -0800, Gregory Szorc wrote:
> > On Mon, Nov 6, 2017 at 4:34 AM, Pulkit Goyal <7895pulkit at gmail.com> wrote:
> >
> > > Hey,
> > >
> > > I am working on porting functionality from the hgremotenames extension
> > > to core. The extension pulls information about remote branches and
> > > remote bookmarks and stores it to provide a better workflow.
> > >
> > > Currently hgremotenames stores this in a single file,
> > > `.hg/remotenames`, in which each line has the format `node nametype
> > > name`, where
> > >   - `node` is the node where the remotename was last seen
> > >   - `nametype` says whether it is a bookmark or a branch
> > >   - `name` consists of the name of the remote and the name of the
> > >     remote bookmark/branch
> > >
> > > At the sprint, Ryan suggested splitting the file into one for bookmarks
> > > and one for branches so that we can read and write them more easily,
> > > which makes sense.
> > >
> > > While working on the patches, I found that if the name of the remote
> > > contains a '/', the current storage format breaks down and we can fail
> > > to parse entries correctly.
> > >
> > > Do you guys have any better ideas on how we can store remotenames?
> > >
> >
> > I have somewhat strong feels that we should strive to use append-only
> > file formats as much as possible. This will enable us to more easily
> > implement a "time machine" feature that allows the full state of the
> > repository at a previous point in time to be seen. It also makes
> > transactions lighter, as we don't need to perform a full file backup:
> > you merely record the offset of files being appended to.
> >
> > A problem with naive append-only files is you need to scan the entire
> > file to obtain the most recent state. But this can be rectified with
> > either periodic "snapshots" in the data stream (like how revlogs
> > periodically store a fulltext) or via separate cache files holding
> > snapshot(s). The tags and branches caches kinda work this way: they
> > effectively prevent full scans or expensive reads of changelog and/or
> > manifest-based data.
> >
> > A revlog may actually not be a bad solution to this problem space. A bit
> > heavyweight. But the solution exists and should be relatively easy to
> > integrate.
>
> I think for now we should not let the perfect (generalized undo) be
> the enemy of the good (keeping track of where labels were on
> remotes). For now, I'd be fine with using some sort of null-separated
> file format. It's gross, but it should let Pulkit land the feature,
> and since it's largely advisory data it's not a crisis if we improve
> it later.
>

I agree. A generalized undo format is a lot of work and is definitely scope
bloat for remote names.

But my point about append-only data structures still stands. You basically
get the equivalent of Git's "reflog" for free if you, e.g., store this data
in a revlog.
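
To make the append-only point concrete, here is a rough sketch of what a
null-separated, append-only file could look like. None of this exists in
hgremotenames or core: the record layout (`node`, remote, name separated by
NUL, one record per line) and the helper names `append_record`, `read_state`
and `rollback` are all made up for illustration. It only shows three things:
separate fields mean a '/' in a remote name is no longer ambiguous, the
current state is whatever the last record for a key says, and undoing a
write only needs the file offset recorded before it.

```python
import os
import tempfile

SEP = b"\0"  # assumed field separator; not an existing Mercurial format


def append_record(path, node, remote, name):
    """Append one record and return the file size before the write."""
    for field in (node, remote, name):
        if SEP in field or b"\n" in field:
            raise ValueError("field contains a reserved byte")
    with open(path, "ab") as fh:
        offset = fh.seek(0, os.SEEK_END)  # size before appending
        fh.write(SEP.join((node, remote, name)) + b"\n")
    return offset


def read_state(path):
    """Scan the whole file; later records override earlier ones."""
    state = {}
    with open(path, "rb") as fh:
        for line in fh:
            node, remote, name = line.rstrip(b"\n").split(SEP)
            state[(remote, name)] = node
    return state


def rollback(path, offset):
    """Undo everything appended after `offset` by truncating the file."""
    with open(path, "r+b") as fh:
        fh.truncate(offset)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "remotebookmarks")
    # Joined with '/', both of these would read "foo/bar/baz".
    append_record(path, b"a" * 40, b"foo/bar", b"baz")
    append_record(path, b"b" * 40, b"foo", b"bar/baz")
    before = append_record(path, b"c" * 40, b"foo/bar", b"baz")
    print(read_state(path))
    rollback(path, before)  # drop the last update
    print(read_state(path))
```

The full scan in `read_state` is exactly the cost mentioned earlier in the
thread; a snapshot or cache file would be needed to avoid it on large files.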