Storage format for remotenames.

Fri Nov 10 17:48:23 EST 2017

> On Nov 10, 2017, at 17:46, Gregory Szorc <gregory.szorc at gmail.com> wrote:
> 
> On Fri, Nov 10, 2017 at 2:24 PM, Augie Fackler <raf at durin42.com> wrote:
> On Mon, Nov 06, 2017 at 09:46:18AM -0800, Gregory Szorc wrote:
> > On Mon, Nov 6, 2017 at 4:34 AM, Pulkit Goyal <7895pulkit at gmail.com> wrote:
> >
> > > Hey,
> > >
> > > I am working on porting functionality from the hgremotenames
> > > extension to core. The hgremotenames extension pulls the information
> > > about remote branches and remote bookmarks and stores it to provide a
> > > better workflow.
> > >
> > > The current storage format in hgremotenames is a file
> > > `.hg/remotenames` in which each line has the format `node nametype
> > > name`, where
> > >   - `node` refers to the node where the remotename was last seen
> > >   - `nametype` refers to whether it's a bookmark or a branch
> > >   - `name` consists of the name of the remote and the name of the
> > > remote bookmark/branch
> > >
> > > At the sprint, Ryan suggested splitting the file into one for
> > > bookmarks and one for branches so that we can read and write more
> > > easily, which makes sense.
> > >
> > > While working on the patches, I found out that if the name of the
> > > remote contains a '/', then the current storage format is not good and
> > > we can fail to parse things correctly.
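
To make the ambiguity concrete, here is a minimal sketch of a parser for the `node nametype name` line format. It assumes that `name` joins the remote and the label with a single '/', and that `nametype` is spelled `bookmarks` or `branches`; both are illustrative guesses, not the actual hgremotenames code.

```python
# Hypothetical helpers; not taken from hgremotenames.

def format_line(node, nametype, remote, label):
    # ASSUMPTION: remote and label are joined with a single '/'.
    return "%s %s %s/%s\n" % (node, nametype, remote, label)


def parse_line(line):
    # Split into at most three fields so that spaces in the name survive.
    node, nametype, name = line.rstrip("\n").split(" ", 2)
    # Splitting on the first '/' is a guess; nothing in the line says
    # where the remote ends and the bookmark/branch name begins.
    remote, _, label = name.partition("/")
    return node, nametype, remote, label


node = "a" * 40  # placeholder 40-character hex node

# A remote without '/' round-trips cleanly.
simple = parse_line(format_line(node, "bookmarks", "origin", "main"))

# A remote containing '/' comes back split in the wrong place.
ambiguous = parse_line(format_line(node, "bookmarks", "team/server", "feature"))
```

With these assumptions, `simple` recovers `origin` and `main`, while `ambiguous` reports the remote as `team` and the bookmark as `server/feature`.
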
> > >
> > > Do you guys have any better ideas on how we can store remotenames?
> > >
> >
> > I have somewhat strong feels that we should strive to use append-only file
> > formats as much as possible. This will enable us to more easily implement a
> > "time machine" feature that allows the full state of the repository at a
> > previous point in time to be seen. It also makes transactions lighter, as
> > we don't need to perform a full file backup: you merely record the offset
> > of files being appended to.
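
The offset-based rollback described above can be sketched as follows. The class and method names are hypothetical and this is not Mercurial's transaction code; it only shows that undoing an append needs the pre-transaction file size rather than a copy of the file.

```python
import os
import tempfile


class AppendOnlyLog:
    """Hypothetical append-only file with offset-based rollback."""

    def __init__(self, path):
        self.path = path
        open(path, "ab").close()  # create the file if it is missing

    def begin(self):
        # The only state a transaction has to remember is the current size.
        return os.path.getsize(self.path)

    def append(self, record):
        with open(self.path, "ab") as f:
            f.write(record + b"\n")

    def rollback(self, offset):
        # Undo everything appended since begin() by truncating.
        with open(self.path, "r+b") as f:
            f.truncate(offset)

    def read(self):
        with open(self.path, "rb") as f:
            return f.read().split(b"\n")[:-1]


with tempfile.TemporaryDirectory() as tmp:
    log = AppendOnlyLog(os.path.join(tmp, "remotenames.log"))
    log.append(b"first")
    offset = log.begin()
    log.append(b"second")
    before = log.read()
    log.rollback(offset)
    after = log.read()
```
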
> >
> > A problem with naive append-only files is you need to scan the entire file
> > to obtain the most recent state. But this can be rectified with either
> > periodic "snapshots" in the data stream (like how revlogs periodically
> > store a fulltext) or via separate cache files holding snapshot(s). The tags
> > and branches caches kinda work this way: they effectively prevent full
> > scans or expensive reads of changelog and/or manifest-based data.
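
A rough sketch of the in-stream snapshot idea, using an in-memory list as a stand-in for the file. The record layout and the `SNAPSHOT_EVERY` threshold are made up for illustration and do not reflect how revlogs or the tags/branches caches are implemented.

```python
SNAPSHOT_EVERY = 3  # hypothetical threshold: snapshot after this many updates


def current_state(log):
    """Rebuild the name -> node mapping from the latest snapshot onward."""
    state = {}
    start = 0
    # Walk backwards until the most recent snapshot, if there is one.
    for i in range(len(log) - 1, -1, -1):
        if log[i]["type"] == "snapshot":
            state = dict(log[i]["state"])
            start = i + 1
            break
    # Replay only the updates recorded after that snapshot.
    for rec in log[start:]:
        state[rec["name"]] = rec["node"]
    return state


def append_update(log, name, node):
    log.append({"type": "update", "name": name, "node": node})
    # Count the updates written since the last snapshot.
    pending = 0
    for rec in reversed(log):
        if rec["type"] == "snapshot":
            break
        pending += 1
    if pending >= SNAPSHOT_EVERY:
        log.append({"type": "snapshot", "state": current_state(log)})


log = []
for name, node in [("a", "n1"), ("b", "n2"), ("a", "n3"), ("c", "n4")]:
    append_update(log, name, node)

state = current_state(log)
```

Reading the current state then touches only the last snapshot and the records after it, regardless of how long the log has grown.
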
> >
> > A revlog may actually not be a bad solution to this problem space. A bit
> > heavyweight. But the solution exists and should be relatively easy to
> > integrate.
> 
> I think for now we should not let the perfect (generalized undo) be
> the enemy of the good (keeping track of where labels were on
> remotes). For now, I'd be fine with using some sort of null-separated
> file format. It's gross, but it should let Pulkit land the feature,
> and since it's largely advisory data it's not a crisis if we improve
> it later.
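
One possible shape for such a null-separated format, as a sketch only: one record per line, with node, remote and name separated by NUL bytes. The field order and the newline terminator are assumptions, and it relies on none of the fields containing NUL or newline bytes.

```python
# Hypothetical encoding; the thread does not specify the actual layout.

def encode(entries):
    """Serialize (node, remote, name) byte triples, one record per line."""
    return b"".join(
        b"\0".join((node, remote, name)) + b"\n"
        for node, remote, name in entries
    )


def decode(data):
    """Inverse of encode(); ASSUMES fields contain no NUL or newline."""
    return [tuple(line.split(b"\0")) for line in data.split(b"\n") if line]


entries = [
    (b"a" * 40, b"origin", b"main"),
    # A '/' in the remote name is no longer ambiguous.
    (b"b" * 40, b"team/server", b"feature/x"),
]
roundtrip = decode(encode(entries))
```

Because the separator cannot appear in a remote or bookmark name, `roundtrip` equals `entries` even when the remote name contains '/'.
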
> 
> I agree. A generalized undo format is a lot of work and is definitely scope bloat for remote names.
> 
> But my point about append-only data structures still stands. You basically get the Git equivalent of the "reflog" for free if you e.g. store this data in a revlog.

That sounds relevant for journal, less so for remotenames.

Would it make sense to try and come to an agreement on a single format we can use for node->label storage? It's come up in the bookmarks binary part patches as well...
