Storage format for remotenames.

Fri Nov 10 17:24:59 EST 2017

On Mon, Nov 06, 2017 at 09:46:18AM -0800, Gregory Szorc wrote:
> On Mon, Nov 6, 2017 at 4:34 AM, Pulkit Goyal <7895pulkit at gmail.com> wrote:
>
> > Hey,
> >
> > I am working on porting functionality from the hgremotenames extension
> > to core. The hgremotenames extension pulls information about remote
> > branches and remote bookmarks and stores it to provide a better
> > workflow.
> >
> > hgremotenames currently stores this in a file, `.hg/remotenames`, in
> > which each line has the format `node nametype name`, where
> >   - `node` is the node where the remotename was last seen
> >   - `nametype` indicates whether it's a bookmark or a branch
> >   - `name` consists of the name of the remote and the name of the remote
> >     bookmark/branch
> >
> > At the sprint, Ryan suggested splitting the file into separate files for
> > bookmarks and branches so that we can read and write them more easily,
> > which makes sense.
> >
> > While working on the patches, I found that if the name of the remote
> > contains a '/', the current storage format breaks down and we can fail
> > to parse entries correctly.
> >
> > Do you guys have any better ideas on how we can store remotenames?
> >
>
> I have somewhat strong feelings that we should strive to use append-only file
> formats as much as possible. This will enable us to more easily implement a
> "time machine" feature that allows the full state of the repository at a
> previous point in time to be seen. It also makes transactions lighter, as
> we don't need to perform a full file backup: you merely record the offset
> of files being appended to.
>
> A problem with naive append-only files is you need to scan the entire file
> to obtain the most recent state. But this can be rectified with either
> periodic "snapshots" in the data stream (like how revlogs periodically
> store a fulltext) or via separate cache files holding snapshot(s). The tags
> and branches caches kinda work this way: they effectively prevent full
> scans or expensive reads of changelog and/or manifest-based data.
>
> A revlog may actually not be a bad solution to this problem space. A bit
> heavyweight. But the solution exists and should be relatively easy to
> integrate.
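
To make the append-only idea above concrete, here is a toy sketch in
plain Python. It is not Mercurial's transaction code: the class name,
method names, and record layout are all hypothetical. It only shows that
a transaction needs to remember the file size before appending and
truncate back to it on rollback, while the current state is the last
record seen for each name (which is the full scan that snapshots or
caches would avoid).

```python
import os
import tempfile


class AppendOnlyLog:
    """Toy append-only name -> node log; later records override earlier ones."""

    def __init__(self, path):
        self.path = path
        # Create the file if needed so size checks and truncation work.
        open(path, "ab").close()

    def begin(self):
        """Return the current file size; a rollback needs nothing else."""
        return os.path.getsize(self.path)

    def append(self, name, node):
        """Add one record; existing bytes are never rewritten."""
        with open(self.path, "ab") as fp:
            fp.write(name + b"\0" + node + b"\n")

    def rollback(self, offset):
        """Discard everything appended since begin() returned offset."""
        with open(self.path, "r+b") as fp:
            fp.truncate(offset)

    def state(self):
        """Scan the whole file; the most recent record for each name wins."""
        current = {}
        with open(self.path, "rb") as fp:
            for line in fp.read().split(b"\n"):
                if line:
                    name, node = line.split(b"\0")
                    current[name] = node
        return current


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = AppendOnlyLog(os.path.join(tmp, "remotenames.log"))
        log.append(b"default/main", b"1" * 40)
        offset = log.begin()
        log.append(b"default/main", b"2" * 40)
        print(log.state())   # state including the second record
        log.rollback(offset)
        print(log.state())   # state after truncating back to the offset
```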

I think we should not let the perfect (generalized undo) be the enemy
of the good (keeping track of where labels were on remotes). For now,
I'd be fine with using some sort of null-separated file format. It's
gross, but it should let Pulkit land the feature, and since it's
largely advisory data, it's not a crisis if we improve it later.
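
A minimal sketch of one possible null-separated layout, in Python. The
field order (`node`, remote, name), the one-record-per-line framing, and
the function names are assumptions made for illustration, not the format
hgremotenames or core uses. It also assumes that no field contains a NUL
byte or a newline. Because the remote and the bookmark/branch name are
stored as separate fields, a '/' in either one no longer makes the
record ambiguous.

```python
def serialize(entries):
    """Encode (node, remote, name) byte-string triples as NUL-separated lines."""
    lines = []
    for node, remote, name in entries:
        for field in (node, remote, name):
            # NUL and newline are reserved as separators in this sketch.
            if b"\0" in field or b"\n" in field:
                raise ValueError("field contains a reserved byte: %r" % (field,))
        lines.append(b"\0".join((node, remote, name)) + b"\n")
    return b"".join(lines)


def parse(data):
    """Decode the output of serialize() back into a list of triples."""
    entries = []
    for line in data.split(b"\n"):
        if not line:
            continue  # skip the empty chunk after the final newline
        # Unpacking raises ValueError if a line does not have three fields.
        node, remote, name = line.split(b"\0")
        entries.append((node, remote, name))
    return entries


if __name__ == "__main__":
    entries = [
        (b"a" * 40, b"team/upstream", b"feature/x"),
        (b"b" * 40, b"default", b"stable"),
    ]
    # Remote names and bookmark names containing '/' round-trip unchanged.
    print(parse(serialize(entries)) == entries)
```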
