Storage format for remotenames.

Gregory Szorc gregory.szorc at gmail.com
Fri Nov 10 18:03:30 EST 2017


On Fri, Nov 10, 2017 at 2:48 PM, Augie Fackler <raf at durin42.com> wrote:

>
> On Nov 10, 2017, at 17:46, Gregory Szorc <gregory.szorc at gmail.com> wrote:
>
> On Fri, Nov 10, 2017 at 2:24 PM, Augie Fackler <raf at durin42.com> wrote:
>
>> On Mon, Nov 06, 2017 at 09:46:18AM -0800, Gregory Szorc wrote:
>> > On Mon, Nov 6, 2017 at 4:34 AM, Pulkit Goyal <7895pulkit at gmail.com> wrote:
>> >
>> > > Hey,
>> > >
>> > > I am working on porting functionality from the hgremotenames extension
>> > > to core. The hgremotenames extension pulls information about
>> > > remote branches and remote bookmarks and stores it to provide a
>> > > better workflow.
>> > >
>> > > The current storage format used by hgremotenames is a single file,
>> > > `.hg/remotenames`, in which each line has the format `node nametype
>> > > name`, where
>> > >   - `node` is the node where the remotename was last seen
>> > >   - `nametype` indicates whether it's a bookmark or a branch
>> > >   - `name` consists of the name of the remote and the name of the
>> > > remote bookmark/branch
>> > >
>> > > At the sprint, Ryan suggested splitting the file into separate files
>> > > for bookmarks and branches so that we can read and write them more
>> > > easily, which makes sense.
>> > >
>> > > While working on the patches, I found that if the name of the remote
>> > > contains a '/', the current storage format becomes ambiguous: the
>> > > remote name and the bookmark/branch name are joined with '/', so we
>> > > can fail to split them correctly when parsing.
>> > >
>> > > Do you guys have any better ideas on how we can store remotenames?
>> > >
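To make the '/' ambiguity concrete, here is a minimal Python sketch of writing and parsing one line of the format described above. It assumes `name` is stored as `<remote>/<label>` and that the parser splits on the first '/'; `format_line`, `parse_line`, and the `bookmarks` nametype value are hypothetical, not hgremotenames' actual code.

```python
def format_line(node, nametype, remote, label):
    # Assumed layout: "node nametype remote/label".
    return "%s %s %s/%s" % (node, nametype, remote, label)


def parse_line(line):
    """Split one line into (node, nametype, remote, label).

    maxsplit=2 keeps spaces inside the name intact, but splitting the
    name on the first '/' is a guess: any fixed rule breaks as soon as
    the remote name itself contains '/'.
    """
    node, nametype, name = line.split(" ", 2)
    remote, label = name.split("/", 1)
    return node, nametype, remote, label


node = "a" * 40
# Two different entries: remote "team/server" with bookmark "feature",
# and remote "team" with bookmark "server/feature".
line1 = format_line(node, "bookmarks", "team/server", "feature")
line2 = format_line(node, "bookmarks", "team", "server/feature")
print(line1 == line2)         # True: both serialize to the same line
print(parse_line(line1)[2:])  # ('team', 'server/feature')
```

Whatever split rule the parser picks, one of the two entries comes back wrong, which is why a separator that cannot appear in names is needed.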
>> >
>> > I have somewhat strong feelings that we should strive to use append-only
>> > file formats as much as possible. This will enable us to more easily
>> > implement a "time machine" feature that allows the full state of the
>> > repository at a previous point in time to be seen. It also makes
>> > transactions lighter, as we don't need to perform a full file backup:
>> > you merely record the offset of files being appended to.
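A minimal sketch of why append-only files make transactions cheap, assuming a hypothetical `AppendOnlyLog` class (not Mercurial's transaction API): starting a transaction only records the current end-of-file offset, and rolling back truncates the file to that offset instead of restoring a full backup copy.

```python
import os
import tempfile


class AppendOnlyLog:
    """Toy newline-delimited append-only file (hypothetical, for illustration)."""

    def __init__(self, path):
        self.path = path
        open(path, "ab").close()  # create the file if it does not exist

    def begin_transaction(self):
        # The only journal entry needed: where the file ended before we wrote.
        return os.path.getsize(self.path)

    def append(self, record):
        # Assumes records never contain b"\n".
        with open(self.path, "ab") as f:
            f.write(record + b"\n")

    def rollback(self, offset):
        # Discard everything appended since begin_transaction().
        with open(self.path, "r+b") as f:
            f.truncate(offset)

    def records(self):
        with open(self.path, "rb") as f:
            return f.read().splitlines()


path = os.path.join(tempfile.mkdtemp(), "remotenames.log")
log = AppendOnlyLog(path)
log.append(b"one")
offset = log.begin_transaction()
log.append(b"two")
log.rollback(offset)
print(log.records())  # [b'one']
```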
>> >
>> > A problem with naive append-only files is that you need to scan the
>> > entire file to obtain the most recent state. But this can be rectified
>> > with either periodic "snapshots" in the data stream (like how revlogs
>> > periodically store a fulltext) or via separate cache files holding
>> > snapshot(s). The tags and branches caches kinda work this way: they
>> > effectively prevent full scans or expensive reads of changelog and/or
>> > manifest-based data.
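A minimal in-memory sketch of the snapshot idea, assuming a log of hypothetical `("set", name, node)` updates with a full `("snapshot", state)` entry written every few updates. A Python list stands in for the on-disk file, and `record`, `current_state`, and the interval of 3 are illustrative choices, not an existing Mercurial API. Readers only replay the entries after the newest snapshot, and replaying a prefix of the log recovers an earlier state, which is the "time machine" property mentioned above.

```python
def current_state(log):
    """Rebuild the latest {name: node} mapping from the log.

    Walk backwards to the most recent snapshot, then replay only the
    updates written after it, rather than replaying the entire log.
    """
    state, start = {}, 0
    for i in range(len(log) - 1, -1, -1):
        if log[i][0] == "snapshot":
            state, start = dict(log[i][1]), i + 1
            break
    # Everything after the last snapshot is a plain ("set", name, node) update.
    for _, name, node in log[start:]:
        state[name] = node
    return state


def record(log, name, node, every=3):
    """Append an update; write a full snapshot after `every` updates."""
    log.append(("set", name, node))
    pending = 0
    for entry in reversed(log):
        if entry[0] == "snapshot":
            break
        pending += 1
    if pending >= every:
        log.append(("snapshot", current_state(log)))


log = []
for i, name in enumerate(["default", "stable", "default", "feature"]):
    record(log, name, "node%d" % i)

print(current_state(log))      # {'default': 'node2', 'stable': 'node1', 'feature': 'node3'}
print(current_state(log[:2]))  # {'default': 'node0', 'stable': 'node1'}
```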
>> >
>> > A revlog may actually not be a bad solution to this problem space. A bit
>> > heavyweight. But the solution exists and should be relatively easy to
>> > integrate.
>>
>> I think for now we should not let the perfect (generalized undo) be
>> the enemy of the good (keeping track of where labels were on
>> remotes). For now, I'd be fine with using some sort of null-separated
>> file format. It's gross, but it should let Pulkit land the feature,
>> and since it's largely advisory data it's not a crisis if we improve
>> it later.
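A minimal sketch of the null-separated format suggested above, assuming one file per name type (per Ryan's suggestion) and records of the form `node\0remote\0name\n`. It also assumes remote names and labels never contain NUL or newline bytes; the sketch checks that rather than silently relying on it. `serialize` and `parse` are hypothetical helpers, not existing Mercurial functions.

```python
def serialize(entries):
    """Encode (hexnode, remote, name) byte-string tuples, one record per line."""
    out = []
    for node, remote, name in entries:
        for field in (node, remote, name):
            # Assumption: separators never occur inside a field.
            if b"\0" in field or b"\n" in field:
                raise ValueError("field contains a separator: %r" % field)
        out.append(b"\0".join((node, remote, name)) + b"\n")
    return b"".join(out)


def parse(data):
    """Decode the output of serialize(); '/' and spaces in names are harmless."""
    entries = []
    for line in data.split(b"\n"):
        if not line:
            continue
        node, remote, name = line.split(b"\0")
        entries.append((node, remote, name))
    return entries


entries = [
    (b"a" * 40, b"team/server", b"feature"),
    (b"b" * 40, b"default", b"my bookmark"),
]
print(parse(serialize(entries)) == entries)  # True
```

A remote named `team/server` now round-trips unchanged, since '/' is no longer used as a field separator.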
>>
>
> I agree. A generalized undo format is a lot of work and is definitely
> scope bloat for remote names.
>
> But my point about append-only data structures still stands. You basically
> get the Git equivalent of the "reflog" for free if you e.g. store this data
> in a revlog.
>
>
> That sounds relevant for the journal extension, less so for remotenames.
>

Fair enough.


>
> Would it make sense to try and come to an agreement on a single format we
> can use for node->label storage? It's come up in the bookmarks binary part
> patches as well...
>

Perhaps. But that could quickly devolve into such topics as:

* Unified vs per-domain storage
* Label specific versus generic node metadata storage


More information about the Mercurial-devel mailing list