<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Nov 6, 2017 at 4:34 AM, Pulkit Goyal <span dir="ltr">&lt;<a href="mailto:7895pulkit@gmail.com" target="_blank">7895pulkit@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hey,<br>

<br>

I am working on porting functionality from the hgremotenames extension<br>

to core. The hgremotenames extension pulls information about<br>

remote branches and remote bookmarks and stores it to provide a<br>

better workflow.<br>

<br>

The current storage format used by hgremotenames is a file<br>

`.hg/remotenames` in which each line has the format `node nametype<br>

name`, where<br>

  - `node` is the node where the remotename was last seen<br>

  - `nametype` says whether it's a bookmark or a branch<br>

  - `name` consists of the name of the remote and the name of the remote bookmark/branch<br>

<br>

At the sprint, Ryan suggested splitting the file into separate bookmark and<br>

branch files so that we can read and write them more easily, which makes sense.<br>

<br>

While working on the patches, I found out that if the name of the<br>

remote contains a '/', then the current storage format breaks down and<br>

we can fail to parse entries correctly.<br>

<br>

Do you guys have any better ideas on how we can store remotenames?<br></blockquote><div><br></div><div>I have somewhat strong feelings that we should strive to use append-only file formats as much as possible. This will enable us to more easily implement a "time machine" feature that allows the full state of the repository at a previous point in time to be seen. It also makes transactions lighter, as we don't need to perform a full file backup: we merely record the offset of files being appended to.<br></div><div><br></div><div>A problem with naive append-only files is that you need to scan the entire file to obtain the most recent state. But this can be rectified either with periodic "snapshots" in the data stream (like how revlogs periodically store a fulltext) or with separate cache files holding snapshot(s). The tags and branches caches kinda work this way: they effectively prevent full scans or expensive reads of changelog and/or manifest-based data.<br></div><div><br></div><div>A revlog may actually not be a bad solution to this problem space. A bit heavyweight. But the solution exists and should be relatively easy to integrate.<br></div></div></div></div>
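<div>To make the append-only idea a bit more concrete, here is a rough sketch in plain Python. Nothing in it is Mercurial API: the record layout, the NUL separator, and the names append_record, rollback and read_state are all made up for illustration. It only tries to show the three properties mentioned above: a transaction can be undone by truncating to a recorded offset, an earlier state can be seen by replaying up to an offset, and a snapshot lets the reader skip the start of the file.</div>
<pre>
```python
import os
import tempfile

# Hypothetical record layout (an assumption, not hgremotenames' format):
# one record per line, fields separated by NUL, so a '/' inside a remote
# name can never be mistaken for a field separator.
SEP = "\0"


def append_record(path, node, remote, name):
    """Append one record; return the file offset from before the write."""
    with open(path, "ab") as fp:
        fp.seek(0, os.SEEK_END)
        offset = fp.tell()
        fp.write(SEP.join([node, remote, name]).encode("utf-8") + b"\n")
    return offset


def rollback(path, offset):
    """Undo appended records by truncating back to a recorded offset."""
    with open(path, "r+b") as fp:
        fp.truncate(offset)


def read_state(path, upto=None, snapshot=None):
    """Replay the log, with later records overriding earlier ones.

    upto: stop the replay at this byte offset (state at an earlier time).
    snapshot: optional (offset, state) pair to resume from, so the part
    of the file before that offset does not have to be scanned.
    """
    start, base = snapshot if snapshot else (0, {})
    state = dict(base)
    with open(path, "rb") as fp:
        fp.seek(start)
        data = fp.read() if upto is None else fp.read(max(upto - start, 0))
    for line in data.splitlines():
        node, remote, name = line.decode("utf-8").split(SEP)
        state[(remote, name)] = node
    return state


# Small demonstration on a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "remotenames.log")
append_record(path, "aaa111", "my/remote", "default")
mark = append_record(path, "bbb222", "my/remote", "default")
current = read_state(path)             # the latest record wins
earlier = read_state(path, upto=mark)  # state before the second append
```
</pre>
<div>The offset returned by append_record is the only thing a transaction would need to remember here, and passing (mark, earlier) as the snapshot gives the same result as a full scan. A real implementation would also have to deal with locking, partial writes and where the snapshots live, which this sketch ignores.</div>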