<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <br>

    <br>

    <div class="moz-cite-prefix">On 3/10/16 8:27 PM, Gregory Szorc

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAKQoGa=SRn5s566geDouV1BF0K=1C_CLfQVyawOm7u_+QxMTeA@mail.gmail.com"

      type="cite">

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">On Thu, Mar 10, 2016 at 5:46 PM,

            Durham Goode <span dir="ltr">&lt;<a moz-do-not-send="true"

                href="mailto:durham@fb.com" target="_blank">durham@fb.com</a>&gt;</span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">We're going to be

              investigating alternative client side storage strategies

              for Mercurial at Facebook over the next few months. We've

              already moved off revlogs for our filelog storage (via

              remotefilelog), and will likely need to avoid revlogs when

              we move to tree manifests as well.<br>

              <br>

              As part of this, I've put together a design doc describing

              a high level idea that would let us experiment with

              different storage backends, and provides a path for

              migrating existing users over. It's currently focused on

              situations like ours, where you have parts of the

              repository on a central server and parts on the client,

              but the overall design may be of interest to the

              community.<br>

              <br>

              <a moz-do-not-send="true"

                href="https://quip.com/TFR2Aw0lu0LB" rel="noreferrer"

                target="_blank">https://quip.com/TFR2Aw0lu0LB</a><br>

              <br>

              It's a bit light on concrete format details, since the

              main goal is to put abstractions in place that would let

              us break away from the existing formats and experiment.<br>

              <br>

              Feel free to comment on the doc (you have to sign in to

              Quip to comment), or respond by email.<br>

            </blockquote>

            <div><br>

            </div>

            <div>Interesting read. My knee jerk takeaway is it is

              similar to my PackedRepoPlan but with a formal separate

              metadata store. </div>

          </div>

        </div>

      </div>

    </blockquote>

    I think the PackedRepo plan is one potential implementation of the

    long-term storage I mention. The overall plan also covers other

    things: removing the concept of rev numbers, a potentially

    storage-agnostic concept of repacking, a division between

    server-fetched and locally created data (which implies a way of

    fetching them), and the metadata separation you mention.<br>

    <blockquote

cite="mid:CAKQoGa=SRn5s566geDouV1BF0K=1C_CLfQVyawOm7u_+QxMTeA@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div>However, you say in this document that "This means the

              metadata store could be recomputed from the content store

              if necessary." I translate this to mean the metadata store

              is a glorified cache. When you view the metadata store as

              a cache, the high-level proposals in this doc are

              compatible with ideas in PackedRepoPlan.<br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    It's not quite a cache.  A big reason to separate the metadata is so

    we can store the metadata without storing the revision content at

    all.<br>

    <blockquote

cite="mid:CAKQoGa=SRn5s566geDouV1BF0K=1C_CLfQVyawOm7u_+QxMTeA@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <div>Could I bother you to add a section on transactions and

              reader/writer consistency to the document? As it stands, I

              have some questions on how you'll work consistent

              views/snapshots into this document. Specifically,

              performing transactions across multiple systems (such as

              SQLite) could be problematic. The scenario I'm envisioning

              is one where the content and metadata stores are in

              separate stores that both support transactions. You can

              commit a transaction in 1 but not the other. How do you

              roll back the first committed transaction and/or provide a

              consistent reader view over the union of the 2 stores?

              With SQLite, you can make a copy of the self-contained

              database file and move it back to restore previous state.

              But with most hosted stores (like MySQL), I don't think

              you can roll back the last transaction once it has been

              committed. I think very few stores will support this

              property (basically limited to things running on a POSIX

              filesystem). If very few stores support that property,

              does a high-level abstraction of content and metadata

              stores make sense or should we be focusing on a concrete

              implementation [that relies on POSIX filesystem

              semantics]?<br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    This is one area where we'll likely deviate from the normal

    Mercurial patterns.  I anticipate our store being explicitly

    append-only (in the sense of only ever adding data, not in the sense

    of appending to the end of a file).  It will be entirely

    content-addressed, so any data that is added can be permanent (or

    cleaned up in a rare mark-and-sweep, GC-style cleanup). So there

    will be no rollback, and no need for rollback.<br>
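    <br>
    A minimal sketch of what such an add-only, content-addressed store
    could look like, assuming one immutable file per key on a local
    filesystem. The class name ContentStore, the use of SHA-1 as the
    key, and the file layout are illustrative assumptions, not details
    from the design doc:<br>
    <pre>
```python
import hashlib
import os
import tempfile


class ContentStore:
    """Hypothetical add-only, content-addressed store: one immutable file per key."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.root, key)

    def add(self, data):
        # The key is derived from the content, so the same bytes always
        # map to the same key and an existing entry never changes.
        key = hashlib.sha1(data).hexdigest()
        path = self._path(key)
        if not os.path.exists(path):
            # Write to a temporary file, then rename it into place so
            # readers never observe a partially written entry.
            fd, tmp = tempfile.mkstemp(dir=self.root)
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            os.replace(tmp, path)
        return key

    def get(self, key):
        with open(self._path(key), "rb") as f:
            return f.read()
```
    </pre>
    Adding the same bytes twice returns the same key and leaves the
    existing entry untouched, which is why nothing ever needs to be
    rolled back in this sketch.<br>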

    <br>

    When data is added, I imagine it being done either in a third-party

    transacted way (à la SQLite, where many writers can be coordinated)

    or in a POSIX way (à la Git, where many writers just operate on

    different files, and compaction happens by reading immutable files

    and producing new immutable files).  Either way, we could allow

    multiple writers at once.<br>
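    <br>
    The POSIX variant could be sketched roughly as follows. This is an
    illustration under stated assumptions, not the planned format: the
    JSON pack files, the uuid-based file names, and the functions
    write_pack, read_all, and compact are all hypothetical. It also does
    not coordinate readers with a running compaction, which a real
    implementation would need to handle:<br>
    <pre>
```python
import json
import os
import uuid


def write_pack(root, entries):
    """Write entries (a dict of key to text) as a new immutable pack file."""
    # Each writer picks a unique file name, so concurrent writers never
    # touch the same file and need no shared lock.
    name = "pack-%s.json" % uuid.uuid4().hex
    tmp = os.path.join(root, name + ".tmp")
    with open(tmp, "w") as f:
        json.dump(entries, f, sort_keys=True)
    # The rename publishes the finished pack under its final name.
    os.replace(tmp, os.path.join(root, name))
    return name


def read_all(root):
    """Merge every pack; keys are content hashes, so duplicates agree."""
    merged = {}
    for name in sorted(os.listdir(root)):
        if name.endswith(".json"):
            with open(os.path.join(root, name)) as f:
                merged.update(json.load(f))
    return merged


def compact(root):
    """Read the existing immutable packs and produce one new immutable pack."""
    old = [n for n in os.listdir(root) if n.endswith(".json")]
    merged = {}
    for name in old:
        with open(os.path.join(root, name)) as f:
            merged.update(json.load(f))
    new = write_pack(root, merged)
    # Only after the combined pack is in place are the old packs removed.
    for name in old:
        os.remove(os.path.join(root, name))
    return new
```
    </pre>
    Because no file is ever modified after it is published, compaction
    never rewrites anything in place; it only adds a new pack and then
    drops the ones it replaced.<br>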

    <br>

    If we need to support something like the changelog, where there

    needs to be a concept of deleting branches of the tree, then we'll

    create a separate store of branch heads and treat them like Git

    refs (where the branch refs determine which parts of the key/value

    store are accessible). Then a consistent view just boils down to

    ensuring we've written the data to the append-only store before we

    update the list of branch heads.<br>
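    <br>
    The ordering rule can be illustrated with a small in-memory sketch.
    Plain dicts stand in for the key/value store and the branch heads
    store, and the functions put, commit, and reachable are hypothetical
    names chosen for this example:<br>
    <pre>
```python
import hashlib
import json


def put(store, parents, text):
    """Add a commit-like node to the add-only store and return its key."""
    blob = json.dumps({"parents": parents, "text": text}, sort_keys=True)
    key = hashlib.sha1(blob.encode("utf-8")).hexdigest()
    store.setdefault(key, blob)  # existing entries are never overwritten
    return key


def commit(store, heads, branch, text):
    """Write the data first, then move the branch head to point at it."""
    parent = heads.get(branch)
    parents = [parent] if parent else []
    key = put(store, parents, text)  # step 1: data lands in the store
    heads[branch] = key              # step 2: the head makes it visible
    return key


def reachable(store, heads):
    """Keys visible to a reader: everything reachable from a branch head."""
    seen = set()
    todo = list(heads.values())
    while todo:
        key = todo.pop()
        if key not in seen:
            seen.add(key)
            todo.extend(json.loads(store[key])["parents"])
    return seen
```
    </pre>
    In this sketch, deleting a branch is just removing its entry from
    heads. The data stays in the store but is no longer reachable, which
    is what a later mark-and-sweep pass could collect.<br>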

    <br>

    Does that answer your question? If so, I can add it to the document.<br>

  </body>

</html>