D1358: remotenames: store journal entry for bookmarks if journal is loaded

Thu Nov 16 19:56:28 EST 2017

pulkit (Pulkit Goyal) <phabricator at mercurial-scm.org> writes:

> pulkit added a comment.
>
>
>   In https://phab.mercurial-scm.org/D1358#23866, @smf wrote:
>
>   > pulkit (Pulkit Goyal) <phabricator at mercurial-scm.org> writes:
>   >
>   > > pulkit added a comment.
>   > >
>   > >   In https://phab.mercurial-scm.org/D1358#23388, @smf wrote:
>   > >
>   > >   > > durin42 added a comment.
>   > >   > >
>   > >   > >   I don't think so. IMO the storage-only parts can and should go straight in core and be on by default. Exposing the data (by default, at least) we're collecting should probably go through an extension rodeo first.
>   > >   >
>   > >   > I agree with putting data generating functions and storage into core.
>   > >   >  Though, I disagree with the approach shown so far.
>   > >   >
>   > >   > We currently have at least two extensions in core that do something
>   > >   >  similar: blackbox and journal. Including remotenames, all three have
>   > >   >  overlapping data storage needs. The storage design of each is roughly:
>   > >   >
>   > >   > - blackbox logs each command
>   > >   > - journal logs update / move commands
>   > >   > - remotenames logs exchange commands
>   > >   >
>   > >   >   For something to make it to core, I would expect a unified way to deal with this duplication.
>   > >
>   > >
>   > >   While I agree with you that we should have a unified way to deal with any kind of duplication, I am unable to understand the duplication between data stored by remotanemes and one stored by journal and blackbox.
>   > >
>   > >   Data stored for journal, blackbox and remotenames are as follows:
>   > >
>   > >   Journal: time, user, namespace, name of changed item, command, nodes(old and new)
>   > >   Blackbox: date, user, node, process id, source and some kind of formatter message
>   > >   Remotenames: node, remotepath, name of bookmark or branch
>   >
>   > Why not union them as such:
>   >
>   > datetime, user, namespace, node, process id, args (name of changed item,
>   >  remotepath, etc), input (old nodes), result of command (new nodes, etc)
>
>
>
>
>   1. remotenames don't need datetime, user, process id, result of commands and the etc's.

Just because we don't use them now doesn't mean we won't in future (in
fact, I could easily see a use-case for seeing the previous locations of
a remote repo by date or even user).

>   2. if we store all the journal data, blackbox data and remotenames data all at one place, we will certainly need one more field which will store whether it's remotenames data, journal or blackbox as otherwise it will be slow to read all the data and find the data we need.

Perhaps but not necessarily. We could just query, for instance, all the
times we've pulled or pushed. The resulting query set would have the
same-ish data as remotenames has now.

>   >
>   >
>   >>   I strongly feel that data storage for journal and blackbox should be unified, I am not sure how these overlaps with remotenames one. I want to keep remotenames stored separately as information about a remoterepo can be used in operations like push and pull. @durin42 and others had some ideas around it. Moreover journal and blackbox data serves different purpose than that of remotenames.
>   >
>   > I don't really see them as different purposes. Questions like, "What was
>   >  the state of my repo yesterday (journal)?" or "What was the state of
>   >  remote yesterday (remotenames, though future features that is append
>   >  only)?" or "What were the last five commands and their input
>   >  (blackbox)?"
>
>   Sorry Sean, but I do see them as different purposes because of the following points:
>
>   1. This information is related to a remote repo. It will be very good if we have storage of remote repo information and local repo information separately.

Perhaps. But that sounds more like an implementation detail than a
logging data design.

>   2. remotenames will be a part of the UI (commands like log etc.)

Sure, but so could journal and others?

>   3. There are revsets related to remotenames

Are you asking points (2) and (3) due to performance reasons?

>   4. remotenames will have new namespaces

Only when reading the data which will still live in an extension for now.

>   5. There is lot of data stored by journal and blackbox which is not required for remotenames.

I mean, sort of. After using remotenames for so long, it would have been
nice if I stored the datetime, proc, command, etc because I found myself
needing that information sometimes.

>   Even we get to a stage where we pulls the journal or blackbox information from a remote repo and store them locally, we should store them atleast in different files as that of local journal and blackbox information.

My main point in this thread is that this information (journal,
blackbox, remotenames, etc) should be, at least from core's perspective,
agnostic of which extension wants that data. By writing code with only
one extension in mind, we end up in a place where extensions may have
(unwritten) dependencies. I ran into this often with hg-git,
hgsubversion, and remotenames.

Keeping the data in a centralized place through a common api (e.g.
something akin to namespace) we avoid having extensions step on each
other's toes.

>   If we want to unify the storage for remote and local things, then we should think about having a unified storage for bookmarks(local+remote) and branches(local+remote).

That's not a bad idea, though a bit more specialized than a data store.
I have long wanted a generic cache object so that extensions didn't have
to implement their own.

Journal seems to have a lot of similar code to remotenames. In fact,
probably even better code because I see it has an actual storage class.
We could start by abstracting out journalstorage and friends to another
module and then importing that into remotenames for use as a storage
layer.

Restricting to just journal and remotenames seems easier for now.
Something like this is what I have in mind:

1) abstract out storage of journal (hgext/extstorage.py or something)
2) use new module in remotenames
3) iterate on usage
4) graduate extstorage to core

I might be able to take a crack at (1) this weekend.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://www.mercurial-scm.org/pipermail/mercurial-devel/attachments/20171116/8916d776/attachment.sig>