Help with extension writing

Matthew Watson mattw.watson at gmail.com
Wed Apr 14 03:08:40 CDT 2010


Hi Greg,

I've got a bit more time and can be a bit more specific...

I'm a senior dev at Atlassian and we are looking at how we could add
Mercurial support to FishEye
(http://www.atlassian.com/software/fisheye/) and Crucible.

While FishEye deals with commits, it also presents the repo from a file
perspective, so information about files (such as which rev of a file is
in a tag, the last modified rev of a file on a branch, and copies and
moves) is required for us. We have an existing data model (that supports
CVS, SVN, p4, Clearcase and git) that we transform this data into.

Let me know if you think I'm going wrong anywhere...

We started off just calling the command line, but this resulted in
lots of calls to hg/python, so we figured an extension would be more
performant and have access to more information than we can easily get
from the command line.

We're also trying to make this as low maintenance as possible, so we
are trying to stay away from internals that might change over time (we
have no control over which version of Mercurial our clients will use,
but can recommend versions that our extension will work with). I see
that there is a commitment to change the command line as little as
possible, so I figured the commands.py functions would be good.

So for every commit we see, we want details about that commit (basic
stuff from changectx) and details about the files that have been
modified in that commit (this may be counter to Mercurial's backend
model): files added, deleted, copied, moved, modified (which we get ATM
by looking at the diff --git output), their size (in the filectx), their
content (hg cat) and their ancestral parent(s).

Now it seems that the model we are trying to build may not be the same
as what Mercurial presents, because of the merge tracking it does and
the historical evolution of our model. To find a parent rev of a file
(the last commit in which it changed), we were doing:

commands.log(ui, repo, file, date=None, user=None, limit=1,
only_branch=branch, rev=["%s:0000000000000000000000000000000000000000"
% parent.hex()], template=format)

But it looks like this will not find merged-in parents of a file: log
looks for the presence of the file in changectx.files() for the given
branch, and a file that was merged from another branch will not be
listed there (correct?). Instead I'd need to navigate the merged
branches as well. Is there a better way to do this?

Or, can I look through the file log to find this out in a different way:
* Find previous revisions of the file and see which changectx they appear in?
* Look for the file in the manifest and look back through parent
changectx's to see when the nodeid of the file in the manifest
changes?
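
To make the second option concrete, here is a minimal sketch against a
toy in-memory history rather than the real Mercurial API. The Changeset
tuple, the introducing_revs function and the sample data are all made
up for illustration; a real extension would read parents and manifests
from the repo instead:

```python
from collections import namedtuple

# Toy stand-in for a changeset: parent revs plus a manifest mapping
# path -> file node id. None of these names are Mercurial API.
Changeset = namedtuple("Changeset", ["parents", "manifest"])


def introducing_revs(history, rev, path):
    """Return the revs (possibly rev itself) where the file node seen at
    `rev` first appears, following every parent of a merge."""
    target = history[rev].manifest.get(path)
    if target is None:
        return set()  # file not present at this rev
    found, seen, stack = set(), set(), [rev]
    while stack:
        r = stack.pop()
        if r in seen:
            continue
        seen.add(r)
        same = [p for p in history[r].parents
                if history[p].manifest.get(path) == target]
        if same:
            stack.extend(same)  # node unchanged here, keep walking back
        else:
            found.add(r)        # no parent carries this node
    return found


history = {
    0: Changeset((), {"a.txt": "n1"}),
    1: Changeset((0,), {"a.txt": "n2"}),                   # a.txt changed
    2: Changeset((0,), {"a.txt": "n1", "b.txt": "m1"}),    # other branch
    3: Changeset((2, 1), {"a.txt": "n2", "b.txt": "m1"}),  # merge, p1 = 2
}

print(introducing_revs(history, 3, "a.txt"))
print(introducing_revs(history, 3, "b.txt"))
```

In this toy history the a.txt seen at the merge (rev 3) is traced back
to rev 1 through the second parent, which is the case that following
only the first parent would miss.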

I guess for me, the data I'm getting is correct for straight
single-parent commits, but merge commits seem to amalgamate the data
from the 2 parents in ways I don't yet understand. I was assuming there
was a "primary" parent (ctx.parents()[0]) and that all the merged-in
changes would appear in a changeset for that changectx. That is what
diff --git shows you when you diff ctx against ctx.parents()[0], but it
is not necessarily what is represented in ctx.files() for that
changectx? I've gone through a lot of the wiki, the docs, the
Definitive Guide etc, but it's hard to get a clear picture.
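
A toy sketch of why the answer depends on which parent you compare
against. This is not Mercurial API: manifests are plain dicts of path ->
file node id, changed_paths is a made-up helper, and I am not claiming
that any of the three results below is what ctx.files() returns:

```python
def changed_paths(manifest, parent_manifest):
    """Paths added, removed or modified between a parent manifest and a
    child manifest (toy model: dicts of path -> file node id)."""
    paths = set(manifest) | set(parent_manifest)
    return {p for p in paths if manifest.get(p) != parent_manifest.get(p)}


p1 = {"a.txt": "n1", "b.txt": "m1"}     # first parent
p2 = {"a.txt": "n2"}                    # second parent (merged-in branch)
merge = {"a.txt": "n2", "b.txt": "m1"}  # result of a clean merge

vs_p1 = changed_paths(merge, p1)   # what a diff against p1 would list
vs_p2 = changed_paths(merge, p2)   # what a diff against p2 would list
vs_both = vs_p1 & vs_p2            # paths that differ from both parents

print(sorted(vs_p1), sorted(vs_p2), sorted(vs_both))
```

Here the diff against the first parent lists a.txt, the diff against the
second lists b.txt, and nothing differs from both, so a "files changed
in this commit" list could reasonably be any of the three.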

We are also trying to work out whether tying a file revision to the
changectx it occurred in is a bad idea, and whether we should instead
be using the actual file revision as shown by "hg manifest --debug" or
node.hex(ctx.manifest()[file]).

Sorry if this is not clear (and there are a lot of different questions
in here!). My understanding of the Mercurial data is evolving daily
and I'm trying to fit it to a different model that we already have (or
decide if that is not possible!).

Thanks
Matt

On 14 April 2010 07:27, Greg Ward <greg-hg at gerg.ca> wrote:
> On Tue, Apr 13, 2010 at 8:07 AM, Matthew Watson <mattw.watson at gmail.com> wrote:
>> I'm pretty new to Mercurial and not a python guru by any stretch, but I've
>> been trying to write an extension to get some information out of a hg repo
>> that we can put into an external database.
>
> *What* information do you want precisely?  And why?
>
>> I've started off with http://mercurial.selenic.com/wiki/MercurialApi and
>> defined an extension that pulls some info out of a changectx, then calls
>> commands.diff(ui, repo, rev="%s:%s" % (ctx.parent.hash(), ctx.hash()),
>> git=1) and for each file we see in the output, print some extra info
>
> 1) You generally don't want to call functions in commands directly --
> they're usually a bit too high-level to be convenient.  The usual
> procedure is to read the source of the command that you want to use,
> and figure out which lower-level code you really want.  Quite often,
> the lower-level code is either a function in cmdutil.py, a function in
> hg.py, a method of changectx, or a method of localrepository.
>
> 2) In this case, calling commands.diff() (or patch.diff(), the lower
> level code in this case) is useful if you want the diff between two
> changesets -- e.g. ctx and its first parent.  But if you want some
> *other* info, like the list of files changed in a given changeset,
> this is not the right way to get it.
>
> For example, if you want the list of files changed in a particular
> changeset, there's a changectx method for that:
>
>  cctx = repo[revid]
>  for file in cctx.files():
>      ...
>
> (As mentioned in section 5 of the MercurialApi wiki page.)
>
>> What I'd really like to get hold of is the current file revision and the
>> preceding file revision in the DAG (revisions if it's a merge commit).
>> I can get the current file rev by calling node.hex(filectx.filenode()), but
>> I have no idea how to get the preceding revision(s).
>
> Why do you want file revisions?  That's a pretty arcane feature of
> Mercurial -- it's not actually exposed in the UI (except via "hg
> manifest --debug"), so is not useful to anyone using the command-line
> interface.
>
> And what precisely do you mean by "preceding revision(s)"?  Are you
> talking about the previous revision of this one file?  Or the first
> parent of the changeset in question?
>
>> I have been calling commands.log(ui, repo, file, limit=1, follow=1,
>> branch=ctx.branch) to get the changectx it is in, but this seems
>> inefficient.
>
> Yes, very.
>
> I think you need to back up a step and tell us what you're really
> trying to accomplish, rather than precisely what bits of data you are
> trying to extract from Mercurial's API.  It sounds to me like you
> might be going about it -- whatever it is -- in an unnecessarily
> difficult or obscure way.
>
> Greg
>

