RFC new template keywords (wcmodified,wcmodifieddate) + sample impl.

Peter Bray peter.darren.bray at gmail.com
Tue May 31 01:24:47 CDT 2011


Matt,

Many thanks for the feedback and additional context. That is why I
submitted this as an RFC rather than a patch.

On 31/05/11 01:22 AM, Matt Mackall wrote:
> On Sun, 2011-05-29 at 16:32 +1000, Peter Bray wrote:
>> Greetings,
>>
>> After over three years using Mercurial on half-a-dozen personal
>> repositories I'm looking at including Mercurial version information in
>> automated builds for larger projects, a la wiki/VersioningWithMake.
>> The first project is XML-based documentation with C++ projects to follow.
>>
>> This has led to the following observations (not criticisms) / questions
>> which my reading has not provided me with decent solutions for:
>>
>> - Programmatically determining if the working copy is modified.
>>
>>     I have not discovered a simple way to determine if the working
>>     copy of a repository has been modified. Think shell code like:
>>     "if hg modified; then ... ", there seem to be many ways to get
>>     the information, hg st | wc -l, looking for the + in hg id, etc.
>>     Have I missed something obvious?
>
> Both hg id and hg st will miss out on two classes of change:
>
> - modified subrepos
> - changed branch
>
> 'hg summary' will notice these on the commit: line.

Great Point.

> You seem to be looking for an exit code based method. I don't think we
> have anything like that but I can imagine adding it to summary.

That would be an ideal spot. I actually started with "hg summary", but
it did not have a --template argument, and I was trying to avoid
shell-based (ie grep|cut|awk|perl|python|...) text hacking, as it can
be fragile, and I've been forewarned that Windows builds of some
components are a possibility later this year. This led to the thought
that, if only "hg summary" had a --template option, I could generate
the code fragments I needed (XML, C++, ...) via templates and not have
to do text hacking. Looking at commands.py as a non-Python programmer
and non-Mercurial developer, I thought a patch which adds --template
to "hg summary" was a little far for a first foray (and quickly
dropped that idea), but I wanted to get started and to contribute
something concrete, not just a wish list.
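
Until something like that exists, the exit-code check discussed above
can be approximated by wrapping the text hacking in one place. This is
only a sketch: the function names are hypothetical, and it assumes the
"commit:" line of "hg summary" ends with "(clean)" when there is
nothing to commit. HGPLAIN is set so the output is not localised.

```python
import os
import subprocess


def commit_line_is_clean(summary_text):
    """Return True if the 'commit:' line of `hg summary` output says clean.

    Assumption: the line ends with "(clean)" when nothing needs committing.
    """
    for line in summary_text.splitlines():
        if line.startswith("commit:"):
            return line.rstrip().endswith("(clean)")
    raise ValueError("no 'commit:' line found in summary output")


def hg_modified_exit_code(repo_path="."):
    """Return 0 if the working copy is modified, 1 if it is clean.

    Mirrors the shell idiom `if hg modified; then ...` (hypothetical command).
    """
    env = dict(os.environ, HGPLAIN="1")  # plain, untranslated output
    out = subprocess.check_output(["hg", "summary", "-R", repo_path], env=env)
    clean = commit_line_is_clean(out.decode("utf-8", "replace"))
    return 1 if clean else 0
```

A small script could pass the result of hg_modified_exit_code() to
sys.exit() so that shell and make rules can branch on it.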

So I'll step back a bit and give some context to my submission, and
hopefully you or the community can see where I'm going and why, and
either suggest a more appropriate strategy for me to investigate or
tell me it's not an appropriate line of investigation.

I'm working on better build automation for use by both developers
(relatively inexperienced and currently using an old centralised
commercial offering) and the system builds, and I'd like to capture
more completely what is being built (ie I'm trying to implement CM
best practices of identifiability of all products in the system).

wiki/VersioningWithMake suggests that "hg parents --template
'hgid: {node|short}'" is a way to get a revision identifier for a
build. This does not take into account that what is being built (ie
the working copy) may be modified. Now "hg id --id" does give you that
detail, in the form of the "+", but it does not support --template, so
it is back to text hacking or shell-based command output substitution.

As you can probably see, I think the templating facility is a very
powerful, portable and elegant solution for generating revision
identifying code fragments. Hence the wander down that garden path. It
also seemed, coding-wise, to be a gentler path into learning more about
the Mercurial code base.

>> - Templates support for determining if the working copy is modified.
>>
>>     In templates, I can't see a way to determine if the working copy
>>     has been modified. With "hg id -i" the plus (+) is not optional
>>     ("hg parents --template '{node|short}'" is the alternate). While
>>     in templates the plus or some indication that the working copy is
>>     modified does not seem to be available.
>
> Indeed, none of the commands that take templates actually pay any
> attention to the working directory in their display and I don't think
> it's ever occurred to anyone that they should. So this message is all a
> little weird to me.

I suppose my point is that builds are done on working copies, not
repositories, and changes to the working copy (probably) need to be
summarised in the build revision information. That information is
available in Mercurial. Wouldn't it be nice to have a portable, elegant
way to extract it for the build process, one that is flexible enough to
handle whatever language / build system is in use?

> It seems you've got a multi-line shell expression that you'd like to
> reduce to a single line template expression at the cost of adding a
> chunk of code and several features to Mercurial. That's only a win if
> those features are of general utility.

I agree totally. But if these features were appropriately designed
(and I'm not saying I've achieved that), they could greatly simplify
revision identification in build processes, and stop numerous
developers and CMs developing their own external scripts or command
lines to extract and transform the output of existing Mercurial
commands into something they can use. Simplifying build systems :-)

I suppose as I see it, and it is just one opinion, it would be great
if the information available in "hg id", "hg summary", "hg root" and
possibly others were available in the powerful and flexible template
infrastructure in Mercurial. Maybe only available in the context of
the "hg summary" subcommand, if the keywords would not be appropriate
generally.

>> - Programmatically determining when the working copy was last modified.
>>
>>     Once determined that the working copy is modified, it seems to me,
>>     that I need some basic way to identify that "revision" in an
>>     automated build system. Coding in the current date and time, will
>>     have the build regenerate the version information on each build,
>>     even though nothing has changed (e.g. make; make - rebuilds things
>>     unnecessarily).
>>     While there is no complete way to determine a "revision" identifier,
>>     for a modified working copy (generating a hash, that never appears
>>     in the history of the project seems pointless), I thought that MAYBE
>>     determining the date of the last quantifiable change might be a
>>     reasonable stand-in. This only works for additions and modifications,
>>     not deletions and removals, but as the former are probably more common
>>     it may suffice to use the timestamp on the most recently changed
>>     file for this purpose.
>>     The following shell shows my first attempt (g prefix =>  GNU version):
>>     hg status -0 -n -q -am \
>>      | ( cd `hg root`; gxargs -0 --no-run-if-empty gstat --format '%y' ) \
>>      | sort -n \
>>      | tail -1 \
>>      | perl -p -e 's/:\d\d\.\d+//' # Remove excess precision (cf isodate)
>
> I guess that's slightly more meaningful than +, but I don't see why it's
> better than simply using 'time of build'?

I suppose I have always developed build systems where re-running
make does not rebuild anything if nothing has changed (the "make;
make" comment above). Embedding the time of build would change the
generated version information on every run and so force a rebuild
each time, so no build would be reproducible.

The python implementation of the above UNIX command line is arguably
more readable and definitely more portable, and is of course hidden
behind a user-understandable (and documented) keyword.
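
For comparison with the shell pipeline quoted above, the core of the
idea can be sketched in a few lines of standalone Python. This is not
the code from my prototype: the function name and the "fallback"
parameter (eg the change context date as a timestamp) are hypothetical,
and the caller is assumed to supply the list of added and modified
files relative to the repository root.

```python
import os


def latest_mtime(root, relpaths, fallback):
    """Return the newest modification time among the given files.

    Files that no longer exist (deleted/removed) are skipped. If no file
    can be examined, `fallback` is returned so the result stays stable
    across runs instead of defaulting to "now".
    """
    newest = None
    for rel in relpaths:
        try:
            mtime = os.stat(os.path.join(root, rel)).st_mtime
        except OSError:
            continue  # missing file: it cannot contribute a timestamp
        if newest is None or mtime > newest:
            newest = mtime
    return fallback if newest is None else newest
```

Because the result depends only on file timestamps, running it twice
on an unchanged working copy gives the same value.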

> I can't see us doing something like this internally.

Fair enough. It is a request for comment.

>> - A new template keyword, "wcmodified", a representation of the boolean
>>     value of whether the working copy has been modified, with the same
>>     logic as the "+" modifier in "hg id".
>>
>>     Possible Modifications:
>>       - Name: what would convey the best meaning?
>>       - Representation: The value is boolean, but what is the best way
>>         to represent that in a templating environment and what filters
>>         might then be appropriate.
>>         The strings "True" and "False" are great in textual environments,
>>         but what about using a template to generate a C code fragment?
>>         The integers 0 and 1 plus filters (like say "bool" and "plus")
>>         might provide more flexibility. ("bool" being 0:"False", 1:"True",
>>         and "plus" being 0:"", 1:"+" to allow <node>+ generation in a
>>         template - eg {node|short}{wcmodified|plus}). I think a filter
>>         like "int" (if the value is boolean), might be abused on non
>>         boolean keywords like node and cause overflow issues.
>>
>>     Initial Implementation: Using integers as the external representation
>>
>> def showworkingcopymodified(repo, ctx, templ, cache, revcache, **args):
>>       """:wcmodified: Integer(0|1). Is the working copy of the current
>>       repository modified? Use the filter bool to convert to "True" or
>>       "False"."""
>>       # Using repo.status() defaults on listsubrepos, ignored, unknown, ...
>>       changed = util.any(repo.status())
>>       return int(changed)
>>
>>      and of course,
>>
>>       'wcmodified': showworkingcopymodified,
>
> What does this do when passed to 'hg log'? I think it needs to check
> that ctx is a working directory parent.

Good point, I'll look into the APIs to see how this might be done.
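
As a first guess at what that check might look like (a sketch only,
not tested against Mercurial itself): it assumes repo[None] gives the
working directory context, that parents() and node() are available on
contexts, and it uses hypothetical helper names. Returning None for
"not applicable" is my assumption, not an agreed design.

```python
def isworkingdirparent(repo, ctx):
    """Return True if ctx is one of the working directory's parents."""
    # Assumption: repo[None] is the working directory context.
    wdparents = repo[None].parents()
    return ctx.node() in [p.node() for p in wdparents]


def wcmodifiedfor(repo, ctx):
    """Return 0 or 1 for a working directory parent, None otherwise.

    For any other changeset (eg most of those shown by 'hg log') the
    state of the working copy says nothing about it, so no value is given.
    """
    if not isworkingdirparent(repo, ctx):
        return None
    # Same logic as the "+" in "hg id": any non-empty status list counts.
    return int(any(repo.status()))
```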

>> - A new template keyword, "wcmodifieddate", a date compatible with
>>     the existing date filters (eg isodate) that represents the time of
>>     last modification of the set of files that have been modified since
>>     the last commit. With the same limitations as mentioned above.
>>     Since the date filters will provide the current date and time
>>     (which varies on each run) when provided with None or "", I think it
>>     would be best to default to the change context date, when there are
>>     no changes or the date cannot be determined (delete/remove).
>>
>>     Possible Modifications:
>>       - Name: what would convey the best meaning?
>>       - Representation: Other ways to represent edge cases?
>>

Maybe I should have included the XML I was toying with while
developing this prototype implementation. Just to show how elegant and
portable a template based solution would be (cf UNIX specific command
lines I had developed, you all know the type ;-) ).

hg parents --template \
'<versioninfo
   mode="Mercurial"
   repositoryroot="{root}"
   repositoryid="{node|short}"
   repositoryrev="{rev}"
   repositorydate="{date|isodate}"
   repositorymodified="{wcmodified|bool}"
   repositorymodifieddate="{wcmodifieddate|isodate}"
   repositorytags="{tags}"
   repositorybranch="{branch}"
   repositorylatesttag="{latesttag}"
   repositorylatesttagdistance="{latesttagdistance}"
   />
'

The new keywords here are "wcmodified", "wcmodifieddate" and "root"
(new since the original posting) and the new filter is "bool".
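
The "bool" filter, and the "plus" filter floated in the original
posting, are small enough to sketch as plain functions. The function
names are hypothetical, registration with Mercurial's filter table is
not shown, and treating an empty or missing value as false is my
assumption.

```python
def _truthy(value):
    """Interpret a 0/1 keyword value; empty or missing counts as false."""
    if value is None or value == "":
        return False
    return int(value) != 0


def boolfilter(value):
    """0 -> "False", 1 -> "True" (for textual output)."""
    return "True" if _truthy(value) else "False"


def plusfilter(value):
    """0 -> "", 1 -> "+" (eg {node|short}{wcmodified|plus})."""
    return "+" if _truthy(value) else ""
```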

The current implementation is invariant across multiple runs (assuming
no further additions or modifications to the working copy), so its
output can be compared to the previously generated version and "make"
will not unnecessarily rebuild the derived targets.


I suppose my argument is that extending Mercurial itself with these
or similar concepts (not necessarily this implementation) benefits all
those integrating Mercurial into build systems, by providing access to
information in a flexible and platform-independent way.
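
The comparison against the previously generated version, mentioned
above, is the part that keeps "make; make" quiet. A minimal sketch,
assuming the template output has already been captured as a string
(the function name and the file name in the usage note are
hypothetical):

```python
def write_if_changed(path, content):
    """Write content to path only if it differs from what is there.

    Returns True if the file was (re)written, False if left untouched.
    Leaving an unchanged file alone preserves its timestamp, so make
    does not consider targets that depend on it out of date.
    """
    try:
        with open(path, "r") as f:
            if f.read() == content:
                return False
    except (IOError, OSError):
        pass  # no previous version (or unreadable): fall through and write
    with open(path, "w") as f:
        f.write(content)
    return True
```

A build step would call something like
write_if_changed("versioninfo.xml", output) after running the
"hg parents --template ..." command shown earlier.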


IN SUMMARY:

   Would the following proposals be worth pursuing:

    - "hg summary" supporting --template, with additional keywords
       and filters that may be repository or working copy specific,
       such as "modified", "root" and _maybe_ "modifieddate" or
       something like the text associated with "commit:"

    - "hg summary" command line option to support just generating
      an exit status based on the data collected for the "commit:"
      line. Alternatively, a new subcommand (eg "hg modified")

   Thoughts most welcome! It's very much a request for comments!

Regards,

Peter
(Sydney, Australia)
