[GSoC] Adding svn:externals support to hgsubversion

Daniel Tang dytang at cs.purdue.edu
Fri Apr 3 06:07:09 CDT 2009

On Fri, Apr 3, 2009 at 02:37, Peter Arrenbrecht
<peter.arrenbrecht at gmail.com> wrote:
> On Fri, Apr 3, 2009 at 12:00 AM, Daniel Tang <dytang at cs.purdue.edu> wrote:
>> On Thu, Apr 2, 2009 at 05:15, Dirkjan Ochtman <dirkjan at ochtman.nl> wrote:
>>> 2009/4/2 Daniel Tang <dytang at cs.purdue.edu>:
>>>> This is my first time posting to the list, but people who are in IRC
>>>> may recognize me as saiyr, since I began chatting recently. I talked
>>>> briefly to Augie about this idea, but am trying to present details
>>>> now. I realize this is kind of late in coming, but hopefully I can get
>>>> some feedback before proposals are due.
>>> It would be nice if something could be done here. My worry at this
>>> time is, what is the result of such a conversion? We should have
>>> subrepos in hg 1.3, can you target those? Their design isn't finished,
>>> though, so it's kind of hard to say how you're going to work from
>>> that.
>> The first part of this project can be done without any need for
>> sub-repositories. If the targeted release date for 1.3 is July 1,
>> there *should* be ample time to implement svn externals on top of hg
>> subrepos. Also, since I hear subrepos are still in the early stages of
>> development, perhaps I could participate in design/implementation.
>> There are really three tasks in my proposal, and the first two can
>> switch places (single directory imports and hg subrepos).
>> Any other comments are greatly appreciated.
> How about making the subset of the svn repo configurable using
> something like include/exclude rules? This would allow one to extract
> relevant subsets of a big svn repo. I have had to work with some
> hugely sprawling svn repos that prohibit being cloned with
> hgsubversion today. And I can imagine there are more of these since
> with svn's approach the sprawl is not a big issue (and has its
> advantages).
> -Peter (parren)

For the sake of posterity: As it turns out, this has already been
resolved in issue 42 (can do hg svnclone --filemap). As Dirkjan
suggested, I've made my proposal more generic-sounding in order to
accommodate alternatives in case my initial proposal doesn't line up
with the timeline for Mercurial sub-repositories. A new version is
attached, and the application is up-to-date on the GSoC website.

-------------- next part --------------
GSoC Proposal: General Enhancements for hgsubversion
This proposal currently encompasses two main goals:

  1. Adding support for svncloning a single directory
  2. Adding support for svn:externals

The first goal involves removing hgsubversion's assumption that the Subversion
repository always follows the trunk/tags/branches format. The second goal
involves allowing hgsubversion to interact with svn:externals properties. These
link to other repository locations that typically do not have
trunk/tags/branches at the root; hence, the first goal is required to
accomplish the second.

Design Decisions
Single Directory Subversion Clones
The first step should not be conceptually difficult. Currently, hgsubversion
makes assumptions about the repository structure. For example, it assumes a
trunk/tags/branches format for the Subversion repository in order to generate
hg tags/branches, and it makes assumptions about how tags/branches are created
in Subversion. The ability to "import a single directory" essentially removes
the need for these assumptions, which matches how Subversion itself behaves:
it doesn't enforce the notion of tags and branches; the assumed
trunk/tags/branches structure is conceived and managed by its users. Thus,
hgsubversion would essentially import a single branch.

One problem created by this is that without the assumed structure, some
alternative must be provided for pushing hg tags/branches to Subversion (issues
38 and 15, respectively). It's not a problem that has to be explicitly taken
into consideration for this proposal, but it will eventually need to be
accounted for.

Automatic detection of whether or not to assume the structured
trunk/tags/branches format over the "unstructured" format should be relatively
easy (checking for existence of said directories). Manual override should also
be allowed for cases such as issue 32, where the branches directory is
misspelled or simply in a different location.
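
As a rough illustration of the detection idea above, here is a minimal Python
sketch. It is not hgsubversion code: the function name, the "standard" and
"single" labels, and the rule that all three directories must be present are
assumptions made for the example.

```python
# Hypothetical sketch: decide between the structured and "unstructured"
# layouts from the names found at the repository root.
STANDARD_DIRS = ("trunk", "branches", "tags")


def detect_layout(root_entries, override=None):
    """Return 'standard' or 'single' for a listing of the repository root.

    root_entries: iterable of directory names at the root.
    override: optional manual choice that bypasses detection.
    """
    if override is not None:
        if override not in ("standard", "single"):
            raise ValueError("unknown layout: %r" % (override,))
        return override
    entries = set(root_entries)
    # Assumption for this sketch: every standard directory must exist.
    if all(name in entries for name in STANDARD_DIRS):
        return "standard"
    return "single"
```

With this rule, a root containing a misspelled branches directory is detected
as "single", which is the kind of case where the manual override would be
needed.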

Subversion Externals
Subversion has support for an "externals" property, which can be used in
several ways. First of all, it can point to a location in another Subversion
repository. Typically, this is a directory, but the ability to point to a
single file was introduced in Subversion 1.6, so this may need to be taken into
account (hgsubversion only requires bindings for 1.5, so perhaps not
immediately). The target can also be locked to a specific revision, so it
doesn't update whenever the host repository updates. URLs can also be relative
or absolute, but this is not terribly relevant to implementation details.
Essentially, externals are a handy shortcut so that developers don't all have
to do the checkouts themselves. This should transfer to Mercurial as well.

One design decision here that I can't really make alone is whether or not
externals should be local to hgsubversion or global to Mercurial. I believe
implementation would be similar in both cases. If externals were to be
implemented in Mercurial, then hgsubversion externals should be an easy
extension. Here are the things that I believe would need to be done:

  1. A new file (say, .hgexternals) that would reside in the root of the
repository. The information this file would need to contain is similar to
that of Subversion externals. However, only externals that point to a full
repository would be allowed, since Mercurial doesn't support cloning part of a
repository. This file would also need to be consulted for ignores, since
externals should be excluded from "hg status", etc.
  2. Update and pull commands must be modified. Update should have options to
update external repositories that aren't at a fixed revision and check that
externals at a fixed revision are indeed at that revision. Pull should have an
option to pull for external repositories as well. I don't believe push is
typically used, since usually people who use externals don't have commit access
to those repositories, but this could easily be extended to cover push.
  3. Clone should check for an .hgexternals file and clone any external
repositories that are in the file, if it exists.
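
To make the three items above more concrete, here is a minimal Python sketch.
The file format is purely hypothetical (one "path source [revision]" entry per
line, with "#" comments), as are all of the function names; nothing here is an
existing Mercurial or hgsubversion API.

```python
from collections import namedtuple

# One entry of the hypothetical .hgexternals file.
External = namedtuple("External", "path source revision")


def parse_hgexternals(text):
    """Parse lines of the assumed form 'path source [revision]'."""
    externals = []
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) == 2:
            path, source = fields
            revision = None  # not pinned: follows the external's tip
        elif len(fields) == 3:
            path, source, revision = fields
        else:
            raise ValueError(
                "line %d: expected 'path source [revision]'" % lineno)
        externals.append(External(path, source, revision))
    return externals


def ignored_paths(externals):
    """Paths that status-like commands would need to skip."""
    return sorted(ext.path for ext in externals)


def pinned_mismatches(externals, current_revisions):
    """Paths of pinned externals whose checked-out revision differs.

    current_revisions maps an external's path to the revision it is at.
    """
    return [ext.path for ext in externals
            if ext.revision is not None
            and current_revisions.get(ext.path) != ext.revision]
```

A clone wrapper could iterate over the parsed entries and clone each source
into its path, while an update wrapper could use pinned_mismatches to report
externals that have drifted from their fixed revision.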

Subversion externals, then, can easily be implemented on top of normal
externals. With the first goal accomplished, the externals can be svncloned and
appropriate data written to .hgexternals. If externals are deemed unnecessary
for Mercurial's core, the same process can be used to implement Subversion
externals in hgsubversion (or as another extension). Mercurial provides the
ability to wrap other commands, so a wrapper would be used to implement the
options above.

Other Tasks
This proposal is incomplete in the sense that it is meant to tackle general
problems/ideas with hgsubversion. Two concrete ones are listed above. However,
since sub-repositories are planned for Mercurial 1.3, it may be difficult to
implement svn:externals on top of them while they are still in development.

Another big issue in hgsubversion at the moment is the inability to import from
Subversion starting at a specific revision. In the event that timing does not
line up for svn:externals, this would be a viable alternative task to solve.

These three tasks cover the major enhancements listed on the hgsubversion issue
tracker, as well as one that is not listed there (externals). Since this
proposal is meant to be a general one to improve hgsubversion as much as
possible, it may also include ideas that mentors or others suggest.

Because of hgsubversion's current assumptions of Subversion repository layouts,
doing the first step of this proposal is a fairly major refactoring, which
means a fair amount of time needs to be spent analyzing the current structure
of the code and finding out the best way to implement the change.

I think the implementation of externals would be fairly straightforward, since
externals are essentially shortcuts for things that could be done manually.
Since I already know a little bit about how hgsubversion is implemented,
creating wrapper commands shouldn't be too difficult. Mercurial itself would
handle versioning of the externals, as Subversion does with its properties.

As already mentioned a few times, timing will play a role in whether or not
Subversion externals can be implemented in the GSoC timeline, since
sub-repositories are still in the design/pre-design phase, and will be for at
least several more weeks.

  1. Design/implementation of single directory svnclone
  2. Testing and documentation of single directory svnclone
  3. Design/implementation of externals
  4. Testing and documentation of externals

This milestone list assumes all goes well with timing. It is possible, even
likely, that this milestone list will not actually be used; once a more
concrete list of ideas is established, a new set of milestones can be drawn up. Given
that native sub-repositories are planned for Mercurial, a fair part of the work
involved in externals is taken care of, so it is also likely that more tasks
will be included (such as cloning from a certain revision).

About Me
I'm a first-year PhD student in Computer Science at Purdue University (Indiana,
USA). I earned my BS in Computer Science at Purdue as well. I don't have much
large-scale programming experience with Python, but I know the language fairly
well and I teach it to science majors. I did work for a while on shaim
(http://shaim.net), an open source, multi-protocol IM client written in C#/WPF,
but this is my only notable open source involvement (more than a few patches).
Other notable programming experience includes an internship at Microsoft
working on the Visual C# team.

Before GSoC season came around, I was writing a Mercurial backend for
django-rcsfield (http://code.google.com/p/django-rcsfield/), which is basically
a model field for Django that uses a version control system to track its
history. I never completed it (the project is basically superseded by audit
trails), but that is when I started digging into the Mercurial source.

Since GSoC season started, I've explored various parts of Mercurial and read
some of the wiki docs to help me understand it. I also wrote a simple patch for
issue 60 (listing distinct authors in a Subversion repository) to push myself
to explore hgsubversion internals more. My goal is to tackle issue 55 to accompany
my application, which involves processing Subversion URLs that have credentials
in them, e.g., http://user:pass@host.com/svn
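
For illustration only, here is a minimal Python sketch of separating the
credentials from such a URL using the standard library. The function name and
return shape are assumptions for the example; this is not the patch for issue
55 and says nothing about how hgsubversion handles it.

```python
from urllib.parse import unquote, urlsplit, urlunsplit


def split_credentials(url):
    """Return (url_without_credentials, user, password).

    user and password are None when the URL does not carry them.
    """
    parts = urlsplit(url)
    # Everything after the last '@' in the network location is host[:port].
    host = parts.netloc.rpartition("@")[2]
    clean = urlunsplit(
        (parts.scheme, host, parts.path, parts.query, parts.fragment))
    # Credentials may be percent-encoded inside a URL, so decode them.
    user = unquote(parts.username) if parts.username else None
    password = unquote(parts.password) if parts.password else None
    return clean, user, password
```

The cleaned URL could then be handed to the Subversion bindings, with the user
name and password supplied separately.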
