JustinPeng

Email: justin dot peng dot sw at gmail dot com

I'm interested in the idea below and think it will improve Mercurial's user experience.

3.7. Interactive patch selection for commit/Mercurial Queues/record/import

Being able to select parts of the existing changes, with hunk or greater granularity, in an interactive way, can improve the use of commands and extensions that take changes, such as commit, MqExtension (MercurialQueues) and import. The RecordExtension currently allows patch hunk selection, but sometimes a finer granularity is desired, as when a set of adjacent function definitions should go in different commits. This feature could be added as an --interactive mode for many of Mercurial's core commands.

Here's my application:

Commands and extensions that take changes, such as commit, MqExtension (MercurialQueues) and import, would benefit from letting users select parts of the existing changes interactively, at several hierarchical levels of granularity. A uniform API will be introduced to cut down on redundant code.


1. What project do I want to tackle?
Interactive patch selection for commit/Mercurial Queues/record/import.

2. What design choices will have to be made?
2.1 The key to this project is to define a better granularity. In my opinion, the problem calls for a set of granularities rather than a single optimal one.
The set of granularities can be viewed as a tree whose root is 'all changes of a project', arranged in the following hierarchy:
  changes of a project
   |
   |---- changes of a folder
          |
          |---- changes of a module/file
                    |
                    |---- changes of a hunk
                             |
                             |---- changes of a line
 
As the figure shows, line changes within a hunk are the finest level. After a simple requirement survey on the mailing list, we are confident that line-level selection will be good enough for most users, and that it can handle any line-based text file, such as source code in any language, scripts or documents. In this way, adjacent elements in the same hunk, such as functions or classes, can go into different commits, selected interactively.
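
The hierarchy above could be held in a small tree structure. The following is only a minimal sketch under stated assumptions: the class name ChangeUnit, its fields and the sample paths are hypothetical and do not come from Mercurial or from any existing extension.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class ChangeUnit:
    """One node in the granularity tree (hypothetical name and fields)."""
    level: str    # 'project', 'folder', 'file', 'hunk' or 'line'
    label: str    # e.g. a path, a hunk header, or the changed line itself
    children: List["ChangeUnit"] = field(default_factory=list)

    def leaves(self) -> Iterator["ChangeUnit"]:
        """Yield the finest-grained units below this node, in order."""
        if not self.children:
            yield self
            return
        for child in self.children:
            yield from child.leaves()


def build_example() -> ChangeUnit:
    """Build a tiny tree: project -> folder -> file -> hunk -> two lines."""
    lines = [ChangeUnit("line", "+def foo():"), ChangeUnit("line", "+def bar():")]
    hunk = ChangeUnit("hunk", "@@ -10,0 +11,2 @@", lines)
    file_ = ChangeUnit("file", "src/util.py", [hunk])
    folder = ChangeUnit("folder", "src", [file_])
    return ChangeUnit("project", ".", [folder])


tree = build_example()
print([leaf.label for leaf in tree.leaves()])
```

Selecting a unit at any level then amounts to selecting all of its leaves, which is why line-level leaves are enough to express every coarser choice.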
 
2.2 At any granularity, users can do 4 things:
1.) Choose the current change unit.
2.) Skip the current change unit.
3.) Choose all change units left in the parent unit.
4.) Skip all change units left in the parent unit of the current change unit, which leads to the next parent unit.
Actions 3.) and 4.) can be extended recursively to ancestor units.
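
The four actions can be sketched as a small selection loop. This is an illustration only: the function name select_units and the answer keys 'y', 'n', 'a' and 'd' are assumptions made for the example, the recursive extension to ancestor units is left out, and 'choose all units left' is taken to include the current unit.

```python
from typing import Callable, Dict, List


def select_units(parents: Dict[str, List[str]],
                 ask: Callable[[str, str], str]) -> Dict[str, List[str]]:
    """Walk each parent unit and collect the child units the user keeps.

    ask(parent, unit) must return one of these hypothetical keys:
      'y' choose this unit        'n' skip this unit
      'a' choose all units left   'd' skip all units left in the parent
    """
    chosen: Dict[str, List[str]] = {}
    for parent, units in parents.items():
        kept: List[str] = []
        for index, unit in enumerate(units):
            answer = ask(parent, unit)
            if answer == "y":
                kept.append(unit)
            elif answer == "n":
                continue
            elif answer == "a":
                kept.extend(units[index:])  # current unit and all that follow
                break
            elif answer == "d":
                break                       # move on to the next parent unit
            else:
                raise ValueError("unknown answer: %r" % answer)
        chosen[parent] = kept
    return chosen


# Scripted answers stand in for real keyboard input.
changes = {
    "hunk 1": ["+def foo():", "+def bar():", "+def baz():"],
    "hunk 2": ["-old line", "+new line"],
}
script = iter(["y", "d", "n", "a"])
result = select_units(changes, lambda parent, unit: next(script))
print(result)
```

Passing the prompt in as a callback keeps the loop independent of the front end, so the same logic could serve a plain prompt, a curses interface or a GUI.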

2.3 The API to extract
Several experienced developers have pointed out that many commands and extensions have similar requirements for interactive change selection, such as record, crecord, tortoisehg, qrecord and some GUI tools. It would be better to support element-level (dir/patch/hunk/...) change selection for all of them through a uniform API. With such an API, the duplicated code in these commands and extensions can be greatly reduced. This is a significant piece of work.

2.4 The flow of work.
At present, 'RecordExtension[1]' by Bryan O'Sullivan provides a framework for interactive patch selection. It works in these steps:
1.) Get all changes to process.
2.) Get a filtered patch through interactive selection.
3.) Back up changed files, so we can restore them at the end.
4.) Clean the repo to the original state.
5.) Apply the chosen changes to the working directory.
6.) Do the actual commit.
7.) Finally, restore the backed-up files.
I think this workflow performs well, and I plan to adopt it in my future work to extend many of Mercurial's core commands.
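
Steps 3.) to 7.) can be sketched with plain files. This is not the RecordExtension code and it calls no Mercurial API: the function commit_selected, its parameters and the do_commit callback are hypothetical, file names are assumed to be flat (no subdirectories), and steps 1.) and 2.) are assumed to have already produced the selected contents.

```python
import os
import shutil
import tempfile


def commit_selected(workdir, selected, do_commit):
    """Commit only the selected contents, then restore the working files.

    selected maps a file name to the text that contains only the chosen
    changes. do_commit(workdir) stands in for the real commit.
    """
    backupdir = tempfile.mkdtemp(prefix="selection-backup-")
    try:
        # Step 3: back up every changed file.
        for name in selected:
            shutil.copy2(os.path.join(workdir, name),
                         os.path.join(backupdir, name))
        # Steps 4 and 5: in this simplified model, writing the selected text
        # both resets the file and applies the chosen changes.
        for name, text in selected.items():
            with open(os.path.join(workdir, name), "w") as fh:
                fh.write(text)
        # Step 6: do the actual commit.
        do_commit(workdir)
    finally:
        # Step 7: restore the backed-up files, even if the commit raised.
        for name in os.listdir(backupdir):
            shutil.copy2(os.path.join(backupdir, name),
                         os.path.join(workdir, name))
        shutil.rmtree(backupdir)


# Demo: the commit sees only the chosen line; the file is restored afterwards.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "a.txt")
with open(path, "w") as fh:
    fh.write("one\ntwo\n")

seen = []


def fake_commit(root):
    with open(os.path.join(root, "a.txt")) as fh:
        seen.append(fh.read())


commit_selected(workdir, {"a.txt": "one\n"}, fake_commit)
with open(path) as fh:
    restored = fh.read()
shutil.rmtree(workdir)
print(seen, repr(restored))
```

The restore step runs in a finally block, so it covers exceptions but not a killed process.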
 
3. What difficulties do I foresee? How do I plan to manage them?
The workflow in RecordExtension has a weakness regarding atomicity. In step 7.) of section 2.4, the backed-up files are restored from a back-up directory (.hg/back-up/ currently) inside a ‘finally’ block. However, if an accident occurs, for example the Mercurial process is killed, the files will lose their state.
There may be two ways to guarantee atomicity. The simplest is to work in memory, so that the working directory is never modified; memctx() has been suggested for this. The other way is clumsier: check whenever hg runs and, if an interrupted operation is found, restore the files. I will investigate both and choose the better one.
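
The second approach could look roughly like the sketch below. It is only an illustration with plain files: the journal file name and the functions begin and recover are hypothetical, file names are assumed to be flat, and no Mercurial API is used. The journal is written only after all backups exist, so finding it later means a complete backup is available.

```python
import os
import shutil
import tempfile

JOURNAL = "selection.journal"  # hypothetical marker file name


def begin(backupdir, workdir, names):
    """Back up the files, then write the journal as the last step."""
    for name in names:
        shutil.copy2(os.path.join(workdir, name),
                     os.path.join(backupdir, name))
    with open(os.path.join(backupdir, JOURNAL), "w") as fh:
        fh.write("\n".join(names))


def recover(backupdir, workdir):
    """Restore backed-up files if an earlier run left a journal behind."""
    journal = os.path.join(backupdir, JOURNAL)
    if not os.path.exists(journal):
        return False  # nothing was interrupted
    with open(journal) as fh:
        names = [line for line in fh.read().splitlines() if line]
    for name in names:
        shutil.copy2(os.path.join(backupdir, name),
                     os.path.join(workdir, name))
    os.remove(journal)  # removed last, so an interrupted recovery can rerun
    return True


# Demo: simulate a process that dies after modifying the working file.
workdir = tempfile.mkdtemp()
backupdir = tempfile.mkdtemp()
path = os.path.join(workdir, "a.txt")
with open(path, "w") as fh:
    fh.write("one\ntwo\n")

begin(backupdir, workdir, ["a.txt"])
with open(path, "w") as fh:
    fh.write("one\n")  # pretend the process is killed right after this

recovered = recover(backupdir, workdir)  # what the next run would do
with open(path) as fh:
    content = fh.read()
second = recover(backupdir, workdir)
shutil.rmtree(workdir)
shutil.rmtree(backupdir)
print(recovered, repr(content), second)
```

Restoring is idempotent here, which is what makes it safe to repeat the recovery if it is itself interrupted.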
 
4. What intermediate milestones can be defined?
1.) Requirement Spec
A further requirement survey is still needed. It will draw on discussions on the mailing list and IRC, focusing on a.) which commands are supported and b.) how to interact with users. It will be provided in mid-May.
2.) Test cases
They will be derived from a clear requirement spec, and will be provided in late May.
3.) Design Spec
It will contain the workflow and core data structures, and will be provided in mid-June.
4.) Final patch
It will be provided in mid-August. During development, I will keep in touch with the community and respond promptly.
 
5. Who am I? 
I’m a master's student from China majoring in Wireless Sensor Networks. I’m familiar with Java, C and Python. I like to hack in Python; however, I have not yet worked on a big Python project, so I’m eager to work on Mercurial.

5.1 My project experience:
The year before last, as an intern at a company, I developed a mobile phone map client on J2ME. It runs well on Nokia products (the E50 and 6300 were my test models). You can use it to browse a city map, search for POIs such as restaurants and get their detailed information, and search bus lines. I played the main role in this project. As part of it, I developed a logging tool for J2ME cell phones, which greatly simplified our later work. In a considerable part of my work, I tried different strategies to make the most of these connection-limited devices, such as compressing and decompressing map and POI data, and introducing a sliding window to improve read efficiency. I’m proud that it’s the fastest-browsing map client I have seen, much faster than Google Maps and Nokia Maps. From this project I learned a lot, including SVN, unit testing, agile development (especially Test-Driven Development) and some knowledge of geocoding.

5.2 What I have done on Mercurial
1.) Subscribed to the mailing lists and joined IRC to get familiar with the community. Here is some of the communication between the community and me [2].
2.) Checked out the source code, built it and did some hacking, following the instructions on the wiki.
3.) Printed and read half of the hgbook; it greatly helped me understand how Mercurial is designed and how it works.
4.) Read most of the wiki pages under ‘DeveloperInfo’.
5.) Did a requirement survey on the mailing list and received lots of good advice [3].
6.) Found a bug in date formats with '>' or '<' accompanied by space characters, and provided a patch for it. mpm replied ‘Queued, thanks’. How wonderful [4]!

6. Appendix 
For more details and my latest progress, please visit my wiki page [5].
 
[1.] http://www.selenic.com/mercurial/wiki/index.cgi/RecordExtension
[2.] http://mercurial.markmail.org/search/?q=Justin+Peng#query:Justin%20Peng+page:1+mid:uqej4mqheiyw2ycx+state:results
[3.] http://markmail.org/search/?q=%5BGSOC%5D+Interactive+patch+selection+for+commit%2FMercurial+Queues%2Frecord%2Fimport
[4.] http://markmail.org/search/?q=Re%3A+A+patch+for+a+bug+on+date+formats+with+%27%3E%27+or+%27%3C%27+accompanied+by+space+characters
[5.] http://www.selenic.com/mercurial/wiki/index.cgi/JustinPeng

New progress:

After submitting my application, I received warm encouragement from Leslie, who said 'I'd like to endorse Justin's proposal for a granularity system and extensions/core working with it. It's an important feature that I miss every other day. [1]' His words inspired me greatly!

Besides RecordExtension, I have tried 2 more similar change selection tools: CRecord and TortoiseHg, which are based on a text-mode UI and a GUI respectively and provide similar functions. However, I think a CUI-based command is necessary for better portability, i.e. we hope it works well on wcurses. In GSOC, my work will focus on:
1.  providing safer state restoration.
2.  providing a uniform API for commands or extensions that take changes. This will remove a lot of redundant code in existing extensions, such as record, crecord, TortoiseHg and maybe others.
3.  using the API to extend the set of commands that support change selection, such as the core commands import and export.

Also, Rocco suggested that it would be better to provide a last-chance edit option. I like this advice, and will adopt it in my future design.

[1] http://mercurial.markmail.org/search/?q=Summer+of+Code%3A+Justin+Peng#query:Summer%20of%20Code%3A%20Justin%20Peng+page:1+mid:edajtkeysoczvho7+state:results


CategoryHomepage CategoryGsoc

JustinPeng (last edited 2010-10-22 18:19:47 by mpm)