/!\ This page is no longer relevant but is kept for historical purposes.

#pragma section-numbers 2

Flow control

aka "how to keep mpm from going insane."

Live graph of the state of my inbox: https://docs.google.com/spreadsheets/d/1lEvdGoN-jgLMZ_1eu8ceXDzEdWGWuDTzPSDVCz4jtxs/embed/oimg?id=1lEvdGoN-jgLMZ_1eu8ceXDzEdWGWuDTzPSDVCz4jtxs&oid=432724482&zx=wd3y4sm1ozzx (data cover a couple of weeks)

List of patches in flight

What is flow control and why is it important?

From a theoretical perspective, if you have a queue where items arrive at a random rate and depart at an equal random rate, the queue will not simply stay empty. Instead, its expected length will keep growing without limit. And as the queue grows, the wait time for each item will also grow. That is to say, throughput remains the same, but latency increases. This is very undesirable for most communication systems.

In fact, the ideal state is for the queue to be empty, which minimizes latency. We can get to a state where the queue regularly becomes empty simply by throttling the arrival rate to just below the departure rate, while still preserving most of the throughput. This process of avoiding queue growth and eventual overflow is called "flow control" and is an essential component of just about all communication protocols.
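As a rough illustration of the two paragraphs above, here is a small Python simulation. It is only a sketch under stated assumptions, not anything taken from real tooling: time is divided into discrete steps, at most one item arrives and at most one departs per step, and the function name `simulate_queue` and the probabilities 0.50 and 0.45 are made up for the example.

```python
import random


def simulate_queue(arrival_prob, departure_prob, steps, seed=0):
    """Simulate a single discrete-time queue (a toy model, see assumptions above).

    Each step, one item arrives with probability arrival_prob, and then,
    if the queue is non-empty, one item departs with probability
    departure_prob. Returns (mean queue length, items served per step).
    """
    rng = random.Random(seed)
    length = 0
    total_length = 0
    served = 0
    for _ in range(steps):
        if rng.random() < arrival_prob:
            length += 1
        if length > 0 and rng.random() < departure_prob:
            length -= 1
            served += 1
        total_length += length
    return total_length / steps, served / steps


if __name__ == "__main__":
    steps = 200_000
    # Arrival rate equal to departure rate: no steady state.
    balanced = simulate_queue(0.50, 0.50, steps)
    # Arrival rate throttled to just below the departure rate.
    throttled = simulate_queue(0.45, 0.50, steps)
    print("balanced:  mean length %.1f, throughput %.3f" % balanced)
    print("throttled: mean length %.1f, throughput %.3f" % throttled)
```

With these assumed rates, the throttled queue should stay short on average while still delivering roughly 90% of the service capacity (about 0.45 of a possible 0.50 items per step). The balanced queue's mean length, by contrast, tends to keep growing the longer the simulation runs.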

The situation becomes more complicated when the queue is receiving events from different sources. Now the issue of 'fairness' comes into play: ideally a source that is using less than its fair share of throughput and is limiting its flow to keep overall latency low will not be penalized when another source uses more than its share.

Also note that it is now not sufficient for each source to simply wait for acknowledgment of its previous item to guarantee that the queue doesn't grow, as there may be a sudden rush of items or other sources may be misbehaving. Senders have to examine or infer the global state.

Now consider a less theoretical case: patches in my inbox. This is like the above discussion, but with the important difference that the throughput actually decreases as the queue length increases! So not only does latency suffer, throughput does as well. As my throughput is already a bottleneck for the Mercurial development process, avoiding queue growth is critical.

When I've got only one patch in my inbox, it's really easy for me to know that it's not obsoleted by a newer patch and that I'm current on any related discussion. But as my inbox grows, the time it takes me to figure out what's going on for any given patch increases greatly. This gets especially challenging when I've got many patches with similar subjects from the same people: it becomes a maze of twisty little passages, all slightly different.

Empirically, my efficiency for managing patches (and my patience for it) drops dramatically once I get past a screenful of patches in my mail reader. By the time it grows to 200 or so patches and their replies, I'm often not even aware of the subjects of many messages, so messages can sit unread for weeks as I try to keep up with new arrivals. This is bad for me, you, and everyone else.

How we used to do flow control

Once upon a time, we did flow control in a "lossy" UDP style: all our reviewers read whatever patches they had time and interest for and deleted the rest. If your patch didn't get responded to, you were encouraged to resend it after a while until someone paid attention. Believe it or not, this didn't work so well and a lot of contributions fell through the cracks.

But for the last couple years, I've been making an explicit effort to hold on to every patch until someone has responded to it. This means the old retry-until-it-sticks approach is discouraged and senders need to use a slightly more advanced approach to flow control. This new, more "lossless" approach has greatly improved reliability, but it also means that there's a nasty queueing problem and it lives in my inbox.

As I end up doing the first review on something like 75-90% of patches myself, it's become something like 75-90% of my day job. So I've been making some rules and tools to try to keep the process running smoothly.

How flow control should work on your side

So you'd like to see patches smoothly flowing with maximum throughput and minimum latency? Here's how you can help. I publish a bunch of information about the state of my inbox that you can consult, including the live graph at the top of the page, and the list of inflight patches.

  1. before sending, check the graph or inflight
  2. if you already have a bunch of patches in-flight, go back to 1
  3. pick a set of closely-related patches to send, preferably 5 or fewer
  4. if there's a large backlog, consider sending even fewer
  5. if the backlog is over 100, consider sending just one or even zero
  6. repeat
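
The steps above can be sketched as a small decision function. This is an illustration only: the function name `patches_to_send` and its parameters are hypothetical, and while the limits of 5 patches per series and a backlog of 100 come from the list, the thresholds for "a bunch of patches in-flight" (`max_own_in_flight=5`) and "a large backlog" (`large_backlog=50`) are assumptions chosen for the example.

```python
def patches_to_send(backlog, own_in_flight, series_length,
                    max_series=5, large_backlog=50, huge_backlog=100,
                    max_own_in_flight=5):
    """Suggest how many patches of one closely-related series to send now.

    backlog        -- total patches currently in the reviewer's queue
    own_in_flight  -- how many of those are yours
    series_length  -- size of the series you would like to send

    large_backlog and max_own_in_flight are assumed values, not rules
    stated in the text.
    """
    if own_in_flight >= max_own_in_flight:
        return 0  # step 2: wait for your earlier patches to drain
    count = min(series_length, max_series)  # step 3: 5 or fewer
    if backlog > huge_backlog:
        return min(count, 1)  # step 5: just one (or decide to send zero)
    if backlog > large_backlog:
        return min(count, max(1, max_series // 2))  # step 4: send fewer
    return count


if __name__ == "__main__":
    # Hypothetical readings taken from the graph or the inflight list.
    print(patches_to_send(backlog=10, own_in_flight=0, series_length=8))
    print(patches_to_send(backlog=150, own_in_flight=0, series_length=8))
```

The point of the sketch is that the decision depends on the global backlog as well as on your own in-flight count, which is why step 1 is to check the graph or inflight before every send.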

Having multiple different series in flight exacerbates the "are these patches related?" problem, so you should try to have only one topic of patches in flight at a time.

/!\ Also, try to avoid being part of the "thundering herd" that arrives just before the code freeze starts or just after it ends.

How flow control works on my side

There are three basic methods on the receiving side to throttle senders:

They're all sub-optimal and it's much better if I don't have to throttle at all. If I have to, I prefer to use the first method, but will use the other two if forced. Normally, I try to do roughly this:

  1. take the patches marked for stable first (highest priority)
  2. take easy patches next (shrinks inbox faster)
  3. take hard patches next
  4. look at the throughput-abusing patches last

This slows down the people who are sending too many patches and keeps the people who are respecting the rules from getting unfairly attention-starved.
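
The rough ordering above can be sketched as a sort key. This is a toy model, not a real tool: the `Patch` fields, the `review_order` function, and the `fair_share` cutoff of 5 patches per sender are all assumptions made for illustration, as is the choice to let a patch marked for stable jump the queue even when its sender is over their share.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Patch:
    sender: str
    subject: str
    for_stable: bool = False  # marked for the stable branch
    hard: bool = False        # needs dedicated review time


def review_order(inbox, fair_share=5):
    """Return the inbox sorted into the rough handling order described above.

    A sender with more than fair_share patches in the inbox is treated as
    over their share (an assumed definition, not one from the text).
    """
    per_sender = Counter(p.sender for p in inbox)

    def rank(patch):
        if patch.for_stable:
            return 0  # stable patches first
        if per_sender[patch.sender] > fair_share:
            return 3  # throughput-abusing patches last
        return 2 if patch.hard else 1  # easy before hard

    # sorted() is stable, so arrival order is kept within each rank.
    return sorted(inbox, key=rank)


if __name__ == "__main__":
    inbox = [Patch("eve", "refactor %d" % i) for i in range(6)]
    inbox.append(Patch("bob", "hard thing", hard=True))
    inbox.append(Patch("amy", "typo fix"))
    inbox.append(Patch("cat", "crash fix", for_stable=True))
    for patch in review_order(inbox):
        print(patch.sender, patch.subject)
```

Note that this is deliberately not first-in first-out: the sender with six patches in the queue is handled after everyone else, regardless of when those patches arrived.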

But even this is hard to do when I have a ton of patches to sort through. If you've only got one patch in flight and it's not getting responded to, I'm sorry. I recommend looking at who's filling up inflight and throwing bricks at them.

Frequently asked questions

Why don't you use a web-based patch management system?

For starters, it doesn't fix any of the problems addressed here. The rest of the reasons belong on another page.

Why don't you do pull requests?

I do, but only for contributors who've consistently demonstrated that they pass review on the first go. Very few contributors actually choose to go this way. And see above.

You've reviewed my latest patches but not my older ones, what gives?

I don't handle patches in received order, see above. If you want me to ever get to your old patches, you might want to stop sending new ones.

Why don't you do first-in first-out handling?

Some patches are intrinsically harder to review and thus need time set aside to review them, so FIFO order would block anything from happening until that time was available. FIFO also rewards the people who abuse my backlog by removing my primary means of throttling them.

What's the state of my patch? When can I expect a reply to my patch?

Please consult inflight and don't ask me. If your patch is in there, I'll get to it eventually. Asking me may make it happen later.

Can you handle my patch right now?

Please don't ask me to prioritize patches. Everyone's patches are important and distracting me is not productive.

Can you push/pull/sync with some repo right now?

I occasionally forget to push or pull, but more often than not, synchronization is blocked on work in progress (rebasing, fixing test failures, sysadmin work, actually writing code) or other real-life distractions.

What if I review patches when you're behind?

That's great! Just don't offset it by sending more patches. I still need to empty my queue.

What about when you're on vacation/traveling?

You should probably slow down, even if your patches are being accepted by other reviewers. I'll still need to catch up when I return, and if I come back to 500 messages in my inbox, catching up may take weeks or months. That's potentially worse for overall throughput than if patches stopped entirely while I was gone.

What if other people had direct commit access?

Putting aside whether there's anyone right for the job who's got enough free time to actually make a difference, there's still a queue management problem. To have a lossless queue, there must be one canonical queue, and senders must implement flow control.

mpm/flow (last edited 2017-12-27 05:39:20 by AugieFackler)