
Patch review and message brokers

One thing I’ve wanted to do for a long time is get better at patch review. It’s pretty important for successful open source projects to provide feedback to developers quickly, and as anyone who follows me on the libguestfs mailing list will know, I’m terrible at it.

One thing I could do to make it a bit better is to automate the boring bits: Does the patch series apply? Does it compile? Does it pass the test suite? If one of those things isn’t true then we tell the submitter to fix it.

Some projects — the Linux Kernel Mailing List (LKML) for instance — provide basic feedback automatically. For LKML this is provided by Intel’s 0-day test service. If you post a patch on LKML then sooner or later you’ll receive an automated reply like this one.

Today I thought I’d write something like this, partly to reinvent the wheel, but mostly to learn more about the RabbitMQ message broker.

You see, if you have to receive emails, run large tests, and send more emails, then at least two and possibly more machines are going to be involved. As soon as you are using two or more machines, you are writing a distributed system and you need to use the right tools. Message brokers, and RabbitMQ in particular, make writing distributed systems easy. Trust me, I’ll show you how!

Receiving emails

Our first task is going to be how to get the emails into the system. We can use a procmail rule to copy emails to a command of our choice, but the command only sees one email at a time. Patch series are spread over many individual emails, you don’t always get them at once, and you certainly aren’t guaranteed to get them in order.

So first of all I set up a RabbitMQ queue which just takes in emails in any order and queues them.

The input to this queue is a simple script which can inject single emails or (for testing) mboxes.
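A minimal sketch of what such an injector could look like, assuming Python and a channel object with the interface of the pika client library (`queue_declare`, `basic_publish`). The queue name `patchq_input` and both function names are hypothetical; the post does not give them.

```python
import mailbox

INPUT_QUEUE = "patchq_input"  # hypothetical name, not taken from the post


def inject_email(channel, raw_email, queue=INPUT_QUEUE):
    """Publish one raw email (bytes) onto the input queue."""
    # Declaring a queue that already exists with the same settings is a no-op.
    channel.queue_declare(queue=queue, durable=True)
    # The default exchange ("") routes a message to the queue named by routing_key.
    channel.basic_publish(exchange="", routing_key=queue, body=raw_email)


def inject_mbox(channel, path, queue=INPUT_QUEUE):
    """Publish every message in an mbox file; returns how many were sent."""
    box = mailbox.mbox(path, create=False)
    try:
        count = 0
        for message in box:
            inject_email(channel, message.as_bytes(), queue)
            count += 1
        return count
    finally:
        box.close()
```

With pika installed, a real channel would come from `pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()`, and the procmail rule would pipe each email into something that calls `inject_email` with the bytes read from standard input.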

Threading emails

Reconstructing the email thread, filtering out non-patch emails, and putting the patches into the right order, is done by a second task which runs periodically from a cron job.
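The reconstruction step might be sketched as follows. This is an illustration under simplifying assumptions, not the actual threader: it assumes patches carry a `[PATCH n/m]` subject tag, that the first `References` entry (or `In-Reply-To`) identifies the series, and it ignores single unnumbered patches. The function names are hypothetical.

```python
import email
import re
from collections import defaultdict

# Matches subjects such as "[PATCH 2/5] ..." or "[PATCH v3 0/5] ...".
PATCH_RE = re.compile(r"\[PATCH[^\]]*?\b(\d+)/(\d+)\]")


def thread_root(msg):
    """Guess the Message-ID that identifies the series a message belongs to."""
    refs = (msg.get("References") or "").split()
    if refs:
        return refs[0]
    return (msg.get("In-Reply-To") or msg.get("Message-ID") or "").strip()


def assemble_series(raw_emails):
    """Return complete patch series, each as a list of raw emails in order."""
    threads = defaultdict(dict)
    for raw in raw_emails:
        msg = email.message_from_bytes(raw)
        subject = str(msg.get("Subject", "")).strip()
        if subject.lower().startswith("re:"):
            continue  # a reply to a patch, not a patch
        match = PATCH_RE.search(subject)
        if not match:
            continue  # not a numbered patch email
        index, total = int(match.group(1)), int(match.group(2))
        if not 1 <= index <= total:
            continue  # cover letter (0/m) or a malformed tag
        threads[(thread_root(msg), total)][index] = raw
    complete = []
    for (_root, total), patches in threads.items():
        if len(patches) == total:  # every patch 1..total has arrived
            complete.append([patches[i] for i in range(1, total + 1)])
    return complete
```

Because the function only looks at headers, it does not care what order the emails arrived in; an incomplete series is simply left out of the result.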

The threading task examines the input queue and tries to reconstruct whole patch series from it. If it gets a patch series, it removes those messages from the input queue and places the whole patch series as a single message on a second queue.

What makes this possible is that RabbitMQ allows you to get messages from a queue, and then acknowledge (or not acknowledge) them later. So the threader gets all the available messages and tries to assemble them into threads. If it finds a complete patch series, it acknowledges all of those emails, which deletes them from the input queue. For incomplete patch series, it doesn’t bother to acknowledge them, so they stay on the queue for next time.
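The get-then-acknowledge pattern could look roughly like this, assuming a pika-style channel (`basic_get`, `basic_ack`, `basic_publish`). The queue names, the JSON/base64 encoding of a series, and the `assemble` callback (which returns each complete series as a list of indexes into the fetched messages) are all assumptions made for the sketch.

```python
import base64
import json

INPUT_QUEUE = "patchq_input"    # hypothetical name
SERIES_QUEUE = "patchq_series"  # hypothetical name


def drain(channel, queue):
    """Fetch every waiting message without acknowledging any of them."""
    fetched = []
    while True:
        method, _properties, body = channel.basic_get(queue=queue, auto_ack=False)
        if method is None:  # the queue is empty
            return fetched
        fetched.append((method.delivery_tag, body))


def run_threader(channel, assemble, in_queue=INPUT_QUEUE, out_queue=SERIES_QUEUE):
    """Move complete series to out_queue; returns the number of series moved."""
    fetched = drain(channel, in_queue)
    bodies = [body for _tag, body in fetched]
    series_list = assemble(bodies)  # each series: indexes into bodies, in order
    for indexes in series_list:
        # Pack the whole series into a single message for the second queue.
        payload = json.dumps(
            [base64.b64encode(bodies[i]).decode("ascii") for i in indexes]
        ).encode("ascii")
        channel.basic_publish(exchange="", routing_key=out_queue, body=payload)
        # Only emails that made it into a complete series are acknowledged.
        for i in indexes:
            channel.basic_ack(delivery_tag=fetched[i][0])
    return len(series_list)
```

Messages that were fetched but never acknowledged are returned to the queue by RabbitMQ when the channel or connection closes, which is what happens when the cron job exits.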

By the magic of message brokers, the threader doesn’t even need to run on the same machine. Conceivably you could even run it on multiple machines if there was a very high load situation, and it would still work reliably.

Performing the tests

Once we have our fully assembled patch series threads, these are distributed to multiple queues by a RabbitMQ fanout exchange. There is one queue for each type of test we need to run.
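As a sketch of the fanout setup, again assuming a pika-style channel: the exchange name and the three queue names below are hypothetical, chosen to mirror the apply/compile/test-suite checks mentioned earlier.

```python
EXCHANGE = "patchq_tests"  # hypothetical name
TEST_QUEUES = (            # hypothetical names, one queue per kind of test
    "patchq_test_apply",
    "patchq_test_compile",
    "patchq_test_check",
)


def setup_fanout(channel, exchange=EXCHANGE, queues=TEST_QUEUES):
    """Declare the fanout exchange and bind one queue per test to it."""
    channel.exchange_declare(exchange=exchange, exchange_type="fanout")
    for queue in queues:
        channel.queue_declare(queue=queue, durable=True)
        channel.queue_bind(exchange=exchange, queue=queue)


def publish_series(channel, series_body, exchange=EXCHANGE):
    """Send one assembled series to every bound test queue."""
    # A fanout exchange ignores the routing key and copies the message
    # to every queue bound to it.
    channel.basic_publish(exchange=exchange, routing_key="", body=series_body)
```

Adding a new kind of test then only means binding another queue to the exchange; the publisher does not change.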

An instance of a third task called perform-tests.py picks up the patches and tests them using custom scripts that you have to write.

The tests usually run inside a virtual machine to keep them out of harm’s way. Again the use of a message broker makes this trivial, and you can even distribute the tests over many machines if you want with no extra programming.
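A rough sketch of what one test worker iteration could look like. The interface to the custom scripts is an assumption here (the series is fed on standard input and the exit status signals pass or fail), as are the function names and the report format; only the `patchq_reports` queue name comes from the post.

```python
import subprocess

REPORTS_QUEUE = "patchq_reports"


def run_test(command, series):
    """Run one test command, feeding it the patch series on stdin."""
    result = subprocess.run(
        command,
        input=series,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep errors interleaved with the log
    )
    status = "PASS" if result.returncode == 0 else "FAIL"
    return status, result.stdout


def process_one(channel, queue, command):
    """Test the next series on queue; returns False if the queue was empty."""
    method, _properties, body = channel.basic_get(queue=queue, auto_ack=False)
    if method is None:
        return False
    status, log = run_test(command, body)
    report = status.encode("ascii") + b"\n" + log
    channel.basic_publish(exchange="", routing_key=REPORTS_QUEUE, body=report)
    # Acknowledge only after the report is queued, so a worker that dies
    # mid-test leaves the series available for another worker.
    channel.basic_ack(delivery_tag=method.delivery_tag)
    return True
```

Because each worker takes one message at a time and acknowledges it when done, starting the same script on more machines spreads the load without any coordination code.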

Reporting

There is a final queue: When tests finish, we don’t necessarily want to email out the report from the test machine. There would be several problems with that: it would reveal details of your testing infrastructure in email headers; SMTP servers aren’t necessarily available all the time; and you don’t always want your test machines to have access to the public internet.

Instead the result is placed on the patchq_reports queue, and a final task called send-reports.py picks these reports up and sends them out periodically. The report emails have the proper headers so they are threaded into the original mailing list postings.
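Threading a report into the original posting comes down to setting `In-Reply-To` and `References` from the email being answered. A minimal sketch using Python's standard `email` package follows; the function name and its parameters are hypothetical, and the real send-reports.py may build its messages differently.

```python
import email
from email.message import EmailMessage


def build_report(original_raw, report_text, sender, recipient):
    """Build a reply that mail clients will thread under the original email."""
    original = email.message_from_bytes(original_raw)
    msgid = (original.get("Message-ID") or "").strip()
    # Collapse any folded header lines into a single-line subject.
    subject = " ".join(str(original.get("Subject", "")).split())
    if not subject.lower().startswith("re:"):
        subject = "Re: " + subject

    report = EmailMessage()
    report["From"] = sender
    report["To"] = recipient
    report["Subject"] = subject
    if msgid:
        # In-Reply-To names the parent; References is the parent's own
        # References list followed by the parent's Message-ID.
        earlier = (original.get("References") or "").split()
        report["In-Reply-To"] = msgid
        report["References"] = " ".join(earlier + [msgid])
    report.set_content(report_text)
    return report
```

The resulting message could then be handed to `smtplib.SMTP(...).send_message(report)` on whichever machine is allowed to talk to the mail server.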

Conclusion

It’s a simple but powerful multi-machine test framework, all in under 600 lines of code.

A timely reminder: Don’t email questions directly

Want help? Don’t email me directly

Occasionally people email me directly about some software problem. Whether by coincidence or not, this happened quite a few times last week.

If you want help, email the public mailing lists. I won’t answer you if you email me directly.

Why is this? It’s not because I don’t want to help you. If I can, I will answer your question on the public list. It’s not a plot to get you to buy Red Hat support either, although if you do pay for support then you will get my individual attention.

There are two reasons:

  1. On the mailing list, your question and the answer are on the public record. Others looking to solve the same problem can search and find the answer. If it’s not in the public record like that, then I’m going to have to answer everyone who emails me individually, and how is that going to scale?
  2. By posting on the mailing list, someone else may be able to answer you, so it’s better for you too since there are more people who can answer your question.
