Tag Archives: ideas

Half-baked ideas: C strings with implicit length field

For more half-baked ideas, see the ideas tag.

If you prefer just to see the code, then it’s here.

Chris Siebenmann wrote a couple of interesting articles about C’s null terminated strings and how they pre-date C.

Chris notes an alternative is a length + string representation, as used in Pascal. Although there are libraries for this in C, there are several drawbacks and approximately no one uses them.

However it’s possible to have the best of both worlds: Strings using an implicit length field that takes up no extra storage. These strings are backwards compatible with ordinary C strings — you can literally pass them to legacy functions or cast them to char * — yet the equivalent of a strlen operation is O(1).

There are two ideas here: Firstly, when you use the C malloc function, malloc stashes some extra metadata about your allocation, and with most malloc implementations there is a function to obtain the size of the allocation from a pointer. In glibc, the function is called malloc_usable_size. Note that because of alignment concerns, the amount allocated is usually larger than the amount you originally requested.

The second idea comes from OCaml. OCaml stores strings in a clever internal representation which is both backwards compatible with C (a fancy way to say they are null terminated), and it allows you to get the real length of the string even though OCaml — like C — allocates more than requested for alignment reasons.

So here’s how we do it: When allocating an “implicit length string” (ilenstr) we store extra data in the final byte of the “full” malloced space, in the byte marked B in the diagram below:

+-------------------------+----+------------+----+
| the string              | \0 |   ....     | B  |
+-------------------------+----+------------+----+
<----- malloc we requested ---->
<----------- malloc actually allocated ---------->

If malloc allocated exactly the same amount of space as is used by our string + terminating null, then B is simply the terminating \0:

+-------------------------+----+
| the string              | \0 |
+-------------------------+----+

If malloc allocated 1 spare byte, we store B = 1:

+-------------------------+----+----+
| the string              | \0 | 1  |
+-------------------------+----+----+

If malloc allocated 4 spare bytes, we store B = 4:

+-------------------------+----+----+----+----+----+
| the string              | \0 |   ....       | 4  |
+-------------------------+----+----+----+----+----+

Getting the true length of the string is simply a matter of asking malloc for the allocated length (ie. calling malloc_usable_size), finding the last byte (B) and subtracting it. So we can get the true string length in an O(1) operation (usually, although this may depend on your malloc implementation).

ilenstr strings can contain \0 characters within the string.

ilenstr strings are also backwards compatible, in that we can pass one to any “legacy” C function, and assuming the string itself doesn’t contain any \0 inside it, everything just works.

Alright. This is terrible. DO NOT USE IT IN PRODUCTION CODE! It breaks all kinds of standards, is unportable etc. There are security issues with allowing \0-containing strings to be passed to legacy functions. Still, it’s a nice idea. With proper cooperation from libc, standards authorities and so on, it could be made to work.

Here is my git repo:

http://git.annexia.org/?p=ilenstr.git;a=summary

7 Comments

Filed under Uncategorized

Half-baked ideas: qemu -M container

For more half-baked ideas, see the ideas tag.

Containers offer a way to do limited virtualization with fewer resources. But a lot of people have belatedly realized that containers aren’t secure, and so there’s a trend for putting containers into real virtual machines.

Unfortunately qemu is not very well suited to just running a single instance of the Linux kernel, as we in the libguestfs community have long known. There are at least a couple of problems:

  1. You have to allocate a fixed amount of RAM to the VM. This is basically a guess. Do you guess too large and have memory wasted in guest kernel structures, or do you guess too small and have the VM fail at random?
  2. There’s a large amount of overhead — firmware, legacy device emulation and other nonsense — which is essentially irrelevant to the special case of running a Linux appliance in a VM.

Here’s the half-baked idea: Let’s make a qemu “container mode/machine” which is better for this one task.

Unlike other proposals in this area, I’m not suggesting that we throw away or rewrite qemu. That’s stupid, as qemu gives us lots of useful abilities.

Instead the right way to do this is to implement a special virtio-ram device where the guest kernel can start off with a very tiny amount of RAM and request more memory on demand. And an empty machine type which is just for running appliances (qemu on ARM already has this: mach-virt).

Libguestfs people and container people, all happy. What’s not to like?

1 Comment

Filed under Uncategorized

Half-baked ideas: Demand-revealing referenda applied to Fedora features

For more half-baked ideas, see the ideas tag

Recently Lennart proposed dropping — and then unilaterally dropped a few days later — support for TCP wrappers in systemd. I haven’t used TCP wrappers for a long time, but there are some who do, and for those people dropping features like this provokes strong feelings. How should we conduct a survey or vote to decide what features to add or drop in software projects?

One way would be for all users [however that is defined] to have a vote. The problem with that is that a feature which few people use, but which really matters for those that use it would probably get dropped by a simple majority vote.

A better idea would be to use an economic system called a revealed preference. The idea is by asking people to risk their own money on the outcome of a vote, you hope to get a truer picture of their feelings. This technique also excludes moaners with lots of time on their hands to argue on mailing lists.

Chris Dillow (who incidentally writes a really great blog) has a worked example of a demand-revealing referendum which you should go and read.

Let’s try this with the systemd / TCP wrappers example. I’m going to have six voters. Four are mostly apathetic about the feature. But two of them use it, and one of those is going to have to change his whole infrastructure around if TCP wrappers goes away.

But first I have to assign a cost to this feature1. Unlike Chris’s Trident example, removing TCP wrappers from systemd is cheap. But it’s not completely free, assuming that Lennart is going to have to write some code, communicate the change, update documentation and so on. I’ll say it costs £12, which is £2/voter.

Let’s see how our six people might vote:

        Cost  Benefit
Alice   £ 2   £    5
Bob     £ 2   £   10
Charlie £ 2   £    2
Diane   £ 2   £    2
Eleanor £ 2   £  -50
Fred    £ 2   £-1000
TOTAL   £12   £-1031

Alice and Bob perceive some small benefit to the change because they think it’ll make systemd cleaner. Fred is the one who is going to have to make significant changes to his company network, and he’s not happy. Charlie and Diane are completely neutral.

The net benefits are calculated by subtracting the benefit from the cost:

        Cost  Benefit  Net benefit
                     (Benefit - Cost)
Alice   £ 2   £    5   £    3
Bob     £ 2   £   10   £    8
Charlie £ 2   £    2   £    0
Diane   £ 2   £    2   £    0
Eleanor £ 2   £  -50   £  -52
Fred    £ 2   £-1000   £-1002
TOTAL   £12   £-1031   £-1043

One thing you should notice from the “TOTAL” row is that there is no (expressed) net benefit to the change. Fred’s large negative vote has soured the whole thing. It sounds unfair that Fred is able to block this, but read on …

All we’ve done so far is asked people to guess numbers. To make it a revealed preference, we have to get people to pay real money. In this case, we’re going to ask some people to pay what is called a Clarke tax.

The tax is paid only by those who “win” (or get their way). Eleanor and Fred in this example get their way and we keep TCP wrappers in systemd. They pay the social cost of their winning that is incurred by the rest of the voters. To calculate the tax you have to remove Eleanor and Fred from the table to find the net benefit without them:

        Net benefit
Alice   £    3
Bob     £    8
Charlie £    0
Diane   £    0
TAX     £   11

Eleanor and Fred have to pay £11 in tax. (I’m unclear if this is split equally or pro-rata according to their vote). They pay this real money to Alice and Bob. Even after paying the tax, Eleanor and Fred and still better off (according to their claim). Alice and Bob have been compensated for their lost benefit.

The bids in the auction are sealed — ie. people shouldn’t be able to collude. Let’s imagine however that Alice estimated Fred’s £1000 cost and tried to neutralize it by claiming a £2000 benefit to the change. Alice would win (total net benefit becomes positive £952), but she and the other winners would have to pay a tax of £1046. This is a costly victory, but the money goes some way to compensating Fred for the changes he has to make to his company network.

1One issue with this is the estimate of the cost of the feature. I’m sure systemd developers will claim that although dropping TCP wrappers costs a bit of money in the short term, it pays dividends in the long term because of reduced code maintenance and bug reports. In other words that the cost is negative. You have to be able to provide credible costs for this to work, but I think you can express that by having the feature developers join in the voting process, in other words, revealing their true preferences as well.

Further reading

Leave a comment

Filed under Uncategorized

Half-baked idea: Kill SIGOK

For more half-baked ideas, see the ideas tag

I’m sure you’ve sat there waiting for some long compile to finish. It’s just started the tests, but you don’t want it to do the tests, just finish compiling dammit!

My half-baked idea is this: sending a special signal to a process (SIGOK) should kill the process, but the process should return 0 (ie. non-error exit).

3 Comments

Filed under Uncategorized

Half-baked ideas: Kernel modules-in-a-file

For more half-baked ideas, see the ideas tag.

The Linux kernel:

/boot/vmlinuz-3.11.9-200.fc19.x86_64

5 MB of highly optimized code that runs the world’s computers. You can copy it around and boot it on different hardware.

No wait, that’s all wrong. I’m forgetting about the modules. There are another 2832 files in /lib/modules/3.11.9-200.fc19.x86_64 and without them this kernel will hardly function.

It’s nice to be able to copy the kernel around — to compile it on fast x86-64 and run it on slow ARM, or to run the host kernel in a VM, but all those module files make everyone’s lives a lot more complex.

So my half-baked idea is this:

Combine all the modules into a single file, let’s call it /boot/modules-3.11.9-200.fc19.x86_64. This file might start off as an ext4 filesystem which can be loopback-mounted on /lib/modules. Eventually I’d hope that the module tools could load modules from the file directly.

Now you can happily cross-compile and copy just two kernel files from x86 to ARM or into your VMs, and that’s a lot less hassle.

10 Comments

Filed under Uncategorized

Half-baked ideas: Log level viewer

For more half-baked ideas, see the ideas tag

Back when I used to work on this sort of thing we used to do a lot of testing involving loops. For 1000 passes, for 1000 voltage settings, for 1000 pulse widths, do the test. That sort of thing. Because logging all 1000x1000x1000 tests would produce far too much output, I wrote a little hierarchical logging library that saved the logs at each level:

for (pass = 0; pass < 1000; pass++) {
  log (1, "pass %d", pass);
  for (voltage = 0; voltage < 1000; voltage++) {
    log (2, "voltage %d", voltage);
    for (pulse = 0; pulse < 1000; pulse++) {
      log (3, "pulse %d", pulse);
      log (3, "performing test");
      log (3, "setting pulse");
      if (set_pulse () == -1)
         error ("setting pulse failed");
      /* etc */
    }
  }
}

The log function didn’t immediately print the message. Instead it saved it in a buffer. Only when a test failed would the log buffer be printed so you’d see something like:

pass 997
voltage 123
pulse 0
performing test
setting pulse
error: setting pulse failed

Well I don’t recall now the details, but it strikes me there are many situations where logging is hierarchical, and I’m not aware of any logging libraries that express this easily.

One example would be make. You have top level rules like all and check. At a high level, a user might only be interested in whether make all check is currently running the all rule or the check rule. At the next level down, there is each directory recursed into. Below that, individual Makefile rules. Below that, individual commands.

How nice would it be to have a log viewing program that could display this? On screen you’d see several subwindows, each individually scrolling:

▃▃ top level ▃▃▃▃▃▃▃▃▃▃▃▃▃▃
all: OK
check: running ...

▃▃ level 1 ▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃
make all in src: OK
make all in tools: OK
make check in src: running ...

▃▃ level 2 ▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃
  CC       config.o
  CC       domain.o
  CC       options.o
  TEST     API: running ...

That seems to me to be a much clearer way of presenting things than the usual make “splurge”.

2 Comments

Filed under Uncategorized

Half-baked ideas: wikipedia for facts

Want more half-baked ideas? See my ideas tag

Would you like to find out about Boston USA? There’s Wikipedia for that. How about travelling there? Wikivoyage Boston.

How about the population of Boston in the years 1625-2013? Or the wages of bartenders in that fine city over the years? Or the peak summer temperature each year? Not so good.

My half-baked idea is a “wikipedia for curated facts”. These can be derived from many sources, but are presented in a uniform way (by place, time, variable, etc), with references to back them up.

This would be a great way to inject factual content into the air-headed anecdote-based nonsense that passes for opinion on the internet.

13 Comments

Filed under Uncategorized