Some days I just spend my time going round in circles, and this was one of those days. Not really for lack of work, but for lack of knowing the Right Thing to do. Below I’ll describe the problem I had today.

If you look at a virtual machine from the point of view of libvirt, it looks like CPU resources, network interfaces, and one or two massive, opaque blobs: the block devices (virtual hard drives). From the point of view of libguestfs, we can squint a bit harder and resolve those opaque block devices into partitions, filesystems and files.

But that’s not the whole story either, because we can also ask: what is the virtual machine? People typically describe a virtual machine by what it is and what it can do:

“It’s our mail server, running on RHEL 5.2”

“That’s my Windows XP VM that I use to run Office”

libvirt lists the VMs but doesn’t see anything beyond their virtual hardware. libguestfs can analyze the virtual hard drives, looking for filesystems. Using both libvirt and libguestfs, I have written what can only be described as a very long, very hairy Perl script that tries to answer the meta-question of what the virtual machine is.

Specifically, it tries to work out:

- Operating system(s): Linux, Windows, …, and the distribution and version of each.
- Filesystems: a mapping of mount-point => device, and a great deal of information about each filesystem.
- Applications installed: potentially many of them.
- Kernel and device drivers: the type of ethernet card, virtio drivers, Xen PV drivers, and so on.

The way the Perl script works is best described as horrific. It probes each filesystem it finds: it tries to mount it, looks for “characteristic” files (like /etc/redhat-release, /grub/grub.conf and Program Files), and parses /etc/fstab to try to work out the relationship between devices, labels, UUIDs and actual mount-points. Does the guest have a root filesystem? Does it have multiple root filesystems? Perhaps that means it’s multi-boot? It’s an inelegant, special-cased nightmare, and so far I’m only testing it on 14 sample guests. When it hits real-world usage, it will undoubtedly grow whiskers and legs.
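
The probing described above can be sketched, very roughly, in Python. This is not the author’s Perl script. It assumes each candidate filesystem has already been mounted (for example with libguestfs) and its interesting paths collected into a set; the marker list and the names classify_filesystem, find_roots and parse_fstab are hypothetical, and real guests need far more special cases.

```python
# Hypothetical marker files and the role each one suggests. The real
# script checks many more; this list is only illustrative.
MARKERS = [
    ("/etc/redhat-release", "linux-root"),
    ("/grub/grub.conf", "linux-boot"),
    ("/Program Files", "windows-root"),
]
ROOT_ROLES = {"linux-root", "windows-root"}


def classify_filesystem(paths):
    """Return the roles suggested by the marker files present in `paths`."""
    return [role for marker, role in MARKERS if marker in paths]


def find_roots(filesystems):
    """Given device => set of paths, return the devices that look like roots.

    More than one result may mean a multi-boot guest.
    """
    return sorted(
        device
        for device, paths in filesystems.items()
        if ROOT_ROLES.intersection(classify_filesystem(paths))
    )


def parse_fstab(text):
    """Parse /etc/fstab contents into a mount-point => device mapping.

    Device fields such as LABEL=... and UUID=... are kept verbatim;
    resolving them to real devices is left to the caller.
    """
    mounts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip any comment
        if not line:
            continue  # skip blank (or comment-only) lines
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed entry: skip it rather than fail
        device, mountpoint = fields[0], fields[1]
        if mountpoint in ("swap", "none"):
            continue  # swap areas have no mount point
        mounts[mountpoint] = device
    return mounts


if __name__ == "__main__":
    fstab = """
# /etc/fstab
LABEL=/        /      ext3  defaults 1 1
UUID=1234-ABCD /boot  ext3  defaults 1 2
/dev/VolGroup00/LogVol01 swap swap defaults 0 0
"""
    print(parse_fstab(fstab))
    print(find_roots({
        "/dev/sda1": {"/grub/grub.conf"},
        "/dev/sda2": {"/etc/redhat-release", "/etc/fstab"},
        "/dev/sdb1": {"/Program Files"},
    }))
```

With this sample input, find_roots reports two apparent root filesystems (/dev/sda2 and /dev/sdb1), which is exactly the multi-boot ambiguity a real inspection tool then has to resolve.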

And this, oddly enough, is not the problem that perplexed me all day today. Rather, the problem is how to package up and distribute this code.

Other people will obviously want to reuse this code, since the answers to the meta-question are useful in many situations. So it should be a library of some sort.

Do I keep it as a Perl library, perhaps exporting specialized Perl structures? That treats Perl preferentially: why should Perl get special treatment over any of the other languages we support?

Do I rewrite it in C? That’s difficult for a couple of reasons. First, and by no means least, this code does a lot of string handling and regular-expression matching, for which C is poorly suited. Second, the code exports a large, recursive data structure that is likely to be changed, extended and refined over time, and a C API is generally not flexible enough to handle such a thing easily. Finally, to fit it into libguestfs, we’d have to write code to convert that recursive structure for every language binding, which is time-consuming at best.
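
To make the recursive data structure concrete, here is a hypothetical example of what one guest’s result might look like as nested Python dicts and lists, with invented field names. The flatten function is an illustration, not part of libguestfs: it walks the tree recursively and emits (path, value) pairs, which is roughly the kind of flattening a fixed, C-style API would force on the data, leaving every caller to rebuild the tree.

```python
# A hypothetical inspection result for one guest. Field names are
# invented; a real tool would have many more, and they would change
# over time.
guest = {
    "name": "mail",
    "os": [
        {
            "type": "linux",
            "distro": "rhel",
            "version": "5.2",
            "mounts": {"/": "/dev/sda2", "/boot": "/dev/sda1"},
            "applications": [{"name": "postfix"}, {"name": "dovecot"}],
        }
    ],
}


def flatten(node, path=()):
    """Recursively yield (path, value) pairs for every leaf of `node`.

    `path` is a tuple of dict keys and list indices, so keys such as
    "/boot" cannot be confused with a separator, and new fields added
    later are picked up without changing this function.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, path + (index,))
    else:
        yield path, node


if __name__ == "__main__":
    for path, value in flatten(guest):
        print(path, "=", value)
```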

Should I write a stand-alone program that exports some intermediate representation? XML seems suitable. But, strange though it may seem, XML is not the preferred data representation in many languages. Perl and OCaml, for example, don’t really get along with XML, being respectively too unstructured and too loosely typed.
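
For comparison, a stand-alone program could emit one result in more than one representation. This sketch uses only Python’s standard json, xml.etree.ElementTree and argparse modules, with a small hypothetical result, invented element names and a hypothetical --format flag. JSON maps directly onto nested dicts and lists; the XML output needs conventions of its own (assumed for this sketch: list entries become repeated item elements, keys must be valid XML names, and every value becomes text), which every consumer would have to know about.

```python
import argparse
import json
import xml.etree.ElementTree as ET

# Hypothetical result; keys must be valid XML element names for to_xml.
SAMPLE = {
    "name": "mail",
    "os": [
        {
            "distro": "rhel",
            "version": "5.2",
            "filesystems": [
                {"mountpoint": "/", "device": "/dev/sda2"},
                {"mountpoint": "/boot", "device": "/dev/sda1"},
            ],
        }
    ],
}


def to_xml(tag, node):
    """Recursively convert dicts, lists and scalars to an Element.

    Convention assumed by this sketch: dict keys become child elements,
    list entries become repeated <item> children, and scalars become
    text, so the distinction between numbers and strings is lost.
    """
    elem = ET.Element(tag)
    if isinstance(node, dict):
        for key, value in node.items():
            elem.append(to_xml(key, value))
    elif isinstance(node, list):
        for value in node:
            elem.append(to_xml("item", value))
    else:
        elem.text = str(node)
    return elem


def render(result, fmt):
    """Serialize `result` as "json" or "xml"."""
    if fmt == "json":
        return json.dumps(result, indent=2)
    if fmt == "xml":
        return ET.tostring(to_xml("guest", result), encoding="unicode")
    raise ValueError(f"unknown format: {fmt}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--format", choices=["json", "xml"], default="json")
    args = parser.parse_args()
    print(render(SAMPLE, args.format))
```

Note that a round trip through the XML form turns every value into a string, so a strongly typed consumer would have to reapply the types itself.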

So I’m left at an impasse. I have code, but it doesn’t “belong” anywhere. It’s generally useful, but not usefully general. What to do?



5 responses to “Stuck”

  1. syskill

    How about YAML as an intermediate representation?

  2. rich

    Yes, YAML is a possible intermediate repr, well supported in Perl, Python and Ruby. OCaml’s support is not so good, and who knows about Java.

    JSON and sexprs are other things that I’ll be looking at.

    Quite likely we’ll have to export in multiple reprs, like XML | Perl structures | YAML, selected by a command-line flag.


  3. Chris

    Before you toss out Perl, take another trawl through CPAN. There are several reasonable solutions for dealing with XML that many people overlook. XML::Twig and XML::Toolkit both go far beyond the XML::Simple approach to parsing and can deal well with large XML documents.

    XML::Toolkit does a sort of XML -> Perl object (via Moose) mapping, kind of like an ORM but for XML. I am very biased there, as XML::Toolkit is one of my projects; if you want a more mature solution, XML::Compile and XML::Pastor both exist, and I can recommend XML::Generator::PerlData for serialization if that’s the only piece you need.

    Sorry, but after nearly 10 years of doing Perl and XML work, I find it hard to believe that people think Perl doesn’t get along well with XML.

  4. rich

    Chris, thanks for those tips.

    I’ll certainly take a look at your XML::Toolkit library. Previously I was trying to use XML::XPath::XMLParser, which I found very frustrating.

  5. mpdehaan

    FWIW, I’ve found (at least in Python) that JSON is much, much, much faster than the YAML implementations (in my usage, around 300x faster). The reason, I think, is that there is a lot less room for variation in the rendering, so the parser has less work to do. There’s no unpredictability about what ends in a newline, whether something is inline or indented, what is quoted and what is not, etc. Plus that makes it a little easier for humans to read, and it doesn’t require a separate data-structure marshalling convention the way XML does (like having to use XML-RPC marshalling… shudder).
