Tag Archives: hive

On the awesomeness of ocaml-bitstring

I used bitstring to reverse engineer the Windows registry “hive” format. I know that bitstring is my own program, but coming back to it two years after I wrote it and using it again for this, I really think this is a brilliant tool. (Bitstring wasn’t my idea — it was inspired by the bitstring manipulation feature in Erlang).

C is supposed to be a good natural programming language for dealing with bits and bytes, right? The ocaml-bitstring program, which analyzes hive files in far more detail than the C program, is half the size and just as fast.

As an example, here’s how we load the hive file and analyze the first part of the header:

let bits = bitstring_of_file filename

(* Split into header + data at the 4KB boundary. *)
let header, data =
  takebits (4096 * 8 ) bits, dropbits (4096 * 8 ) bits

let () =
  bitmatch header with
  { "regf" : 4*8 : string;
    seq1 : 4*8 : littleendian;
    seq2 : 4*8 : littleendian, check (seq1 = seq2);
    last_modified : 64
      : littleendian, bind (nt_to_time_t last_modified);
    1_l (* major *) : 4*8 : littleendian;
    minor : 4*8 : littleendian } ->
      (* ... *)

The bitmatch statement elegantly matches the file. It rejects the file if the first four bytes aren’t “regf” (the file magic number) or if the major version number is not 1. It then unpacks the following fields, converting from the file’s littleendian ordering to host ordering, converting the NT timestamp into a time_t and so on.

Although not shown there, bitstring will also work just fine on arbitrary bit boundaries, albeit more slowly because the generated code is able to make fewer optimizations.

Even though the Windows hive file format is moronic, I successfully used bitstring to reverse engineer it in about 3 days, with some help from the contradictory and often incorrect public documentation out there.


Filed under Uncategorized

libhivex: Windows Registry hive extractor library

Several people managing Windows virtual machines have told me that libguestfs/virt-cat isn’t enough for them. They’d like to be able to get at Windows Registry entries in the guest.

A typical example is the imaginary [as of now] virt-win-reg command that lets you interrogate the Registry in a guest:

$ virt-win-reg MyWinGuest '\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion'
"ProductName"="Microsoft Windows Server 2003"
"RegisteredOwner"="Richard Jones"

Right now you can only do this indirectly, by laboriously downloading the registry hives and decoding them with tools such as reged. I outlined how to do that here but there’s no doubt that it should be made a lot easier.

The first step is a more reliable way to query registry files themselves. The files come from foreign, buggy, possibly malicious guests, and so code that touches them must be written carefully and conservatively to avoid security problems.

Another problem is that the tools in this area tend to convert the binary, proprietary “hive” format into a regedit-compatible text format. The problem is that regedit itself is no easier to parse. What we would like is a more compatible format — XML or a library.

There are several existing tools to do this. The best is certainly Petter Nordahl’s chntpw utility which we’ve been carrying in Fedora for a while. Unfortunately Petter hasn’t been answering our queries about issues in the code and we are concerned that the code isn’t cautious enough to deal with an onslaught of malicious registry files. Another is the BSD-licensed dumphive program written in Pascal.

To address our concerns I have spent the last three days writing a simpler version of Petter’s program called libhivex. This library and associated programs are able to extract the contents of Windows Registry “hive” files, and make this available through a simple C API or as XML. The library is written very defensively and should deal with malicious files. The scope of the library is also being kept intentionally small: we won’t use it to modify these files ever, just to extract data from them.

I hope to publish a patch series for this soon for libguestfs, followed by some useful command line tools to let sysadmins get data from their Windows virtual machines.

Got a suggestion for a useful libguestfs-related tool? Let me know in the comments.


Filed under Uncategorized