It’s quite popular to bash the Windows Registry in non-technical or lightly technical terms. I’ve just spent a couple of weeks reverse engineering the binary format completely for our hivex library and shell, which now support both reading and writing the Registry. So now I can tell you why the Registry sucks from a technical point of view too.
1. It’s a half-arsed implementation of a filesystem
It’s often said that the Registry is a “monolithic file”, compared to storing configuration in lots of discrete files like, say, Unix does under /etc. This misses the point: the Registry is a filesystem. Sure it’s stored in a file, but so is ext3 if you choose to store it in a loopback mount. The Registry binary format has all the aspects of a filesystem: things corresponding to directories, inodes, extended attributes etc.
The major difference is that this Registry filesystem format is half-arsed. The format is badly constructed, fragile, endian-specific, underspecified and slow. The format changes from release to release of Windows. Parts are undocumented, seemingly to the Windows developers themselves (judging by the NT debug symbols that one paper has reproduced). Parts of the format waste space, while in other parts silly “optimizations” are made to save a handful of bytes (at the cost of making access much more complex).
2. Hello Microsoft programmers, a memory dump is not a file format
The format is essentially a dump of 32 bit C structures in a C memory heap. This was probably done originally for speed, but it opens the format to all sorts of issues:
- You can hide stuff away in unused blocks.
- You can create registries containing unreachable blocks or loops or pointers outside the heap, and cause Windows to fail or hang (see point 3).
- It’s endian and wordsize specific.
- It depends on the structure packing of the original compiler circa 1992.
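To make the endianness and packing points concrete, here is a minimal sketch using Python's struct module. The three-field record is hypothetical and is not the real Registry layout; it only shows that the same bytes mean different things depending on byte order, and that the size of a "dumped" structure depends on the compiler's padding rules.

```python
import struct

# Hypothetical record for illustration only (NOT the real Registry layout):
#   uint32 size; uint16 flags; uint32 offset
FIELDS = "IHI"

# Write the record the way a little-endian machine with no padding would.
record = struct.pack("<" + FIELDS, 0x20, 0x1, 0x1000)

# The same bytes read back with each byte order.
little = struct.unpack("<" + FIELDS, record)
big = struct.unpack(">" + FIELDS, record)

# Size with no padding versus size with this machine's native alignment.
packed_size = struct.calcsize("<" + FIELDS)
native_size = struct.calcsize("@" + FIELDS)

print("little-endian reading:", little)
print("big-endian reading:   ", big)
print("packed size:", packed_size, "native size:", native_size)
```

A reader that guesses the byte order or the padding wrong gets plausible-looking garbage rather than an error, which is exactly the problem with treating a memory dump as a file format.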
3. The implementation of reading/writing the Registry in Windows NT is poor
You might expect, given how critical the Registry is to Windows’ integrity, that the people who wrote the code that loads it would have spent a bit of time thinking about checking the file for consistency, but apparently this is not done.
- All versions of Windows tested will simply ignore blocks which are not aligned correctly.
- Ditto, will ignore directory entries which are not in alphabetical order (reading just stops at the first place where a subdirectory named B is followed by an entry named A).
- Ditto, will ignore file entries which contain various sorts of invalid field.
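The alphabetical-order behaviour can be modelled with a toy reader. This is only a sketch of the observed effect, not Windows's actual code: the function name is hypothetical, and the case-insensitive comparison is my assumption.

```python
def visible_subkeys(names):
    """Return the entries seen by a reader that stops at the first
    name which sorts before its predecessor (toy model only)."""
    visible = []
    for name in names:
        # Assumption: names are compared case-insensitively.
        if visible and name.upper() < visible[-1].upper():
            break  # silently give up; everything after this point is hidden
        visible.append(name)
    return visible


on_disk = ["Alpha", "Beta", "Aardvark", "Gamma"]
seen = visible_subkeys(on_disk)
hidden = on_disk[len(seen):]
print("seen:  ", seen)
print("hidden:", hidden)
```

In this model a tool that walks every entry will report all four names, while the toy reader reports only the first two, with no error to say anything was skipped.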
The upshot of this is you can easily hide stuff in the Registry binary which is completely invisible to Windows, but will be apparent in other tools. From the point of view of other tools (like our hivex tool) we have to write exactly the same bits that Windows would write, to be sure that Windows will be able to read it. Any mistakes we make, even apparently innocuous ones, are silently punished.
Compare this to using an established filesystem format, where everyone knows the rules, and consistency (eg. fsck/chkdsk) matters.
Writing sucks too, because the programmers don’t correctly zero out fields, so you’ll find parts (particularly the Registry header) which contain random bits of memory, presumably kernel memory, dumped into the file. I didn’t find anything interesting there yet …
I also found Registries containing unreachable blocks (and not, I might add, ones which I’d tried modifying). I find it very strange that relatively newly created Windows 7 VMs which don’t have any sort of virus infection, have visible Registry corruption.
4. Types are not well specified
Each registry field superficially is typed, so REG_SZ is a string, and REG_EXPAND_SZ is, erm, also a string. Good, right? No, because what counts as a “string” is not well-defined. A string might be encoded in 7 bit ASCII, or UTF-16-LE. The only way to know is to know what versions of Windows will use the registry.
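A short sketch of why this matters, using nothing but Python's built-in codecs. The byte strings are made up for illustration; the point is that both decodings succeed without complaint, so the bytes alone cannot tell you which encoding was meant.

```python
# "Hi" as a UTF-16-LE writer would store it.
stored_utf16 = "Hi".encode("utf-16-le")

right = stored_utf16.decode("utf-16-le")
wrong = stored_utf16.decode("ascii")  # NUL bytes are valid ASCII, so no error

# "Hiya" as a 7 bit ASCII writer would store it.
stored_ascii = "Hiya".encode("ascii")

misread = stored_ascii.decode("utf-16-le")  # four bytes become two code units

print(repr(right), repr(wrong), repr(misread))
```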
Strings are also stored in REG_BINARY fields (in various encodings), but raw binary data is stored in these fields too.
Count yourself lucky if you only access official Microsoft fields though because some applications don’t confine themselves to the published types at all, and just use the type field for whatever they feel like.
And what’s up with having REG_DWORD (little-endian of course) and REG_DWORD_BIG_ENDIAN, and REG_QWORD, but no REG_QWORD_BIG_ENDIAN?
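For what it's worth, decoding the three integer types is trivial once you know which one you have. The sketch below uses the type numbers from the Windows SDK headers; the decode_int helper is a hypothetical name of mine, not part of hivex or any Windows API.

```python
import struct

# Type numbers as defined in the Windows SDK headers.
REG_DWORD = 4
REG_DWORD_BIG_ENDIAN = 5
REG_QWORD = 11

# struct format for each integer type; note there is no big-endian QWORD.
_FORMATS = {
    REG_DWORD: "<I",
    REG_DWORD_BIG_ENDIAN: ">I",
    REG_QWORD: "<Q",
}


def decode_int(reg_type, data):
    """Decode the raw bytes of an integer-typed value (hypothetical helper)."""
    (value,) = struct.unpack(_FORMATS[reg_type], data)
    return value


raw = bytes([0x00, 0x00, 0x00, 0x01])
as_le = decode_int(REG_DWORD, raw)
as_be = decode_int(REG_DWORD_BIG_ENDIAN, raw)
as_q = decode_int(REG_QWORD, raw + bytes(4))
print(as_le, as_be, as_q)
```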
5. Interchange formats are not well specified
A critical part of installing many drivers is making registry edits, and for this a text format (.REG) is used along with the REGEDIT program. The thing is though that the .REG format is not well-specified in terms of backslash escaping. You can find examples of .REG files that have both:
"Name"="\Value"
and
"Name"="\\Value"
In addition the encoding of strings is again not specified. It seems to depend on the encoding of the actual .REG file, as far as anyone can tell. eg. If your .REG file itself is UTF-16-LE, then REGEDIT will encode all strings you define this way. Presumably if you transfer the .REG file to a system that changes the encoding, then you’ll get different results when you load the registry.
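The backslash ambiguity can be sketched as two possible readings of the same quoted string. Neither reading is claimed to be what REGEDIT actually does; the unescape function and its strict flag are hypothetical, and the only escapes it assumes are a doubled backslash and a backslash before a double quote.

```python
def unescape(text, strict):
    """Undo backslash escaping in a quoted .REG string (illustrative only).

    strict=True rejects a backslash that does not start a known escape;
    strict=False keeps such a backslash as a literal character.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt in ("\\", '"'):
                out.append(nxt)  # recognised escape: emit the escaped character
                i += 2
                continue
            if strict:
                raise ValueError("stray backslash at position %d" % i)
        out.append(ch)
        i += 1
    return "".join(out)


single = unescape(r"\Value", strict=False)
double = unescape(r"\\Value", strict=False)
print(single == double)  # both spellings collapse to the same value here
```

Under the lenient reading both spellings in the examples above load as the same value, while the strict reading refuses the first one outright. Without a specification, a tool author has to pick one and hope.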
6. The Registry arrangement is a mess
Take a look at this forensic view of interesting Registry keys (PDF). List of mounted drives? HKEY_LOCAL_MACHINE\SYSTEM\MountedDevices. But what the user sees is stored in HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\MountPoints2\CPC\Volume\. Unless you mean USB devices which might be in the above list, or in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\USBSTOR. And the entries in those lists are by no means obvious, containing impenetrable binary fields and strange Windows paths.
If you browse through the Registry some time you’ll see it’s a giant accreted mess of non-standardized, overlapping information stored in random places. Some of it is configuration, much of it is runtime data. This is a far cry from /etc/progname.conf in Linux.
7. The Registry is a filesystem
Back to point 1, the Registry is a half-arsed, poor quality implementation of a filesystem. Importantly, it’s not a database. It should be a database! It could benefit from indices to allow quick lookups, but instead we have to manually and linearly traverse it.
This leads to really strange Registry keys like:
\ControlSet001\Control\CriticalDeviceDatabase\pci#ven_1af4&dev_1001&subsys_00000000
which are crying out to be implemented as indexed columns in a real database.
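As a rough illustration of what that could look like, here is a sketch that splits the key name above into columns and stores it in an in-memory SQLite table with an index. The table name, column names and the parse_device_key helper are all hypothetical, and the parser only handles names shaped like this one example.

```python
import sqlite3


def parse_device_key(name):
    """Split 'pci#ven_1af4&dev_1001&subsys_00000000' into columns.
    Hypothetical helper; only handles names shaped like the example."""
    bus, _, rest = name.partition("#")
    fields = dict(part.split("_", 1) for part in rest.split("&"))
    return bus, fields["ven"], fields["dev"], fields["subsys"]


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE critical_device "
    "(bus TEXT, vendor TEXT, device TEXT, subsys TEXT)"
)
# The index is what lets a lookup avoid walking every row.
db.execute("CREATE INDEX by_vendor_device ON critical_device (vendor, device)")

key = "pci#ven_1af4&dev_1001&subsys_00000000"
db.execute("INSERT INTO critical_device VALUES (?, ?, ?, ?)", parse_device_key(key))

row = db.execute(
    "SELECT bus, subsys FROM critical_device WHERE vendor = ? AND device = ?",
    ("1af4", "1001"),
).fetchone()
print(row)
```

The lookup by vendor and device is a single indexed query, instead of enumerating every subkey and picking the names apart by hand.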
8. Security, ha ha, let’s pretend
Despite the fact that the Registry is just a plain file that you can modify using all sorts of external tools (eg. our hivex shell), you can create “unreadable” and “unwritable” keys. These are “secure” from the point of view of Windows, unless you just modify the Registry binary file directly.
Windows also uses an unhealthy dose of security-through-obscurity. It hides password salts in the obscure “ClassName” field of the Registry key. The “security” here relies entirely on the fact that the default Windows REGEDIT program cannot view or edit the ClassName of a key. Anyone with a binary editor can get around this restriction trivially.
9. The Registry is obsolete, sorta
Well the good news is the Registry is obsolete. The bad news is that although Vista has introduced another, incompatible way to store application data, in the AppData/Local and AppData/LocalLow directories, Windows Vista and Windows 7 continue to rely on the Registry for all sorts of critical data, and it doesn’t look like this mess is going to go away any time soon.
* * *
libguestfs on Fedora now provides the tools you need to manage the Registry in Windows virtual machines. For more details, see hivexsh and virt-win-reg documentation.
Update
Thanks to all who commented. There is further discussion here on Reddit and here on Hacker News (including discussion of inaccuracies in what I wrote). If you want to look at our analysis code, it’s all here in our source repository. For further references on the Registry binary format, follow the links in the hivex README file.