I have no objection if you want to get up an hour earlier. I don’t even object if you think it makes the daylight longer / gives you more sunlight / or whatever silly reasoning you have. Just don’t make me do it too, m’kay?
But … how I’ve come to dislike Haskell in the process.
I should say this is not just because I’m a big fan of OCaml and other ML-derived languages, i.e. fast, useful and very practical functional programming. There are some real problems with Haskell which make it less than useful as a real programming language.
Significant whitespace. It’s not just that it’s very difficult to understand how the whitespace works (far more so than in Python, where it’s merely annoying); it also makes it nearly impossible to generate Haskell code automatically, which is what we do in libguestfs.
The IO monad. Most Haskell examples use the IO monad, which serializes everything, making the code look just like code in more ordinary languages. The disadvantage is that monads are obscure and hard to understand. The advantage is ... unclear: your code is still, mostly, serialized, and it’s slow because of the overhead, so it’s not clear what the point is.
Unexpressive FFI. Having dealt with a lot of FFIs, I think I’m qualified to talk about this one. Haskell’s is terrible: the documentation is obscure, verging on bad. Examples are rare (for anything more complex than calling “sin”). There’s a great deal of brokenness in major features, e.g. passing or returning structures. A lot of stuff is simply not possible without delving into the depths of compilers. It would have been much better to define a C API and write FFIs in C.
Laziness ... should not be the default. It’s not how any real computer works, has ever worked, or is likely to work in the future.
Lack of optional/labelled args. Everyone else has them. Haskell has a huge hack. (If you try to implement this huge hack in reality you’ll see it’s not practical if you have a large number of functions that want optional args).
Also I get the impression from reading online that Haskell is widely studied and often pimped, but not used very much in reality.
Gettext not being a regular library causes no end of build problems.
Like this crap because we did “gettextize” without doing “autoconf” (or vice versa):
make: Entering directory `/builddir/build/BUILD/libguestfs-1.14.7/po'
*** error: gettext infrastructure mismatch: using a Makefile.in.in from gettext version 0.17 but the autoconf macros are from gettext version 0.18
I just don’t get why gettext can’t be a regular, ordinary, plain library so we don’t have to constantly suffer from this sort of thing. There is surely no other library that needs to rewrite your entire build system.
After two days, I nearly have the first tab (out of four) working.
Granted, maybe the first tab is the hardest one:
The job of the first tab is to ask the user for the source (disk image or libvirt guest). It then fires off a background job to open that guest and inspect it for operating systems. Based on the outcome of that (opened OK, found OSes, no OSes found, etc.) it has to update and enable the other tabs.
Also the user can preselect a guest on the command line. We also have to deal with connecting to libvirt asynchronously to read the list of guests (and this might also fail in a number of ways).
So far, 1600 lines of code, and the first tab is by no means complete.
One part of the problem is there’s a certain “impedance mismatch” between functional programming in OCaml and writing Gtk. Gtk is heavily mutable and object based. OCaml prefers (but does not require) immutability, and objects in OCaml are obscure and not widely used, and the Gtk bindings are written in a very hard-core object OCaml style.
Another part is just that it’s tedious. It would be tedious if I was doing this in C or Python too. You’ve got asynchronous actions going off here and there which update bits of state. Every control or input could potentially affect every other control or output, resulting in a kind of O(n²) mess of wiring up signals and callbacks.
Is there an easier way? I don’t know …
My friend’s first comment was “How do I minimize all these windows?”
In the brave new world of GNOME 3, there is no minimize button. Why? Here is the thinking behind it. Wow, he asked a whole two people — on the GNOME team! That’s not even usability testing. It’s like he asked for a focus group and no one came …
Google are definitely “going Altavista” on us technical users. Their search results are becoming increasingly annoying and useless for technical queries. Some time back they changed searches so that broad match was the default. Now, even quoting search terms to force an exact match doesn’t seem to help. Google Instant could do with being taken outside and put out of its misery.
I remember Altavista going down the tubes.
Anyone wanna start a new search engine? I hear there’s money to be made …
NFSv4 lets you route all your NFS traffic through a single, well-known TCP port (2049). Hooray, it really works: it really gets rid of one of the stupidities in previous versions of NFS. It lets you use firewalls again.
NFSv4 maps every user on the client to 4294967294. In order to do user name translation it requires Kerberos. (Forget about keeping fixed UIDs like in the good old days — that doesn’t seem to work at all). Great for setting up an Enterprise(TM) service with 1000 clients. Absolutely fucking useless to just share a filesystem between two machines.
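The magic number isn’t random, by the way: 4294967294 is 2^32 - 2, i.e. -2 reinterpreted as an unsigned 32-bit UID (0xFFFFFFFE), which Linux NFS setups commonly use as the “nfsnobody” fallback. A quick check, in Python purely for illustration (the names here are just for the example):

```python
# 4294967294 is -2 viewed as an unsigned 32-bit integer.
NOBODY_UID = 4294967294

def as_uint32(n: int) -> int:
    """Reinterpret a (possibly negative) Python int as an unsigned 32-bit value."""
    return n & 0xFFFFFFFF

print(hex(NOBODY_UID))              # 0xfffffffe
print(as_uint32(-2) == NOBODY_UID)  # True
print(2**32 - 2 == NOBODY_UID)      # True
```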
There’s a serious point to this rant.
When you’re designing a system that you want people to use, make the easy cases easy.
Don’t make me read through a dozen obscure man pages and half a dozen really obscure configuration files to just have the basic function working.
NFSv4 has so far been a 100% failure for me and for many, many other people because no one thought about making the easy case easy.
We came to the conclusion today that the world would be a little bit better if anyone who suggested using reference counting instead of a real garbage collector could receive a small non-lethal electric shock.
You can implement a real GC in only 600 lines of code or so. It’s not that hard, people.
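To make the point concrete: under pure reference counting, two objects that point at each other never reach a count of zero, even after nothing else can reach them, whereas even a toy mark-and-sweep collector frees them. This is a purely illustrative sketch (a hypothetical `Heap` class in Python, nowhere near a real 600-line collector):

```python
class Obj:
    """A heap object holding references to other heap objects."""
    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing pointers
        self.marked = False

class Heap:
    """Toy heap with a root set and a mark-and-sweep collector."""
    def __init__(self):
        self.objects = []
        self.roots = []

    def alloc(self, name):
        obj = Obj(name)
        self.objects.append(obj)
        return obj

    def _mark(self, obj):
        # Iterative depth-first marking, so deep graphs don't hit recursion limits.
        stack = [obj]
        while stack:
            o = stack.pop()
            if o.marked:
                continue
            o.marked = True
            stack.extend(o.refs)

    def collect(self):
        """Mark everything reachable from the roots, then sweep the rest."""
        for root in self.roots:
            self._mark(root)
        live = [o for o in self.objects if o.marked]
        freed = [o.name for o in self.objects if not o.marked]
        for o in live:
            o.marked = False  # reset for the next collection
        self.objects = live
        return freed

def refcounts(heap):
    """What reference counting would see: incoming pointers per object."""
    counts = {o.name: 0 for o in heap.objects}
    for o in heap.objects:
        for r in o.refs:
            counts[r.name] += 1
    for r in heap.roots:
        counts[r.name] += 1
    return counts

heap = Heap()
root = heap.alloc("root")
heap.roots.append(root)
a = heap.alloc("a")
b = heap.alloc("b")
root.refs.append(a)
a.refs.append(b)
b.refs.append(a)       # a <-> b form a cycle
root.refs.remove(a)    # drop the only external reference to the cycle

print(refcounts(heap))  # {'root': 1, 'a': 1, 'b': 1}: never zero, so refcounting leaks a and b
print(heap.collect())   # ['a', 'b']: unreachable from the root, so mark-and-sweep frees them
```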
It’s quite popular to bash the Windows Registry in non-technical or lightly technical terms. I’ve just spent a couple of weeks reverse engineering the binary format completely for our hivex library and shell which now supports both reading and writing to the registry. So now I can tell you why the Registry sucks from a technical point of view too.
1. It’s a half-arsed implementation of a filesystem
It’s often said that the Registry is a “monolithic file”, compared to storing configuration in lots of discrete files like, say, Unix does under /etc. This misses the point: the Registry is a filesystem. Sure, it’s stored in a file, but so is ext3 if you choose to store it in a loopback mount. The Registry binary format has all the aspects of a filesystem: things corresponding to directories, inodes, extended attributes, etc.
The major difference is that this Registry filesystem format is half-arsed. The format is badly constructed, fragile, endian-specific, underspecified and slow. The format changes from release to release of Windows. Parts are undocumented, seemingly to the Windows developers themselves (judging by the NT debug symbols that one paper has reproduced). Parts of the format waste space, while in other parts silly “optimizations” are made to save a handful of bytes (at the cost of making access much more complex).
2. Hello Microsoft programmers, a memory dump is not a file format
The format is essentially a dump of 32 bit C structures in a C memory heap. This was probably done originally for speed, but it opens the format to all sorts of issues:
- You can hide stuff away in unused blocks.
- You can create registries containing unreachable blocks or loops or pointers outside the heap, and cause Windows to fail or hang (see point 3).
- It’s endian and wordsize specific.
- It depends on the structure packing of the original compiler circa 1992.
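To show what “a dump of C structures in a heap” means in practice, here’s a simplified sketch of walking the cells in one “hbin” block of a hive. It assumes the layout described in the hivex documentation: a 32-byte “hbin” header, followed by cells that each start with a signed little-endian 32-bit size (negative means in use, positive means free), and it assumes cell sizes are multiples of 8. It runs on a synthetic page, not a real hive. Note that the free cell still carries its old contents, which nothing that merely follows pointers from the root key will ever look at:

```python
import struct

HBIN_HEADER_SIZE = 0x20  # assumption: 32-byte "hbin" page header

def walk_cells(page: bytes):
    """Yield (offset, size, in_use, payload) for each cell in one hbin page."""
    if page[:4] != b"hbin":
        raise ValueError("not an hbin page")
    off = HBIN_HEADER_SIZE
    while off + 4 <= len(page):
        (raw,) = struct.unpack_from("<i", page, off)  # signed little-endian int32
        size = abs(raw)
        if size < 8 or size % 8 or off + size > len(page):
            raise ValueError(f"corrupt cell at offset {off:#x}")
        yield off, size, raw < 0, page[off + 4 : off + size]
        off += size

def make_cell(payload: bytes, in_use: bool) -> bytes:
    """Build a synthetic cell: size field (negated if used) plus padded payload."""
    size = 4 + len(payload)
    size += -size % 8  # round up to the assumed 8-byte alignment
    body = payload.ljust(size - 4, b"\0")
    return struct.pack("<i", -size if in_use else size) + body

# Hypothetical page: one used cell and one "free" cell with leftover data in it.
page = b"hbin".ljust(HBIN_HEADER_SIZE, b"\0")
page += make_cell(b"nk...visible key", True)
page += make_cell(b"secret left in a free cell", False)

for off, size, in_use, payload in walk_cells(page):
    print(hex(off), size, "used" if in_use else "free", payload.rstrip(b"\0"))
```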
3. The implementation of reading/writing the Registry in Windows NT is poor
You might expect, given how critical the Registry is to Windows’ integrity, that the people who wrote the code that loads it would have spent a bit of time thinking about checking the file for consistency, but apparently this is not done.
- All versions of Windows tested will simply ignore blocks which are not aligned correctly.
- Ditto, will ignore directory entries which are not in alphabetical order (it just stops reading at the first point where a subdirectory named, say, B is followed by an entry named A).
- Ditto, will ignore file entries which contain various sorts of invalid field.
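The ordering behaviour above is easy to model. Here is a minimal sketch contrasting a reader that, as observed, silently stops at the first out-of-order subkey with a checker that reports every violation. The case-insensitive comparison is an assumption of this sketch, and the key names are made up:

```python
def windows_like_visible(subkeys):
    """Model the observed behaviour: stop reading at the first out-of-order entry."""
    visible = []
    for name in subkeys:
        if visible and name.upper() < visible[-1].upper():
            break  # e.g. "B" followed by "A": everything from "A" on is ignored
        visible.append(name)
    return visible

def check_order(subkeys):
    """What a consistency checker would do instead: report every violation."""
    return [
        (prev, cur)
        for prev, cur in zip(subkeys, subkeys[1:])
        if cur.upper() < prev.upper()
    ]

keys = ["Alpha", "Beta", "Aardvark", "Zeta"]
print(windows_like_visible(keys))  # ['Alpha', 'Beta']: Aardvark and Zeta are hidden
print(check_order(keys))           # [('Beta', 'Aardvark')]
```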
The upshot of this is you can easily hide stuff in the Registry binary which is completely invisible to Windows, but will be apparent in other tools. From the point of view of other tools (like our hivex tool) we have to write exactly the same bits that Windows would write, to be sure that Windows will be able to read it. Any mistakes we make, even apparently innocuous ones, are silently punished.
Compare this to using an established filesystem format, where everyone knows the rules, and consistency (eg. fsck/chkdsk) matters.
Writing sucks too, because the programmers don’t correctly zero out fields, so you’ll find parts (particularly the Registry header) which contain random bits of memory, presumably kernel memory, dumped into the file. I didn’t find anything interesting there yet …
I also found Registries containing unreachable blocks (and not, I might add, ones which I’d tried modifying). I find it very strange that relatively newly created Windows 7 VMs, which don’t have any sort of virus infection, have visible Registry corruption.
4. Types are not well specified
Each registry field is superficially typed, so REG_SZ is a string, and REG_EXPAND_SZ is, erm, also a string. Good, right? No, because what counts as a “string” is not well-defined. A string might be encoded in 7-bit ASCII or in UTF-16-LE. The only way to know is to know which versions of Windows will use the registry.
Strings are also stored in REG_BINARY fields (in various encodings), but also raw binary data is stored in these fields.
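Because the encoding isn’t recorded anywhere, a tool reading these fields ends up guessing. A minimal sketch of one such guess follows; the heuristic (even length and every odd byte zero means UTF-16-LE, otherwise treat it as 8-bit) is an assumption for illustration, not a documented rule, and it only works for Latin-range text:

```python
def decode_reg_string(data: bytes) -> str:
    """Best-effort decode of a registry string value (heuristic, see above)."""
    if len(data) % 2 == 0 and data and all(b == 0 for b in data[1::2]):
        text = data.decode("utf-16-le")
    else:
        text = data.decode("latin-1")  # superset of 7-bit ASCII, never fails
    return text.rstrip("\0")           # strip trailing NUL terminator(s)

print(decode_reg_string("Hello\0".encode("utf-16-le")))  # Hello
print(decode_reg_string(b"Hello\0"))                     # Hello
```

Both calls print the same thing, but only because of the guess; the bytes themselves don’t say which encoding was intended.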
Count yourself lucky if you only access official Microsoft fields, though, because some applications don’t confine themselves to the published types at all, and just use the type field for whatever they feel like.
And what’s up with having REG_DWORD (little-endian of course) and REG_DWORD_BIG_ENDIAN, and REG_QWORD, but no REG_QWORD_BIG_ENDIAN?
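The same four bytes decode to different numbers depending only on the type tag. A small sketch using the type codes from Microsoft’s winnt.h (REG_DWORD = 4, REG_DWORD_BIG_ENDIAN = 5, REG_QWORD = 11); the `decode_integer` helper is hypothetical:

```python
import struct

# Registry value type codes, as defined in winnt.h.
REG_DWORD = 4
REG_DWORD_BIG_ENDIAN = 5
REG_QWORD = 11

def decode_integer(value_type: int, data: bytes) -> int:
    """Decode the integer registry types. Note there is no big-endian QWORD."""
    if value_type == REG_DWORD:
        return struct.unpack("<I", data[:4])[0]
    if value_type == REG_DWORD_BIG_ENDIAN:
        return struct.unpack(">I", data[:4])[0]
    if value_type == REG_QWORD:
        return struct.unpack("<Q", data[:8])[0]
    raise ValueError(f"not an integer type: {value_type}")

raw = bytes([0x01, 0x00, 0x00, 0x00])
print(decode_integer(REG_DWORD, raw))             # 1
print(decode_integer(REG_DWORD_BIG_ENDIAN, raw))  # 16777216
```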
5. Interchange formats are not well specified
A critical part of installing many drivers is making registry edits, and for this a text format (.REG) is used along with the REGEDIT program. The thing is though that the .REG format is not well-specified in terms of backslash escaping. You can find examples of .REG files that have both:
In addition the encoding of strings is again not specified. It seems to depend on the encoding of the actual .REG file, as far as anyone can tell. eg. If your .REG file itself is UTF-16-LE, then REGEDIT will encode all strings you define this way. Presumably if you transfer the .REG file to a system that changes the encoding, then you’ll get different results when you load the registry.
6. The Registry arrangement is a mess
Take a look at this forensic view of interesting Registry keys (PDF). List of mounted drives? HKEY_LOCAL_MACHINE\SYSTEM\MountedDevices. But what the user sees is stored in HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\MountPoints2\CPC\Volume\. Unless you mean USB devices, which might be in the above list, or in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\USBSTOR. And the entries in those lists are by no means obvious: they contain impenetrable binary fields and strange Windows paths.
If you browse through the Registry some time, you’ll see it’s a giant accreted mess of non-standardized, overlapping information stored in random places. Some of it is configuration; much of it is runtime data. This is a far cry from /etc/progname.conf in Linux.
7. The Registry is a filesystem
Back to point 1: the Registry is a half-arsed, poor quality implementation of a filesystem. Importantly, it’s not a database. It should be a database! It could benefit from indices to allow quick lookups, but instead we have to traverse it manually and linearly.
This leads to really strange Registry keys like:
which are crying out to be implemented as indexed columns in a real database.
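A toy comparison makes the point: today a reader has to visit every key and every value to answer a question like “which keys have Version = 1.0?”, whereas a database would keep an index built once. The flattened hive and the key and value names below are hypothetical, not real Registry data:

```python
from collections import defaultdict

# Hypothetical flattened hive: (key path, {value name: value data}).
hive = [
    (r"Software\Vendor\AppA", {"InstallDir": r"C:\AppA", "Version": "1.0"}),
    (r"Software\Vendor\AppB", {"InstallDir": r"C:\AppB", "Version": "2.0"}),
    (r"Software\Other\AppC", {"Version": "1.0"}),
]

def find_linear(hive, name, data):
    """What you have to do now: scan every key (O(n) per query)."""
    return [path for path, values in hive if values.get(name) == data]

def build_index(hive):
    """Build once: (value name, value data) -> key paths, like an indexed column."""
    index = defaultdict(list)
    for path, values in hive:
        for name, data in values.items():
            index[(name, data)].append(path)
    return dict(index)

index = build_index(hive)
print(find_linear(hive, "Version", "1.0"))
print(index.get(("Version", "1.0"), []))  # same answer, one lookup
```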
8. Security, ha ha, let’s pretend
Despite the fact that the Registry is just a plain file that you can modify using all sorts of external tools (eg. our hivex shell), you can create “unreadable” and “unwritable” keys. These are “secure” from the point of view of Windows, unless you just modify the Registry binary file directly.
Windows also uses an unhealthy dose of security-through-obscurity. It hides password salts in the obscure “ClassName” field of the Registry key. The “security” here relies entirely on the fact that the default Windows REGEDIT program cannot view or edit the ClassName of a key. Anyone with a binary editor can get around this restriction trivially.
9. The Registry is obsolete, sorta
Well, the good news is that the Registry is obsolete. The bad news is that Vista introduced another, incompatible way to store application data, in AppData/LocalLow directories, while Windows Vista and Windows 7 continue to rely on the Registry for all sorts of critical data, and it doesn’t look like this mess is going to go away any time soon.
* * *
Thanks to all who commented. There is further discussion here on Reddit and here on Hacker News (including discussion of inaccuracies in what I wrote). If you want to look at our analysis code, it’s all here in our source repository. For further references on the Registry binary format, follow the links in the hivex README file.