Tag Archives: HTTP

Using libguestfs to open an ISO on a public website

The new curl support added to libguestfs 1.22 lets you open any ISO remotely from a public web site or FTP server:

$ export LIBGUESTFS_BACKEND=direct
$ guestfish --ro -i --format=raw \
    -a http://releases.ubuntu.com/precise/ubuntu-12.04.2-desktop-amd64.iso

Operating system: Ubuntu 12.04.2 LTS "Precise Pangolin" - Release amd64 (20130213)
/dev/sda1 mounted on /

><fs> ll /
total 2506
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 .
drwxr-xr-x 23 1000 1000    4096 May 28 13:55 ..
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 .disk
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 EFI
-r--r--r--  1 root root     236 Feb 13 22:21 README.diskdefines
-r--r--r--  1 root root     134 Feb 13 22:20 autorun.inf
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 boot
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 casper
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 dists
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 install
dr-xr-xr-x  1 root root   18432 Feb 13 22:21 isolinux
-r--r--r--  1 root root   16443 Feb 13 22:21 md5sum.txt
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 pics
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 pool
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 preseed
lr-xr-xr-x  1 root root       1 Feb 13 22:21 ubuntu -> .
-r--r--r--  1 root root 2504624 Feb  8 22:58 wubi.exe

Of course it is slow as hell and not nice to the web host. It makes lots of byte-range requests to the host, downloading only a few KB with each request, which is close to the worst case for a web server to handle.
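To get a feel for why this is hard on the server, here is a rough sketch of what a byte-range access pattern looks like. The helper names, the 4 KB chunk size and the 10 MB total are illustrative assumptions only; the real request sizes are chosen by qemu's curl driver and are not something I measured here.

```python
def range_header(offset, length):
    """Return the HTTP Range header value for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive at both ends.
    return f"bytes={offset}-{offset + length - 1}"


def requests_needed(total_bytes, chunk_bytes):
    """Number of separate range requests needed to read total_bytes."""
    if total_bytes < 0 or chunk_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and chunk_bytes must be > 0")
    # Ceiling division without floating point.
    return -(-total_bytes // chunk_bytes)


# Hypothetical numbers: reading 10 MB of the ISO in 4 KB pieces.
print(range_header(32768, 4096))                  # bytes=32768-36863
print(requests_needed(10 * 1024 * 1024, 4096))    # 2560
```

Even under these made-up numbers, a modest amount of data turns into thousands of tiny requests, each paying the full per-request overhead on the server.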

Note also that Fedora’s curl is broken. I compiled my own from upstream git.

Filed under Uncategorized

Using libguestfs over HTTP (and FTP)

New in libguestfs upstream and 1.21.39 is the ability to access disks over FTP, FTPS, HTTP, HTTPS and TFTP (read-only).

You can use it like this:

$ export LIBGUESTFS_BACKEND=direct
$ guestfish --ro -a http://x.x.x.x/scratch/winxp.img -i

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

Operating system: Microsoft Windows XP
/dev/sda1 mounted on /

><fs> ll /
total 1573209
drwxrwxrwx  1 root root       4096 Apr 16  2012 .
drwxr-xr-x 23 1000 1000       4096 May 11 18:45 ..
-rwxrwxrwx  1 root root          0 Oct 11  2011 AUTOEXEC.BAT
-rwxrwxrwx  1 root root          0 Oct 11  2011 CONFIG.SYS
drwxrwxrwx  1 root root       4096 Oct 11  2011 Documents and Settings
-rwxrwxrwx  1 root root          0 Oct 11  2011 IO.SYS
-rwxrwxrwx  1 root root          0 Oct 11  2011 MSDOS.SYS
-rwxrwxrwx  1 root root      47564 Apr 14  2008 NTDETECT.COM
drwxrwxrwx  1 root root       4096 Oct 11  2011 Program Files
drwxrwxrwx  1 root root       4096 Oct 11  2011 System Volume Information
drwxrwxrwx  1 root root      28672 Oct 11  2011 WINDOWS
-rwxrwxrwx  1 root root        211 Oct 11  2011 boot.ini
-rwxrwxrwx  1 root root     250048 Apr 14  2008 ntldr
-rwxrwxrwx  1 root root 1610612736 Oct 11  2011 pagefile.sys

Apart from being a tiny bit slower, it just works as if the disk were local.


Half-baked ideas: reputation system for IP addresses

For other half-baked ideas, see my ideas tag.

I’m an obstinate log watcher. Watching web server logfiles in particular gives me a fascinating insight into how the bottom-feeders of the internet work: comment spammers, email harvesters, crap search engines and the like.

As a pretty random example, a single spammer (or more likely “illegal spam botnet”) just tried, 26 times in roughly 90 minutes, to fill in the comment form on one particular website I run. If you still have any myths about how sophisticated spammers are, read on.

Myth: spammers promote a particular website. Reality: spammers are still able to register huge numbers of random domains, and use very complex multi-step redirection.

Myth: spammers must operate from a limited set of IP addresses. Reality: spammers have access to virtually unlimited numbers of IP addresses.

Myth: each attack comes from a single IP address. Reality: attacks jump between IP addresses scattered around the world, and those attacks are coordinated and look just like a single multi-step transaction, complete with correct cookies which must be passed between the hosts using a higher “back end” layer.

Myth: spambots don’t run Javascript, download images or solve captchas. Reality: …

The jury is still out on the last one. Certainly it’s not common, but a significant subset of comment spam does appear to come from real browsers, which run Javascript, download images and solve captchas. However I believe much or all of this must come from real people operating from sweatshops in countries with very low wages. That’s hard to tell just from looking at logfiles.

Each of the 26 completed transactions I saw involved multiple HTTP requests, and every single HTTP request came from a different IP address. But each completed transaction had a consistent cookie. In some cases the IP addresses were half the earth apart, yet the HTTP requests followed each other within less than a second, indicating a sophisticated second-level operation coordinating it all. Each request contained URLs for 4 websites, generated using random characters, and only some of these sites resolve.
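This pattern is easy to spot once you group requests by cookie instead of by IP address. The following is only a sketch: it assumes the log has already been parsed into (ip, cookie) pairs, and the function names and the `min_ips` threshold are made up for illustration.

```python
from collections import defaultdict


def ips_per_cookie(requests):
    """Map each session cookie to the set of client IPs that presented it."""
    seen = defaultdict(set)
    for ip, cookie in requests:
        seen[cookie].add(ip)
    return dict(seen)


def distributed_sessions(requests, min_ips=3):
    """Return cookies used from at least `min_ips` distinct IPs, sorted."""
    return sorted(
        cookie
        for cookie, ips in ips_per_cookie(requests).items()
        if len(ips) >= min_ips
    )


# Hypothetical parsed log entries (documentation-range IP addresses).
sample = [
    ("192.0.2.10", "sess-a"),
    ("198.51.100.7", "sess-a"),
    ("203.0.113.99", "sess-a"),
    ("192.0.2.55", "sess-b"),
    ("192.0.2.55", "sess-b"),
]
print(distributed_sessions(sample))  # ['sess-a']
```

A normal visitor keeps one cookie on one address (or a handful, if they are on a mobile network), so a single cookie hopping across many unrelated addresses within seconds stands out.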

So on to the half-baked idea.

Why don’t we have a proper, distributed reputation system for IP addresses?

A spammer can’t source an HTTP request from just any IP address, so they need to take over some grandma’s Windows PC, or someone’s web server, or persuade people to route some bogus AS. Every time an honest website owner (like me!) sees a bad IP, they register it.

Of course, spammers themselves will try to game the system, but they will do so from their own random IP addresses. We need to make sure that their “votes” count for less, and a reputation system should be able to decide this (eg. bad IP votes for bad IP? those votes count negatively).
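As a very rough sketch of the weighting idea (half-baked, like the rest of this post): each report that an IP is bad is weighted by the reporter's own reputation, so a report from a known-bad address counts negatively. The function name, the reputation values and the addresses are all hypothetical, and a real system would also need some way to compute the reporters' reputations in the first place.

```python
def badness_scores(reports, reporter_reputation, default_reputation=0.0):
    """Sum reputation-weighted "this IP is bad" reports per target IP.

    reports: iterable of (reporter_ip, target_ip) pairs.
    reporter_reputation: dict mapping reporter IP to a weight; positive
        for trusted reporters, negative for known-bad ones.
    Unknown reporters get default_reputation (0.0 means they are ignored).
    """
    scores = {}
    for reporter, target in reports:
        weight = reporter_reputation.get(reporter, default_reputation)
        scores[target] = scores.get(target, 0.0) + weight
    return scores


# Hypothetical reputations: one honest site owner, one botnet node.
reputation = {"192.0.2.1": 1.0, "192.0.2.66": -0.5}
reports = [
    ("192.0.2.1", "203.0.113.5"),    # honest report of a spamming IP
    ("192.0.2.66", "203.0.113.5"),   # botnet node piles on, counts against
    ("192.0.2.66", "198.51.100.1"),  # botnet node smears an innocent IP
]
print(badness_scores(reports, reputation))
```

With these made-up weights the spamming address ends up with a positive badness score, while the smeared address ends up negative, which is the behaviour the paragraph above asks for.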

If grandma tries to post a good comment, her IP may well cause that comment to be rejected. Good thing! She needs to clean up her (Windows) PC.

And what about ISPs who rotate IP addresses between good and bad customers? Those ISPs need to police their users and make sure they clean up their Windows PCs, or force the users on to better operating systems that don’t allow these exploits.

Note: There are people classifying IPs now, e.g. Project Honey Pot and Stop Forum Spam, but these guys don’t implement a reputation system and in some cases have nasty licensing terms which turn the data that we provide for free into proprietary databases. No thanks.
