Half-baked idea: Content-addressable web proxy

There are several situations where you want to fetch some content and don’t particularly care which precise source it comes from:

  1. Downloading packages from Linux distro mirrors.
  2. Downloading git commits.
  3. Grabbing a BitTorrent data block.

My proposal (which surely has been done already??) is that clients can supply the hash of the file they want when requesting it through a proxy; something like:

GET http://example.com/foo HTTP/1.1
Content-Hash: sha256 b32683017c9530[etc]
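A proxy would first have to make sense of that header value. Here is a minimal Python sketch, assuming the value is always `<algorithm> <full hex digest>` and that only SHA-256 and SHA-512 count as acceptable. Both the accepted set and the `parse_content_hash` name are hypothetical choices for illustration. Insisting on the full-length digest, rather than a prefix, is one way to keep clients from fishing for cached objects by guessing short hashes.

```python
import re

# Hypothetical allowlist: accepted algorithms and their hex digest lengths.
DIGEST_HEX_LEN = {"sha256": 64, "sha512": 128}


def parse_content_hash(value: str) -> tuple[str, str]:
    """Parse a value like 'sha256 <hex digest>' into (algorithm, digest).

    Algorithm and digest are normalised to lowercase. Raises ValueError for
    unknown algorithms, malformed hex, or truncated digests.
    """
    parts = value.split()
    if len(parts) != 2:
        raise ValueError("expected '<algorithm> <hex digest>'")
    alg, digest = parts[0].lower(), parts[1].lower()
    if alg not in DIGEST_HEX_LEN:
        raise ValueError(f"unsupported hash algorithm: {alg}")
    # Require exactly the full digest length so prefixes are rejected.
    if not re.fullmatch(r"[0-9a-f]+", digest) or len(digest) != DIGEST_HEX_LEN[alg]:
        raise ValueError("digest must be a full-length hex string")
    return alg, digest
```

For example, `parse_content_hash("sha256 E3B0C442...")` with a full 64-character digest returns `("sha256", "e3b0c442...")`, while `"md5 ..."` or a truncated digest raises `ValueError`.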

The proxy is entitled to return any object in its cache that has the desired hash. If it doesn’t have such an object, it fetches it from the URI in the usual way. We’d have to require that only cryptographically strong hashes are allowed, both to prevent the client from receiving the wrong data and to stop clients fishing for unauthorized files in the cache.
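The lookup rule can be sketched in a few lines of Python. This assumes a cache keyed only by `(algorithm, digest)`, an injected `fetch` callable standing in for an ordinary HTTP GET, and a proxy that verifies freshly fetched content against the requested hash before caching or serving it. The class name, the verification step, and the strong-algorithm set are all assumptions for illustration, not part of the proposal as written.

```python
import hashlib
from typing import Callable

# Hypothetical set of algorithms this sketch treats as "cryptographically strong".
STRONG_ALGORITHMS = {"sha256", "sha512"}


class ContentAddressedCache:
    """Serve any cached object with the requested hash; otherwise fetch the
    URI, check that the body really has that hash, and cache it."""

    def __init__(self, fetch: Callable[[str], bytes]):
        # `fetch` is a stand-in for a normal HTTP GET of the request URI.
        self._fetch = fetch
        self._objects: dict[tuple[str, str], bytes] = {}

    def get(self, uri: str, alg: str, digest: str) -> bytes:
        if alg not in STRONG_ALGORITHMS:
            raise ValueError(f"refusing weak or unknown hash: {alg}")
        key = (alg, digest.lower())
        cached = self._objects.get(key)
        if cached is not None:
            # The URI is ignored on a hit: any object with this hash will do.
            return cached
        body = self._fetch(uri)
        if hashlib.new(alg, body).hexdigest() != key[1]:
            # Neither cache nor serve content that doesn't match the hash.
            raise ValueError(f"content from {uri} does not match requested hash")
        self._objects[key] = body
        return body
```

Keying purely on the hash is what lets two requests for different URIs share one cached copy; the URI only matters on a miss.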

In the distro mirroring case, the repository metadata would need to contain the hashes of the packages (it probably already does). The client would supply these to the proxy, and the proxy could then satisfy the request no matter which mirror was selected, so you would no longer get the situation where the proxy downloads several copies of the same data from different mirrors.
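On the client side, a Debian-style `Packages` index already pairs each `Filename` with a `SHA256` field, so building such a request is mostly plumbing. The sketch below assumes a simplified stanza format (blank-line separated, continuation lines ignored), uses hypothetical mirror hostnames, and emits the proposed `Content-Hash` header, which is not a standard HTTP header.

```python
from urllib.parse import urlsplit


def parse_packages(text: str) -> dict[str, str]:
    """Map Filename -> SHA256 from simplified Packages-style stanzas."""
    result: dict[str, str] = {}
    for stanza in text.strip().split("\n\n"):
        fields: dict[str, str] = {}
        for line in stanza.splitlines():
            # Skip continuation lines (e.g. multi-line Description fields).
            if ":" in line and not line.startswith(" "):
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        if "Filename" in fields and "SHA256" in fields:
            result[fields["Filename"]] = fields["SHA256"]
    return result


def proxy_request(mirror: str, filename: str, sha256: str) -> str:
    """Build a proxy request carrying the proposed (hypothetical) Content-Hash header."""
    url = f"{mirror.rstrip('/')}/{filename}"
    return (
        f"GET {url} HTTP/1.1\r\n"
        f"Host: {urlsplit(url).netloc}\r\n"
        f"Content-Hash: sha256 {sha256}\r\n"
        "\r\n"
    )
```

Requests built for `http://mirror-a.example/debian` and `http://mirror-b.example/debian` differ in URI and `Host` but carry the same `Content-Hash`, which is the only thing a hash-keyed proxy needs to match on.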

In the git case, git objects, commits included, are already named by a hash of their contents. This would finally let us have an intelligent git mirror, something I’ve wanted for a while, given that I’m on slow DSL and downloading gnulib multiple times per day is no fun for anyone.
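To make the git point concrete: in git's default object format, an object's ID is the SHA-1 of a short header (`<type> <size>` plus a NUL byte) followed by the raw content, so a proxy could verify any object it hands out by recomputing this. Note that SHA-1 is git's default (newer git can also use SHA-256 repositories), which sits awkwardly with the strong-hash requirement. The helper name below is illustrative.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Compute a git object ID in the default SHA-1 format:
    SHA-1 over b'<type> <size>\\0' followed by the raw content.
    obj_type is one of 'blob', 'tree', 'commit' or 'tag'."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()
```

For instance, `git_object_id("blob", b"hello world\n")` gives `3b18e512dba79e4c8300dd08aeb37f8e728b8dad`, the same ID `echo 'hello world' | git hash-object --stdin` prints.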



