Tag Archives: git

Half-baked idea: Content-addressable web proxy

For more half-baked ideas, see my ideas tag

There are several situations where you want to fetch some content and don’t particularly care which precise source it comes from:

  1. Downloading packages from Linux distro mirrors.
  2. Downloading git commits.
  3. Grabbing a bittorrent data block.

My proposal (which surely has been done??) is that clients can supply the hash of the file they want when connecting to a proxy; something like:

GET http://example.com/foo HTTP/1.1
Content-Hash: sha256 b32683017c9530[etc]

The proxy is entitled to return any object in its cache that has the desired hash. If it doesn’t have any such object then it’ll fetch it from the URI in the usual way. We’ll have to make some assumptions that only cryptographically strong hashes are allowed, both to prevent the client getting wrong data and to stop clients fishing for unauthorized files from the cache.

In the distro mirroring case, the metadata would contain the hashes of the packages (which it probably already does). The client would supply these to the proxy. The proxy would be able to satisfy the request no matter what mirror was selected — you wouldn’t get the situation where the proxy is downloading several copies of the same data from different mirrors.

In the git case, git commits are the hashes. This would finally let us have an intelligent git mirror, something I’ve been wanting for a while given that I’m on slow DSL and downloading gnulib multiple times per day is no fun for anyone.


Filed under Uncategorized

xavierbot lives!

Or at least he now has his own git repository.

1 Comment

Filed under Uncategorized

git cherry-pick wins yet again

I’ve said it before, several times. It took me less than 30 minutes to cherry pick 41 bug fixes to the stable branch. There were only around 5 conflicts that had to be manually fixed.

$ git status
# On branch stable-1.12
# Your branch is ahead of 'origin/stable-1.12' by 41 commits.

Leave a comment

Filed under Uncategorized

Nice RPM / git patch management trick

As far as I know, this trick was invented by Peter Jones. Edit: Or it could be ajax?

Parted in Fedora uses a clever method to manage patches with git and “git am”.

%setup -q
# Create a git repo within the expanded tarball.
git init
git config user.email "..."
git config user.name "..."
git add .
git commit -a -q -m "%{version} baseline."
# Apply all the patches on top.
git am %{patches}

The background is that there is a git repo somewhere else which stores the unpacked baseline parted tarball, plus patches (stored as commits) on top.

I assume that Peter exports the commits using git format-patch. At build time these are applied on top of the tarball using git am.

There are two clear advantages:

  • No need to have lots of duplicate %patch lines in the spec file.
  • git-am restores permissions and empty files properly, which regular patch does not do.

With libguestfs in RHEL 6 we have roughly 80 patches, so managing these patches is very tedious, and this will greatly simplify things.


Filed under Uncategorized

Half-baked ideas: classify git commits and rule-based branch building

For more half-baked ideas, see my ideas tag.

This idea is an evolution of the previous idea for git commit dependency analysis.

Let’s say we have a development branch and a stable branch. Lots of bug fixes, features and so on go into the development branch, but for stable users we want to construct a branch containing only the stable, well-tested, obvious bug fixes. (It’s no coincidence that this sounds a lot like the scheme we use in libguestfs, but it’s also the basic scheme that Red Hat use to construct RHEL.)

What you end up doing is manually classifying the commits from the development branch, and cherry-picking the ones which look safe back to your stable branch. And you probably have a bunch of rules about what you want in your stable branch, such as “must fix a customer bug” or “must have had more than X months of testing in Fedora”.

A better idea would be to annotate the development commits with labels:

0d6fd9e fuse: Fix getxattr, listxattr calls and add a regression te
  labels: bugfix, fuse
  depends: 4e529e0

4e529e0 fish: fuse: Add -m dev:mnt:opts to allow mount options to b
  labels: feature, fish, fuse

feaddb0 roadmap: Move QMP to 'beyond 1.10'.
  labels: documentation

b8724e2 Open release notes for version 1.10.0.
  labels: documentation

e751293 ruby: Don't segfault if callbacks throw exceptions (RHBZ#664
  labels: bugfix, ruby
  depends: 6a64114

a0e3b21 RHEL 5: Use mke4fs on RHEL 5 as replacement for mke2fs.
  labels: bugfix, rhel5
  depends: 227bea6

(Before anyone jumps in here, Markus Armbruster pointed out to me that git has a great feature called git notes that makes the labelling part rather easy).

Now you need a rule as to what you’re going to allow into your stable branch, eg:

labels.contains ('bugfix') &&
  forall c in depends: c is in stable

Now your stable branch can be constructed entirely algorithmically. In fact, making many stable branches each with a slightly different emphasis (“bug fixes only”, “doc changes only”, “well-tested new features”, etc.) can be done automatically and algorithmically just by having more rules.

Update Here is a reply from Johan Herland who is one of the authors of git-notes. Thanks Johan for giving me permission to reproduce this email.

IMHO, using notes like you outline in the blog post makes a lot of sense. Attaching text strings as notes to the relevant commit is exactly what git-notes were created for. Storing simple strings for labels, and commit SHA1s (use full 40-char SHA1 sums to prevent future collisions) for dependencies sounds like the best plan.

A couple of different ways to encode this:

A. Place labels in one notes ref (refs/notes/labels), and dependencies in another notes ref (refs/notes/deps). The format of notes is simply one label/dependency per line, e.g.:

  $ git notes --ref=labels show HEAD
  $ git notes --ref=deps show HEAD

B. Place labels and deps in the same notes ref, and use a simple email-style header format, e.g.:

  $ git notes --ref=foo show HEAD
  Labels: bugfix, fuse, ruby
  Dependencies: 4e529e0, b8724e2

Either format is extensible: In (A), if you add a new data type, simply add a new notes ref; in (B) simply add a new header field name.

Whatever you feel most comfortable parsing/maintaining is probably the best choice for you.


Filed under Uncategorized

Half-baked ideas: git commit dependency analysis

For more half-baked ideas, see my ideas tag

Consider these three git commits A, B and C:

As far as git’s linear history is concerned, they are three independent commits and git only records the parent relationship A -> B -> C.

But could I cherry pick just patch C, omitting A and B? git won’t tell me, but I can examine the patches themselves (A, B, C) and answer this question (the answer is no, since the change in C depends on both changes A and B being applied first). The real dependency tree looks like this:

  ^ ^
 /   \
A --> B

Many other real dependency trees could have been possible. With another choice of A, B and C these might have been completely independent of each other, or (A, B) might have to be applied together, with C being independent.

The half-baked idea is whether we can write an automatic tool which can untangle these dependencies from the raw git commits? (Or whether such a tool exists already, I cannot find one)

There would be one important practical use for such a tool. When cherry picking commits for the stable branch, I would like to know which previous commits that the commit I’m trying to apply depends on. This gives me extra information: I can decide that applying this commit is too disruptive — perhaps it depends on an earlier feature which I don’t want to add. I can decide to go back and apply the older commits, or that a manual backport is the best way.

The information you can derive from patches doesn’t tell the whole story. There are two particular problems, one revealed by the choice of patches A, B and C above. With a trivial change to B, it is possible to apply A and B independently. There is only a dependency A -> B because a little bit of the context of patch A “leaked” into patch B. It is also possible for two features to logically depend on each other, but not overlap in any way: Consider the case where you add a log collector and log file processor in separate commits. The log file processor might be completely useless without the log collector, but the commits could appear completely independent if you just examine the patches.


Filed under Uncategorized

Note on creating and pushing a git branch

Note to self on how we created the 1.6.0/1.7.0/stable-1.6 branch in git:

git tag [-s] -m 'version 1.6.0' 1.6.0 $sha1
git tag [-s] -m 'version 1.7.0' 1.7.0 $sha1
git branch stable-1.6 $sha1
git push origin tag 1.6.0 1.7.0 stable-1.6

(Thanks Jim Meyering for working out the details)


Filed under Uncategorized