Tag Archives: git

Half-baked idea: Content-addressable web proxy

For more half-baked ideas, see my ideas tag.

There are several situations where you want to fetch some content and don’t particularly care which precise source it comes from:

  1. Downloading packages from Linux distro mirrors.
  2. Downloading git commits.
  3. Grabbing a bittorrent data block.

My proposal (which surely has been done??) is that clients can supply the hash of the file they want when connecting to a proxy; something like:

GET http://example.com/foo HTTP/1.1
Content-Hash: sha256 b32683017c9530[etc]

The proxy is entitled to return any object in its cache that has the desired hash. If it doesn’t have any such object then it’ll fetch it from the URI in the usual way. We’d have to require that only cryptographically strong hashes are allowed, both to prevent the client getting the wrong data and to stop clients fishing for unauthorized files in the cache.

In the distro mirroring case, the metadata would contain the hashes of the packages (which it probably already does). The client would supply these to the proxy. The proxy would be able to satisfy the request no matter what mirror was selected — you wouldn’t get the situation where the proxy is downloading several copies of the same data from different mirrors.

In the git case, commits are already named by their hashes. This would finally let us have an intelligent git mirror, something I’ve been wanting for a while given that I’m on slow DSL and downloading gnulib multiple times per day is no fun for anyone.
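
The cache lookup described above could be sketched roughly like this. It is only an illustration under assumptions: the HashCache class, the fake_fetch function and the mirror URLs are made up for the example, and a real proxy would also need to parse the Content-Hash header and speak HTTP.

```python
import hashlib
from typing import Callable, Dict, List


class HashCache:
    """Toy content-addressable cache keyed by SHA-256 hex digest (hypothetical)."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                   # origin fetcher: URL -> body
        self._objects: Dict[str, bytes] = {}  # digest -> body

    def get(self, url: str, sha256_hex: str) -> bytes:
        # Any cached object with the requested hash will do,
        # no matter which URL it was originally fetched from.
        cached = self._objects.get(sha256_hex)
        if cached is not None:
            return cached
        # Cache miss: fetch from the URI in the usual way, then verify.
        body = self._fetch(url)
        actual = hashlib.sha256(body).hexdigest()
        if actual != sha256_hex:
            raise ValueError(f"hash mismatch for {url}")
        self._objects[actual] = body
        return body


# Stand-in for a real HTTP fetch; records which URLs were actually hit.
calls: List[str] = []

def fake_fetch(url: str) -> bytes:
    calls.append(url)
    return b"package contents"

digest = hashlib.sha256(b"package contents").hexdigest()
cache = HashCache(fake_fetch)
first = cache.get("http://mirror-a.example.com/foo", digest)
second = cache.get("http://mirror-b.example.com/foo", digest)
```

The second request names a different mirror but is served from the cache, so only one upstream fetch happens.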


Filed under Uncategorized

xavierbot lives!

Or at least he now has his own git repository.


git cherry-pick wins yet again

I’ve said it before, several times. It took me less than 30 minutes to cherry pick 41 bug fixes to the stable branch. There were only around 5 conflicts that had to be manually fixed.

$ git status
# On branch stable-1.12
# Your branch is ahead of 'origin/stable-1.12' by 41 commits.


Nice RPM / git patch management trick

As far as I know, this trick was invented by Peter Jones. Edit: Or it could be ajax?

Parted in Fedora uses a clever method to manage patches with git and “git am”.

%setup -q
# Create a git repo within the expanded tarball.
git init
git config user.email "..."
git config user.name "..."
git add .
git commit -a -q -m "%{version} baseline."
# Apply all the patches on top.
git am %{patches}

The background is that there is a git repo somewhere else which stores the unpacked baseline parted tarball, plus patches (stored as commits) on top.

I assume that Peter exports the commits using git format-patch. At build time these are applied on top of the tarball using git am.

There are two clear advantages:

  • No need to have lots of duplicate %patch lines in the spec file.
  • git-am restores permissions and empty files properly, which regular patch does not do.

With libguestfs in RHEL 6 we have roughly 80 patches, so managing these patches is very tedious, and this will greatly simplify things.



Half-baked ideas: classify git commits and rule-based branch building

For more half-baked ideas, see my ideas tag.

This idea is an evolution of the previous idea for git commit dependency analysis.

Let’s say we have a development branch and a stable branch. Lots of bug fixes, features and so on go into the development branch, but for stable users we want to construct a branch containing only the stable, well-tested, obvious bug fixes. (It’s no coincidence that this sounds a lot like the scheme we use in libguestfs, but it’s also the basic scheme that Red Hat use to construct RHEL.)

What you end up doing is manually classifying the commits from the development branch, and cherry-picking the ones which look safe back to your stable branch. And you probably have a bunch of rules about what you want in your stable branch, such as “must fix a customer bug” or “must have had more than X months of testing in Fedora”.

A better idea would be to annotate the development commits with labels:

0d6fd9e fuse: Fix getxattr, listxattr calls and add a regression te
  labels: bugfix, fuse
  depends: 4e529e0

4e529e0 fish: fuse: Add -m dev:mnt:opts to allow mount options to b
  labels: feature, fish, fuse

feaddb0 roadmap: Move QMP to 'beyond 1.10'.
  labels: documentation

b8724e2 Open release notes for version 1.10.0.
  labels: documentation

e751293 ruby: Don't segfault if callbacks throw exceptions (RHBZ#664
  labels: bugfix, ruby
  depends: 6a64114

a0e3b21 RHEL 5: Use mke4fs on RHEL 5 as replacement for mke2fs.
  labels: bugfix, rhel5
  depends: 227bea6

(Before anyone jumps in here: Markus Armbruster pointed out to me that git has a great feature called git notes that makes the labelling part rather easy.)

Now you need a rule as to what you’re going to allow into your stable branch, e.g.:

labels.contains ('bugfix') &&
  forall c in depends: c is in stable

Now your stable branch can be constructed entirely algorithmically. In fact, making many stable branches each with a slightly different emphasis (“bug fixes only”, “doc changes only”, “well-tested new features”, etc.) can be done automatically and algorithmically just by having more rules.
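
As a rough sketch of how such a rule might be evaluated, here is the example rule applied to the annotated commits listed above. The Commit class and select_stable function are hypothetical names. I am also assuming that the list above is newest first (so it is reversed here), and that 6a64114 is already in stable while 227bea6 is not; none of that comes from the real repository.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Commit:
    sha: str
    labels: Set[str]
    depends: List[str] = field(default_factory=list)


def select_stable(commits: Iterable[Commit], already_stable: Iterable[str]) -> List[str]:
    """Pick commits labelled 'bugfix' whose dependencies are all in stable.

    Commits must be given oldest first, so that a dependency is
    considered before anything that depends on it.
    """
    stable = set(already_stable)
    picked = []
    for c in commits:
        if "bugfix" in c.labels and all(d in stable for d in c.depends):
            stable.add(c.sha)
            picked.append(c.sha)
    return picked


# The annotated commits from the listing, oldest first (assumed order).
commits = [
    Commit("a0e3b21", {"bugfix", "rhel5"}, ["227bea6"]),
    Commit("e751293", {"bugfix", "ruby"}, ["6a64114"]),
    Commit("b8724e2", {"documentation"}),
    Commit("feaddb0", {"documentation"}),
    Commit("4e529e0", {"feature", "fish", "fuse"}),
    Commit("0d6fd9e", {"bugfix", "fuse"}, ["4e529e0"]),
]

# Assumption for the example: only 6a64114 is already on the stable branch.
picked = select_stable(commits, already_stable={"6a64114"})
```

Under those assumptions only e751293 qualifies: 0d6fd9e is a bug fix but depends on a feature commit, and a0e3b21 depends on a commit that is not in stable. A different branch would just be a different predicate in place of the "bugfix" test.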

Update: Here is a reply from Johan Herland, who is one of the authors of git-notes. Thanks to Johan for giving me permission to reproduce this email.

IMHO, using notes like you outline in the blog post makes a lot of sense. Attaching text strings as notes to the relevant commit is exactly what git-notes were created for. Storing simple strings for labels, and commit SHA1s (use full 40-char SHA1 sums to prevent future collisions) for dependencies sounds like the best plan.

A couple of different ways to encode this:

A. Place labels in one notes ref (refs/notes/labels), and dependencies in another notes ref (refs/notes/deps). The format of notes is simply one label/dependency per line, e.g.:

  $ git notes --ref=labels show HEAD
  $ git notes --ref=deps show HEAD

B. Place labels and deps in the same notes ref, and use a simple email-style header format, e.g.:

  $ git notes --ref=foo show HEAD
  Labels: bugfix, fuse, ruby
  Dependencies: 4e529e0, b8724e2

Either format is extensible: In (A), if you add a new data type, simply add a new notes ref; in (B) simply add a new header field name.

Whatever you feel most comfortable parsing/maintaining is probably the best choice for you.
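
As an illustration of format (B) from the email (this is not part of Johan’s reply), parsing the header-style note could look something like the sketch below. The parse_note function is a hypothetical helper, and the note text is hard-coded here; in practice it would come from the output of git notes --ref=foo show for the commit in question.

```python
from typing import Dict, List


def parse_note(text: str) -> Dict[str, List[str]]:
    """Parse 'Name: a, b, c' header lines into {name: [a, b, c]}."""
    fields: Dict[str, List[str]] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or not a header line; ignore it
        items = [v.strip() for v in value.split(",") if v.strip()]
        fields[name.strip().lower()] = items
    return fields


note = "Labels: bugfix, fuse, ruby\nDependencies: 4e529e0, b8724e2\n"
parsed = parse_note(note)
```

Adding a new header field needs no change to the parser, which is the extensibility point made about (B).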



Half-baked ideas: git commit dependency analysis

For more half-baked ideas, see my ideas tag.

Consider these three git commits A, B and C:

As far as git’s linear history is concerned, they are three independent commits and git only records the parent relationship A -> B -> C.

But could I cherry pick just patch C, omitting A and B? git won’t tell me, but I can examine the patches themselves (A, B, C) and answer this question (the answer is no, since the change in C depends on both changes A and B being applied first). The real dependency tree looks like this:

   C
  ^ ^
 /   \
A --> B

Many other real dependency trees could have been possible. With another choice of A, B and C these might have been completely independent of each other, or (A, B) might have to be applied together, with C being independent.

The half-baked idea is whether we can write an automatic tool which can untangle these dependencies from the raw git commits. (Or perhaps such a tool exists already, but I cannot find one.)

There would be one important practical use for such a tool. When cherry picking commits for the stable branch, I would like to know which previous commits the commit I’m trying to apply depends on. This gives me extra information: I can decide that applying this commit is too disruptive (perhaps it depends on an earlier feature which I don’t want to add). I can decide to go back and apply the older commits, or that a manual backport is the best way.

The information you can derive from patches doesn’t tell the whole story. There are two particular problems, one revealed by the choice of patches A, B and C above. First, with a trivial change to B, it is possible to apply A and B independently. There is only a dependency A -> B because a little bit of the context of patch A “leaked” into patch B. Second, it is possible for two features to logically depend on each other but not overlap in any way: consider the case where you add a log collector and a log file processor in separate commits. The log file processor might be completely useless without the log collector, but the commits could appear completely independent if you just examine the patches.
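
To make the idea a little more concrete, here is a very rough sketch of the purely textual analysis. It is a simplification under assumptions, not a real tool: each commit is reduced to the file regions it touches, the file names and line numbers are invented, and line numbers are treated as if they never shift when earlier patches add or remove lines (a real tool would have to track that). As noted above, it also cannot see logical dependencies between commits that don’t overlap.

```python
from typing import Dict, List, Set, Tuple

# (path, first_line, last_line), inclusive; hypothetical simplified hunk.
Hunk = Tuple[str, int, int]


def overlaps(a: Hunk, b: Hunk) -> bool:
    """True if two hunks touch the same file and their line ranges intersect."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def textual_deps(history: List[Tuple[str, List[Hunk]]]) -> Dict[str, Set[str]]:
    """Map each commit to the earlier commits whose hunks it overlaps.

    history is oldest first. Only direct overlaps are reported.
    """
    deps: Dict[str, Set[str]] = {}
    for i, (name, hunks) in enumerate(history):
        deps[name] = {
            earlier
            for earlier, earlier_hunks in history[:i]
            if any(overlaps(h, e) for h in hunks for e in earlier_hunks)
        }
    return deps


history = [
    ("A", [("lib.c", 10, 14)]),
    ("B", [("lib.c", 13, 20)]),   # its context overlaps the end of A
    ("C", [("lib.c", 12, 18)]),   # sits on top of both A and B
    ("D", [("doc.txt", 1, 3)]),   # unrelated file
]
deps = textual_deps(history)
```

With these made-up hunks, C depends on both A and B, B depends on A only because of the overlapping context, and D is independent of everything.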



Note on creating and pushing a git branch

Note to self on how we created the 1.6.0/1.7.0/stable-1.6 branch in git:

git tag [-s] -m 'version 1.6.0' 1.6.0 $sha1
git tag [-s] -m 'version 1.7.0' 1.7.0 $sha1
git branch stable-1.6 $sha1
git push origin tag 1.6.0 1.7.0 stable-1.6

(Thanks Jim Meyering for working out the details)



New libguestfs stable versions

There’s a non-critical security bug in libguestfs which is fixed in development version 1.5.23.

I pushed two stable versions of libguestfs today: For 1.2.14 (source) I backported about 20 bug fixes using git cherry-pick. For 1.4.6 (source) there were a total of 30 commits backported.

Git makes backporting relatively simple, although git cherry-pick tends to fall over once branches diverge a lot. I would say for the 1.2 branch, which diverged 7 months ago, roughly two-thirds of the patches applied straight away, which I think is pretty good considering how much the code has changed.

Cherry-pick can’t, however, deal with file renames, so there were some patches that I had to edit and apply manually. Apparently there is something you can do with git merge to deal with that, but no one has explained that yet in a way I can understand.

It’s also interesting how a > 6 month old branch is still getting so many fixes!


git cherry-pick wins

git cherry-pick is yet another great but little-known feature of git. We use it in libguestfs to cherry pick the best bug fixes from the development branch into the stable branch. The whole process is quite effortless.

First I list all the main branch [i.e. devel/unstable] commits which have happened since the last stable release, and look at each one. The only difficulty is evaluating each commit to see whether it meets our criteria for a sufficiently safe change that users of our stable branch will want.

Secondly, I have our stable branch checked out in my working directory, and “git pull” to make sure that is up to date.

Then, for each commit I want to cherry pick, I simply do:

$ git cherry-pick -x sha1_of_the_commit

And usually that’s all I have to do. Git’s patch conflict resolution is much better than plain “patch”. It’s able to work minor miracles even where the code in the two branches has diverged quite a way. If it’s unable to apply the patch directly, then you’ll see a message like this:

Automatic cherry-pick failed.  After resolving the conflicts,
mark the corrected paths with 'git add <paths>' or 'git rm <paths>' and commit the result.
When commiting, use the option '-c 94e310d' to retain authorship and message.

This is fairly self-explanatory. Use “git status” to see which files are problematic:

$ git status
# On branch stable-1.2
# Your branch is ahead of 'origin/stable-1.2' by 15 commits.
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#	modified:   tools/virt-cat
#	modified:   tools/virt-edit
#	modified:   tools/virt-ls
#	modified:   tools/virt-tar
#	modified:   tools/virt-win-reg
# Unmerged paths:
#   (use "git reset HEAD <file>..." to unstage)
#   (use "git add <file>..." to mark resolution)
#	both modified:      perl/lib/Sys/Guestfs/Lib.pm

Edit the file to manually resolve the conflict, add the file, and then commit with the “-c” option noted in the original message.

The result is a series of stable commits like this, and hopefully some happy users.



git: Splitting commits

git wonders never cease. This page describes how to use git rebase -i to split commits. (Thanks Chris Lalancette and Stephen Tweedie).

Stephen notes also: “git diff --cached isn’t mentioned in that git-rebase page, but it’s an invaluable part of the process”. Use git diff --cached during the splitting process to see which changes you’ve added to the index but not committed yet. This lets you see what your split commit will look like.
