Partial commits with git
February 3, 2010
I've been using git for a few months now. Like all the best dev tools it has slowly permeated my workflow to the point where I'd feel pretty crippled without it. In fact, I don't think I would ever go back to a centralized system like svn at this point. In a pinch I'd fall back to using git locally and then using the appropriate git-to-foo integration to push my local changes from my git repo to whatever the centralized repository was.
So, git is great. However, there's one practice which is common amongst git users which I am still uncomfortable with (even though I do it myself on occasion). I'm referring to the practice of staging some, but not all, of the changes in your working tree to the index, and then committing that partial change set. For example, let's say I've been hacking away on my stamp collector application. I added the ability to 'tag' stamps (folksonomy now being mainstream enough to appeal to your average philatelist). While I was working, I also fixed a couple of badly named methods that I happened to come across during my other changes. With git I can decide to commit these changes separately, through judicious use of the index. I can say 'add changes to files x and y to the index, but leave out z for now', then commit, then add the changes to file z to the index, and then commit that change. If I want to get really fancy I can even add some of the changes to a file to the index, but not all changes. This way I can seperate my changes into two logical commits, one for the feature addition of tagging stamps, and one for the miscellaneous method name cleanup.
This is definitely one of those features that is very cool once you realize the power it offers. It means someone else reviewing my commits will have an easier time, and it even means that I could roll back my feature addition change but still keep the method name cleanup work. Still, I would submit that this can be a Bad Idea.
Why does this concern me? Because any such partial commits haven't been tested before being commited. At this point you might be thinking "Well, don't know about you, but I run my tests before each commit". The point here is that the tests run before you commit are run against your working tree, which in the world of git isn't necessarily the same as your commit. In the Good Old Days of svn, the changeset in your working tree was always the same as the changeset in your commit. With git and the index (aka the stage, aka the cache, don't get me started on that) that's not the case. Your working tree may contain changes to files x, y and z, but you've decided to only commit the changes to files x and y. Or even more extreme, you've decided to commit the changes to file x but only some of the changes to file y (maybe the other changes where related to your method renaming binge). So you're running your tests against one set of code (your working tree, with changes to x, y, and z), but your commit is another set of code. I think that ninety nine times out of a hundred this distinction won't matter. Your "technically untested" commit would still have passed the tests anyway. Plus, what you're almost certainly following it up straight away with another commit which will be the same as your working tree. What are the chances that someone will manage to stumble upon an invalid state like that. Probably really low. But still, kinda scary I think. It seems a bit like saying "well, this race condition is really unlikely to happen, right...". I'd rather not have to even think about that. Another thing to consider is the 'cool' use case that someone could decide to revert one commit but not the other.At that point you have code sitting on head that hasn't been tested.
One solution to this would be to ban the use of commits that don't include all changes in the working tree. Yeah, right. Not likely, and not helpful. Staging is definitely a useful way to add more structure to your change history, and not something I'd like to leave behind.
I wonder if a better solution could be achieved with the judicious use of commit hooks. What if we had a pre-commit script that would take the change set that was about to be committed (NOT whatever is in the working tree) and run tests against that tree. If the tests fail the commit fails. Now we would be left in a place of greater confidence than before. We wouldn't even need the discipline to run the tests by hand, because even if we forgot then git would be running them for us anyway. To be honest I'm not sure of the feasibility of creating this kind of setup. I'd love to find out more about how feasible it would be.
Option number 3 would be to have a CI pipeline such that you don't ever commit changes to a branch that others can touch until the CI system has had a chance to run tests against your commit. Instead, you would always be commiting changes to a private branch. The CI system would detect your commit, run a build, and then merge your changes into a shared branch if and only if your commit passed muster. I don't think this would prevent commits in the middle of a batch of commits being pushed to your private branch from bein un-tested, but it would prevent the system ever getting into a state where the shared head is untested. This is an idea I'm planning to blog about more at some point in the future.
In conclusion, while I find myself sometimes taking advantage of this powerful feature of git, I always feel a little nervous doing so, and try and take extra care.