Git Coin Project Maintainer Consensus Protocol
gitcoin: the author of the commit sha1 with the longest prefix of 0’s in your repository is now the project maintainer
The genius in the tweet is how it draws a comparison to Bitcoin’s approach to achieving distributed consensus with achieving consensus on choosing a project maintainer.
With Bitcoin, there’s a proof-of-work algorithm that relies on generating SHAs until you find one with a certain number of leading zeros. Git commit SHAs could perhaps serve a similar purpose.
Is this a good way to pick a project maintainer? Probably not. But then again, it’s not that far off from how I make most important life decisions. If your project wants to take a walk on the wild side, I’ve got just the command for you.
A simple solution
Run the following command in a Git repository and it’ll return the name of the author, the commit date, and the SHA of the commit that has the lowest SHA sorted lexicographically.
git log --pretty=format:"%H %ad %an" | sort | head -n1
Or, if you prefer a Git alias:
[alias] coin = !git log --pretty=format:'%H %ad %an' | sort | head -n1
The SHA that results will have the most leading zeros in the repository. There may be other commits with the same number of leading zeros, but for the sake of this thought exercise, I’ll just pick the one that’s sorted first.
For those not familiar with the
command, there are a
gaggle of options
. I’ll break down this specific invocation.
option takes a custom format string that specifies the contents of the output.
is the commit SHA.
is the commit date and
is the author name.
We pipe that to the
command. Since no two SHAs can be the same, we don’t have to worry about sorting on just the first column. We can just sort using the entire line as the sort key. Then we use
to pluck the first item.
It’s possible that there won’t be any commits with leading 0s, but I ignore that for now. I figure the commit with the lowest SHA sorted alphabetically fits with the spirit of the idea.
Since github.com runs on Ruby on Rails, I thought it’d be fun to try it out on
. I cloned the repository to my machine and ran
on it. Here’s the output (SHA truncated for presentation purposes):
e8 Thu Nov 15 23:10:45 2012 +0200 Agis Anastasopoulos
Congratulations Agis! You are the new maintainer of Rails!
Not so fast!
I know what some of you are thinking, “You are ridiculous. This is a waste of time.” To those I say, hold my beer because I’m not done yet.
Others who are familiar with Bitcoin’s consensus protocol are thinking, “This is not how the protocol works. It’s not about choosing the lowest sorted SHA, it’s about reaching a target number of leading zeros.” To those I say, you’re taking this too seriously!
Even so, in anticipation of all the “Well, Actually” responses I’m sure to receive, I’ll address this fair point.
With Bitcoin, the first miner to generate a SHA with the target number of leading zeros is the one to add their block to the global blockchain.
I’m hand waving a bit here for the sake of brevity. The important point is that it’s not the block with the lowest SHA. It has nothing to do with the sorting of SHAs.
Over time, the protocol compensates for the global increase in computing power by increasing the number of leading zeros in the proof of work target. That way a block is added roughly every ten minutes no matter how fast computers get and no matter how many computers are mining.
If we translate this to the gitcoin idea, we probably want to look at the first commit to reach the most number of leading zeros.
For example, say that the current maintainer was chosen because of a commit with a SHA that has two leading zeros. The next maintainer is chosen by the commit that has three (or more) leading zeros. The next maintainer after that is chosen by the first commit with one more leading zero than the commit that chose the previous maintainer. And so on.
In other words, every time a new maintainer is chosen by this protocol, the target number of leading zeros increases by one. The implication is that over time, maintainers chosen by this project will spend more and more time maintaining the project. Not sure that’s necessarily a desirable trait or not.
The shell script to find the maintainer with these rules is considerably more complex than my previous script. This is why I originally wanted to stop with that script and call it a day. Also, I’m lazy.
Not to mention, my background is primarily with Windows so my Unix-fu is fairly weak. However, my time at GitHub working with Git has helped me exercise those muscles quite a bit more than I did before. So I thought it’d be fun to give it a shot. Here’s the script I came up with:
TZ=UTC git log --pretty=format:'%H%x09%ad%x09%an' --date=iso-local | grep ^0.* | sed -E 's/(0+)(.*)/\1\t\1\2/' | sort -k1,1 -k3,3r | tail -n1 | cut -f 2,3,4
And the award goes to…
When I run this against the Rails repository, it outputs (again, SHA truncated for presentation purposes):
dfe 2006-04-09 21:27:32 +0000 David Heinemeier Hansson
Sorry Agis, this David Heinemeier Hansson person is now the Rails maintainer! I hope David accepts this responsibility seriously.
Uh, still not there.
If you read an earlier version of this post, you’ll note I declared DHH the maintainer of Rails. But Jean-Jacques Lafay noted in the comments to this post that I need to look at the leading zeros of the SHA when written in binary form . Whoops!
This makes a lot of sense, when you think about it. Under my original implementation, every time we choose a new maintainer, we increased the difficulty in choosing the next maintainer by 16 times. When we look at leading zeros in binary form, we only double the time.
Fortunately, the correction to my script is pretty simple, I need to grab all the zeros (if any) and the first non-zero character when creating the sort key. Any characters after that won’t change the leading zeros.
have the same number of leading zeros when expressed as binary. But
do not have the same number of leading zeros.
So here’s the updated script:
TZ=UTC git log --pretty=format:'%H%x09%ad%x09%an' --date=iso-local | sed -E 's/^(0*[1-9a-f])(.*)/\1\t\1\2/' | sort -k1,1 -k3,3 | head -n1 | cut -f 2,3,4
And once again, Agis is the new maintainer for Rails!
The excrutiatingly detailed breakdown
Let’s break this down piece by piece for those of you like me who don’t eat and breath shell scripting.
The first thing we do is set the local timezone to UTC (
) so we can sort by date and compare apples to apples.
git log --pretty=format:'%H%x09%ad%x09%an' --date=iso-local
Just like before, we’re running a git log command. It looks ugly, but all I’m doing here is using tab character
in place of spaces. That’ll come
in handy later. I also specify that the date format should be
. This provides a date that’s sortable lexicographically. We’ll need that later too.
sed -E 's/^(0*[1-9a-f])(.*)/\1\t\1\2/'
Sed is a powerful command used to perform text transformations on an input stream. In this case, we’re using the
command which is a regex replacement. The
indicates that sed should use extended regular expressions. What I’m doing here is extracting the consecutive sequence of
s that the SHA starts with as a new column in the output.
So if the
command we ran earlier returned something like this (SHAs and name truncated for presentation purposes):
e1 2004-12-01 13:59:16 +0000 David daa29ec 2004-12-01 13:18:51 +0000 David a2249e 2004-11-26 02:16:05 +0000 David
Piping this output to this sed expression results in (name truncated for brevity):
71e1 2004-12-01 13:59:16 +0000 David d 0daa29ec 2004-12-01 13:18:51 +0000 David 08a2249e 2004-11-26 02:16:05 +0000 David
That format is pretty handy because we can sort this by the first column. This sorts commits from those with the most leading zeros to the least.
This will also group all SHAs with the same number of leading zeros together. Then we can sort by the date column to find the first commit in any such group.
sort -k1,1 -k3,3
Does exactly that. One thing that tripped me up when I first worked on this is I thought I should be able to
sort -k1 -k3
option specifies a sort key. By default, when you specify a column, it takes that column and all columns after it as the sort key. Thus
is pretty much equivalent to not specifying a sort key at all as it sorts by the whole line.
Fortunately, you can specify an end column for the sort key using the comma. So
sorts just by the first column. Whereas
would take the first three columns as a sort key.
Now that we have the proper sort in play, we just need to take the first entry. this is the oldest commit with the most leading zeros.
cut -f 2,3,4
And finally, we don’t need the leading zeros column in the final output so I run the
command and only keep columns 2, 3, and 4. This is where inserting the tabs before comes in handy. By default,
uses the tab character as a delimiter.