Pages

Git Bisect: divide and conquer search algorithm

Have you heard about git-bisect? Well, if you haven’t, let me tell you that this command is a powerful GIT debug tool to find a change which has introduced a bug somewhere in the history of a GIT tree. The power of this command resides in the fact that it uses a binary search algorithm to locate the position of the element in a delimited change set. In my opinion, “git bisect” is one of the best feature that the GIT version control offers.

How to use “git bisect”?

To see the git-bisect manual page click here.
Git-bisect will drive a binary search to find the version where the bug was introduced.
To find more help about this tool you can type the following in command line:
$ git bisect help
To use this tool you need to identify two points of reference, a good revision and a bad revision, then you will provide the commits IDs for each of the “rev” values listed below:
git bisect start [<bad> [<good>...]] [--] [<paths>...]
git bisect bad [<rev>]
git bisect good [<rev>...]
<Run your tests and evaluate. Repeat these steps to narrow down which of the commits was the problem>
Other useful subcommands are the following:
$ git bisect skip [(<rev>|<range>)...]
$git bisect reset [<commit>]
$ git bisect visualize
$ git bisect replay <logfile>
$ git bisect log
If you really want to automate this process you may want to explore more about the following subcommand:
$ git bisect run <cmd>...

How “git bisect” works?

Let’s considered a change set as an array R of N elements with an attribute Commit. Elements 1 to N ordered on the position of Commit, such that:


A and B will be our boundaries where A=<good commit>=0 and B=<bad commit>=N+1

The following flowchart represents the process follow by git-bisect:



When can I use git-bisect?

Git-bisect can be used in any GIT tree, but you need to consider the complexity of the same one in order to select the best debug strategy to maximize the benefits of this command. A single well-defined change is the best scenario where git-bisect can show all its potential.


But when the change set to be git-bisected is the result of the interaction of two or more branches, the debugging process will be less straightforward than debugging a single branch (well-defined change set), and this gets worse when you have a N-way merge like an octopus-merge, causing git-bisect to be less efficient. Don’t forget that git bisect at least will pinpoint to the merge that is causing the problem.






In the following examples we can appreciate a 6-way merge (A) and an octopus merge (B):



How many git-bisect iterations before knowing the “bad” revision?

As we previously mentioned, git-bisect is a binary search algorithm applied to an N list of elements (commits). So, after the first iteration there will be at most N/2 elements to probe, in a second iteration there will at most N/4 elements remaining, then N/8 and so on. The total of iteration is given by the following equation:


In the worst scenario, git-bisect will continue iterating until there are no more elements to probe, this will take at most floor(Log2(N)+1) = Log2(N)+1 , just one more iteration than expected.

The following graph shows the behavior of the logarithmic algorithm to find the number of iterations according to the numbers of elements (commits) to be git-bisected. As an example, the total iterations in 128 and 254 commits will be the same according to the floor function normalization



Let’s remember that git-bisect will perform the bisection in a total of commit IDs calculated as follows:


How to learn git-bisect: practicing!!

The theory behind git-bisect is really interesting, but the real value is to know how to use it and that will come only by practicing. This tool has proved to be very useful to find bugs, but the power of this command should not be limited only for that purpose. There are still many aspects that can be improved in this algorithm like the bisection of an octopus merge or any N-way merge for N > 3 of course, but these aspects are nothing compared with the fact that it has helped to minimize the time of debugging of thousands of projects.

No comments:

Post a Comment