Taking Control of Your Git History:
The Importance of Git History

Published on

Parts of this series:

  1. The Importance of Git History
  2. Staging Accurately
  3. Editing History
  4. Resolving Conflicts

Git is an indispensable tool for recording the history of our source code. This history increases in value the older that project gets; it is a unique archive of collaboration and hard work that describes how the project became what it is today. I wanted to write about the short and long term value of committing code, but I don’t want this series of posts to be just another set of Git commands, I want to explain why we’re doing this before we start geeking out.

Let’s start by examining the quality of the development documentation in our project, and by “development documentation” I mean the one that’s intended for people developing that project. Chances are that it’s not so great, and that’s ok! It’s hard to maintain a good development documentation, to me it even feels somewhat unnatural. I trust code more than I trust paragraphs, and I strongly believe that Git history can and should be used as a part of documentation.

Our work often requires us to understand unfamiliar code; perhaps we’re trying to resolve a merge conflict, or we’re experiencing a bug that we traced back to its source. Most of the times the unfamiliarity is due to the fact that it’s someone else’s code, but sometimes it can even be our own code from years ago. Either way, we don’t want to break anything, we just want to solve our problem. However, that code might be difficult to reason with, so we want to familiarize ourselves with it first. Our questions can usually be narrowed down to the following two:

  1. What does this code do? A good indicator of health of the codebase is our ability to answer this question by analyzing the present state of the codebase; the way the code is being used, names of files, classes, functions, variables, etc. But sometimes we can’t, and then we have to look elsewhere, like commit messages, pull requests, related discussions etc.

  2. Why does it look like this? Answering this question is trickier, it usually requires exploring that code’s history; that way we might get a good sense of how it evolved over time, and what makes it tick. However, we might not be able to do that if the changes have not been committed with the intention of being easy to understand for anyone else.

We can infer a lot just from commits alone:

  • A commit’s subject can give us a useful hint about what it does, but might not if it doesn’t say anything useful.
  • A subject like change README.md is useless because we already know as much by looking at the diff.
  • A commit’s body can give us a great explanation, but might not if it doesn’t exist.
  • A set of changes alone can give us a sense of what their purpose is, but might not if there are too many of them, achieving multiple goals at once, or if closely related changes are split across multiple commits. Ideally every commit should be able to stand on its own.

Every line of code is always documented, but the quality of that documentation is up to us. I highly recommend reading this thread about writing commit messages:

These are some of the common responses I’ve seen:

I recommend writing down the reason for the change in the code comments instead.

I’ve seen this one multiple times, and I can’t wrap my head around it. Commit message works on a different level than a code comment; commit messages describe changes, where code comments describe units of code. Commit messages can also include changes to code comments, but that’s coincidental and not always applicable (e.g. bugfix). Commit messages describe what happened, code comments describe what is. I don’t know how to explain it better than that, I hope you get it. 😉

Long commit messages make it hard to skim commits in git log.

You can customize the output of git log however you want, e.g. to scroll through commit subjects you can run:

git log --pretty=oneline --abbrev-commit

People usually find their favorite git log format and save it as an alias.

It’s better to just link to a PR or a Jira/Trello ticket.

Those links are great additions to the commit message, but should by no means be the only content! Commit messages are meant for developers, who shouldn’t have to understand the entire business context of our changes in order to know how to handle them, just like PM and QA shouldn’t have to understand every technicality to know how to manage and test our changes. The audiences are simply different.

I’m aware that being descriptive can pose a challenge for those of us who aren’t used to it; amending commits, manually editing hunks to stage, resetting, rebasing… all of this can be hard to grasp, but I want to show an example of how powerful Git can be if people take their time to commit changes well:

git log -p -S analyzeUsers

This goes through the entire Git history, finds every commit that contains analyzeUsers in its diff, and displays them along with their diffs. If you were the one developing analyzeUsers, and a few years later someone runs that command, wanting to understand its evolution over time, what will they find? 😉

I recommend watching this great talk to see a real-world example of how easy it is to make a mistake even with the simplest change if we’re missing important information in the commit message. The talk is very endearing and easy to follow:

If I managed to convince you that this is really important, one thing that could still be holding you back is if you don’t feel fluent in Git. Don’t worry, there’s no shame in that. This is exactly why I wrote this series, so you can learn how to start improving your Git history, starting with accurately staging your changes.

Educating our colleagues should be a part of making changes to the codebase, otherwise we become the only experts on our changes, and nobody else can join that club without our permission.