Git Internals Demystified: Branches, Merges, and the Reflog (Part 2)

Coffee Chat - Episode 8


In Part 1, we cracked open the .git directory and found blobs, trees, SHA-1 hashes, and the index. You saw that git add does not talk to any server. It just writes objects locally. Good stuff.

Now things get interesting. Let's talk about how Git actually tracks history, and why branches are one of the biggest lies in software development. Not lies in a bad way. They are just far simpler than most people think.

What a Commit Really Is

We touched on this briefly in Part 1, but now we are going to go deeper. A commit object is deceptively simple. It contains exactly three things:

  1. A pointer to a tree object (the snapshot of your project at that moment)

  2. Pointers to its parent commits: one for an ordinary commit, none for the very first commit, and two or more for a merge commit

  3. Metadata: author, committer, timestamp, and the commit message

That is it. There is no diff stored in a commit. No line-by-line change tracking. Just a full snapshot of your project tree, plus a link back to where you came from.
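If you want to see those three pieces with your own eyes, here is a minimal sketch. It assumes a throwaway repo in a temp directory, and the file name, identity, and commit messages are made up for illustration. git cat-file -p prints the raw commit object.

```shell
set -eu
repo=$(mktemp -d)    # throwaway repo so nothing real is touched
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" on any Git version
git config user.name "Example Dev"      # placeholder identity
git config user.email "dev@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "First commit"
echo "more" >> README.md
git commit -q -am "Second commit"

# Raw commit object: a tree line, a parent line, author/committer lines, then the message
git cat-file -p HEAD

commit_type=$(git cat-file -t HEAD)            # object type of the commit itself
tree_type=$(git cat-file -t 'HEAD^{tree}')     # object type of what the tree line names
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
echo "type=$commit_type tree=$tree_type parents=$parent_count"
```

Notice what is missing from the output: there is no diff anywhere in it, only hashes and metadata.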

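So your project history is not a sequence of diffs. It is a chain of snapshots, each linked back to its parent. (Strictly speaking it is a graph rather than a list, because merge commits have more than one parent.)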

Commit C  -->  Commit B  -->  Commit A  -->  (root, no parent)
   |               |               |
  tree C          tree B          tree A

Every commit knows its parent. Commit A is the first commit (no parent). Commit B points back to A. Commit C points back to B. That chain is your entire Git history.

When you run git log, Git starts at the latest commit and follows those parent pointers backwards. That is literally all it is doing.
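To make that walk concrete, here is a rough sketch that follows the parent pointers by hand. It assumes a throwaway repo with a purely linear history (three made-up commits named A, B, and C). The real git log also handles merges, ordering, and filtering, so treat this as an illustration of the idea rather than how git log is implemented.

```shell
set -eu
repo=$(mktemp -d)    # throwaway repo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"      # placeholder identity
git config user.email "dev@example.com"

for name in A B C; do
  echo "$name" > file.txt
  git add file.txt
  git commit -q -m "$name"
done

commit=$(git rev-parse HEAD)   # start at the latest commit
walked=""
while [ -n "$commit" ]; do
  subject=$(git log -1 --format=%s "$commit")
  echo "$commit $subject"
  walked="$walked$subject "
  # %P prints the parent hash; it is empty for the root commit, which ends the loop
  commit=$(git log -1 --format=%P "$commit")
done
```

The loop visits C, then B, then A, and stops when it reaches a commit with no parent.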

The beautiful thing here is that two commits can share parts of the same tree. If you only changed one file, the new commit's tree reuses all the unchanged blob objects from the previous tree. Git is efficient like that.
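Here is a small sketch of that sharing, assuming a throwaway repo with two made-up files, a.txt and b.txt. Only a.txt changes between the two commits, so the blob hash for b.txt should be identical in both trees.

```shell
set -eu
repo=$(mktemp -d)    # throwaway repo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"      # placeholder identity
git config user.email "dev@example.com"

echo "one" > a.txt
echo "two" > b.txt
git add a.txt b.txt
git commit -q -m "Add a and b"
echo "one, edited" > a.txt
git commit -q -am "Edit a only"

git ls-tree HEAD~1    # tree of the first commit
git ls-tree HEAD      # tree of the second commit

old_a=$(git rev-parse HEAD~1:a.txt)
new_a=$(git rev-parse HEAD:a.txt)
old_b=$(git rev-parse HEAD~1:b.txt)
new_b=$(git rev-parse HEAD:b.txt)
echo "a.txt: $old_a -> $new_a"
echo "b.txt: $old_b -> $new_b"
```

The a.txt entry gets a new blob hash, while the b.txt entry points at exactly the same object in both snapshots.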

Branches Are Just Text Files

This is the one that genuinely surprises people. Run this in any Git repo:

cat .git/refs/heads/main

You will see something like:

a3f8b1c2d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9

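That is it. A branch is a file with a 40-character commit hash inside it (64 characters in a SHA-256 repository). Nothing more. If the file is missing, Git has probably packed the ref into .git/packed-refs. Running git rev-parse main prints the same hash either way.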

Creating a new branch? Git writes a new file in .git/refs/heads/ with the current commit hash.

Deleting a branch? Git deletes that file.

Switching branches? Git updates .git/HEAD to point to a different file, then updates the index and your working directory to match that branch's commit.

So when your teammate says "I just created a feature branch," all they did was write a 40-character string into a new file. The code did not move anywhere. No files were copied. Nothing was duplicated.

Think of it like a sticky note. Your codebase is a city. A branch is just a sticky note on your desk that says "I am currently looking at intersection A3F8B1C2." When you create a new branch, you grab a second sticky note and write the same address on it. Two notes, same location, zero duplication.

This also explains why branches in Git are so cheap to create and delete, unlike older version control systems that actually copied entire directory trees when you branched.
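You can watch the file appear and disappear. This is a sketch under a few assumptions: a throwaway repo, the default on-disk ref format, and a ref that has not been packed yet (which is true for a freshly created branch). The branch name feature/payment is just an example.

```shell
set -eu
repo=$(mktemp -d)    # throwaway repo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"      # placeholder identity
git config user.email "dev@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "First commit"

git branch feature/payment                    # "create a branch"
cat .git/refs/heads/feature/payment           # the whole branch: one hash and a newline

branch_file=$(cat .git/refs/heads/feature/payment)
head_hash=$(git rev-parse HEAD)
echo "file says $branch_file, HEAD is $head_hash"

git branch -q -d feature/payment              # "delete a branch"
if [ -e .git/refs/heads/feature/payment ]; then
  file_after_delete="present"
else
  file_after_delete="gone"
fi
echo "after delete: $file_after_delete"
```

The slash in the branch name simply becomes a subdirectory under .git/refs/heads/.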

HEAD: The Pointer to the Pointer

Every Git repo has a special file at .git/HEAD. Take a look:

cat .git/HEAD

On a normal working branch, you will see:

ref: refs/heads/main

HEAD is not pointing at a commit directly. It is pointing at a branch name. HEAD says: "I am on the main branch. Whatever commit main points to, that is where I am."

This is called a symbolic ref. HEAD knows the branch, and the branch knows the commit.
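A quick sketch of that two-step lookup, assuming a throwaway repo with a single made-up commit. git symbolic-ref reads the first hop (HEAD to branch) and git rev-parse resolves all the way to the commit.

```shell
set -eu
repo=$(mktemp -d)    # throwaway repo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"      # placeholder identity
git config user.email "dev@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "First commit"

head_file=$(cat .git/HEAD)                      # the raw file contents
head_ref=$(git symbolic-ref HEAD)               # hop 1: which branch HEAD names
via_head=$(git rev-parse HEAD)                  # hop 1 + hop 2: the commit
via_branch=$(git rev-parse refs/heads/main)     # hop 2 only: the commit main names
echo "$head_file"
echo "$head_ref -> $via_branch"
```

Asking for HEAD and asking for main land on the same commit, because HEAD only ever forwards the question to the branch.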

Now, here is where people get confused. Run git checkout on a specific commit hash instead of a branch name:

git checkout a3f8b1c2

Now look at .git/HEAD:

a3f8b1c2d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9

HEAD now points directly to a commit hash, not to a branch. This is called detached HEAD state. Git even warns you about it with that slightly alarming message: "You are in 'detached HEAD' state."

Do not panic. All it means is: HEAD is pointing straight at a commit instead of at a branch. If you make new commits from here, no branch moves forward to include them. They will just be floating in history with no branch pointing to them.

This is why Git warns you. If you switch away without creating a branch first, those commits become unreachable. Well, mostly unreachable. We will cover how to recover them in the reflog section.

Detached HEAD typically happens when you:

  • Check out a specific commit hash directly

  • Check out a tag

  • Land in the middle of an interactive rebase

To get back to safety, just run git checkout main (or whatever your branch is called) and HEAD becomes a symbolic ref again.
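Here is a sketch of the whole round trip, assuming a throwaway repo. The file lab.txt and the branch name experiment are made up for the example. It detaches HEAD, makes a floating commit, gives that commit a branch name, and then returns to main.

```shell
set -eu
repo=$(mktemp -d)    # throwaway repo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"      # placeholder identity
git config user.email "dev@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "First commit"
echo "v2" > app.txt
git commit -q -am "Second commit"

first=$(git rev-parse HEAD~1)
git checkout -q "$first"        # check out a hash: HEAD is now detached
cat .git/HEAD                   # a bare hash, no "ref:" prefix

# symbolic-ref -q exits non-zero (quietly) when HEAD is not a symbolic ref
if git symbolic-ref -q HEAD >/dev/null; then state="attached"; else state="detached"; fi
echo "HEAD is $state"

echo "experiment" > lab.txt
git add lab.txt
git commit -q -m "Experiment on detached HEAD"
floating=$(git rev-parse HEAD)  # a commit that no branch points to yet

git branch experiment           # give it a name before leaving
git checkout -q main            # HEAD becomes a symbolic ref again
rescued=$(git rev-parse experiment)
head_ref=$(git symbolic-ref HEAD)
```

Creating the branch before switching away is the important step. Without it, the only record of the floating commit would be the reflog.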

What Actually Happens When You git commit

This is the full picture. Let's walk through every step that fires when you run git commit -m "Add payment processing".

Step 1: Read the index. The index (.git/index) holds the staged files, and each entry already points at a blob in .git/objects/. Those blobs were written when you ran git add, so git commit has nothing new to hash here. Files that have not changed simply keep pointing at their existing blobs.

Step 2: Build tree objects. Git assembles one tree object per directory from the index. Each tree lists the blobs (and sub-trees) it contains, and the top-level tree represents your whole project.

Step 3: Create the commit object. Git creates a new commit object that contains: the SHA of the new tree, the SHA of the current HEAD commit as the parent, and your metadata (author, message, timestamp).

Step 4: Update the branch ref. The current branch file in .git/refs/heads/ gets updated. It now contains the hash of the new commit.

Step 5: HEAD stays the same. HEAD still says ref: refs/heads/main. But now main points to your new commit instead of the previous one.

Visually:

BEFORE commit:
HEAD --> main --> [commit B]

AFTER commit:
HEAD --> main --> [commit C] --> [commit B]

HEAD did not move. The branch moved. HEAD follows because it points to the branch.

This is a really satisfying thing to understand. When you commit, you are not "saving to HEAD." You are extending the branch one commit forward, and HEAD trails along automatically.
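The steps above can be approximated by hand with Git's low-level plumbing commands. This is a sketch, not what git commit literally executes internally (the real command also runs hooks, writes reflog messages, and more). It assumes a throwaway repo, and the file payment.txt is made up.

```shell
set -eu
repo=$(mktemp -d)    # throwaway repo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"      # placeholder identity
git config user.email "dev@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "First commit"

echo "charge()" > payment.txt
git add payment.txt                        # the blob is written here, at add time

before=$(git rev-parse HEAD)               # current tip, soon to be the parent
tree=$(git write-tree)                     # Step 2: build tree objects from the index
commit=$(git commit-tree "$tree" -p "$before" -m "Add payment processing")   # Step 3
git update-ref refs/heads/main "$commit"   # Step 4: move the branch ref

head_file=$(cat .git/HEAD)                 # Step 5: HEAD itself is unchanged
git log --oneline
```

After update-ref runs, git log shows the new commit on top, even though git commit was never called for it.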

Merging Under the Hood

Merges are actually two very different operations depending on the situation. Git decides which to use based on the history.

Fast-Forward Merge

Say you are on main and you want to merge in feature/payment. But here is the thing: since you created feature/payment, nobody has committed anything new to main. Your branch history looks like this:

main:           A --> B
feature/payment:      B --> C --> D

main is directly behind feature/payment. There is a straight path from B to D with no diverging. Git does not need to create a merge commit here. It just moves the main pointer forward to D.

AFTER fast-forward:
main:           A --> B --> C --> D

That is a fast-forward merge. The branch pointer just slides forward. No new commit is created. Clean, linear history.
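Here is a sketch that reproduces the A, B, C, D picture in a throwaway repo. The commit_file helper is hypothetical and exists only to keep the example short. Passing --ff-only makes Git refuse anything that is not a fast-forward, which is a handy way to confirm which kind of merge you are getting.

```shell
set -eu
repo=$(mktemp -d)    # throwaway repo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"      # placeholder identity
git config user.email "dev@example.com"

# Hypothetical helper: one new file per commit, so nothing can conflict
commit_file() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit_file A
commit_file B
git checkout -q -b feature/payment
commit_file C
commit_file D
git checkout -q main

count_before=$(git rev-list --count --all)   # commits reachable from any ref
git merge -q --ff-only feature/payment       # slide main forward, or fail
count_after=$(git rev-list --count --all)

main_tip=$(git rev-parse main)
feature_tip=$(git rev-parse feature/payment)
echo "main=$main_tip feature=$feature_tip commits: $count_before -> $count_after"
```

Both branch names end up on the same hash, and the total number of commits does not change.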

Three-Way Merge

Now say main has had new commits while you were working on your feature:

main:           A --> B --> E --> F
feature/payment: A --> B --> C --> D

Now there are two diverging lines of development. Git cannot just slide a pointer. It needs to actually combine two different sets of changes.

Git finds the common ancestor (commit B in this case) and then does a three-way comparison:

  • What changed between B and F? (changes on main)

  • What changed between B and D? (changes on the feature branch)

  • Combine both sets of changes into a single new commit

The result is a merge commit with two parents:

A --> B --> E --> F --> M (merge commit)
       \               /
        C --> D -------

Commit M is special. Its commit object has two parent hashes instead of one. That is how Git encodes the fact that history converged here.

If the same lines were changed differently on both branches, you get a merge conflict. Git cannot automatically decide which change wins, so it stops and asks you.

The three-way part of "three-way merge" refers to the three commits involved: the common ancestor, the tip of branch A, and the tip of branch B. That ancestor is the key. Without it, Git would have no baseline to compare against.
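To see the ancestor and the two parents for yourself, here is a sketch that builds the diverged history in a throwaway repo. The commit_file helper is hypothetical, and each commit touches its own file so the merge cannot conflict. git merge-base reports the common ancestor Git would pick.

```shell
set -eu
repo=$(mktemp -d)    # throwaway repo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"      # placeholder identity
git config user.email "dev@example.com"

# Hypothetical helper: one new file per commit, so nothing can conflict
commit_file() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit_file A
commit_file B
fork_point=$(git rev-parse HEAD)          # commit B, where the branches split
git checkout -q -b feature/payment
commit_file C
commit_file D
feature_tip=$(git rev-parse HEAD)
git checkout -q main
commit_file E
commit_file F
main_tip=$(git rev-parse HEAD)

merge_base=$(git merge-base main feature/payment)   # the common ancestor
git merge -q --no-edit feature/payment              # creates merge commit M

git cat-file -p HEAD                                # two "parent" lines this time
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
first_parent=$(git rev-parse 'HEAD^1')    # the branch you were on (main)
second_parent=$(git rev-parse 'HEAD^2')   # the branch you merged in
```

The first parent is always the branch you were standing on when you ran the merge, which is why HEAD^1 and HEAD^2 are not interchangeable.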

Rebase: The Time Machine

Let's use the same diverging history from above:

main:           A --> B --> E --> F
feature/payment: A --> B --> C --> D

Instead of merging and creating commit M, rebase takes a different approach. It says: "What if I pretended I started my feature branch from F instead of B?"

Rebase replays your commits (C and D) on top of the new base (F):

AFTER rebase:
main:           A --> B --> E --> F
feature/payment:               F --> C' --> D'

Notice the ' marks. C' and D' are new commit objects. They have new SHA hashes. The original C and D still exist in the object store until they get garbage collected.

This is the critical thing to understand about rebase: it rewrites history. The changes are the same, but the commits are completely new objects with new identities.
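Here is a sketch that checks both halves of that claim in a throwaway repo: the hashes change, but the changes themselves do not. The commit_file helper is hypothetical, and every commit touches its own file so the rebase applies cleanly.

```shell
set -eu
repo=$(mktemp -d)    # throwaway repo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"      # placeholder identity
git config user.email "dev@example.com"

# Hypothetical helper: one new file per commit, so nothing can conflict
commit_file() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit_file A
commit_file B
git checkout -q -b feature/payment
commit_file C
commit_file D
git checkout -q main
commit_file E
commit_file F

git checkout -q feature/payment
old_tip=$(git rev-parse HEAD)             # original D
old_patch=$(git diff HEAD~2 HEAD)         # what C and D changed, relative to B

git rebase -q main                        # replay C and D on top of F

new_tip=$(git rev-parse HEAD)             # D', a brand new commit object
new_patch=$(git diff HEAD~2 HEAD)         # what C' and D' changed, relative to F
new_base=$(git merge-base main feature/payment)
old_type=$(git cat-file -t "$old_tip")    # original D is still in the object store
echo "tip: $old_tip -> $new_tip"
```

The branch now starts from the tip of main, the original D can still be looked up by its old hash, and the two diffs match.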

This has two important consequences.

First, the resulting history is clean and linear. When you merge feature/payment back to main, it will be a fast-forward (no merge commit). Your log stays readable.

Second, you should not rebase commits that have already been pushed to a shared branch. If your teammate has built work on top of your original C and D commits, and you rebase them into C' and D', their history and your history have now diverged. Things get messy.

A good rule of thumb:

  • Rebase your local, unpushed commits to clean them up before opening a pull request

  • Merge when integrating finished, shared work back into main

Neither is always better. They serve different purposes.

The Reflog: Git's Safety Net

This is probably the most underused feature in Git, and also the most reassuring one to know about.

Every time HEAD moves, Git records it. Every commit, every checkout, every reset, every rebase, every merge. Git logs it in .git/logs/HEAD. This is the reflog (reference log).

Run this:

git reflog

You will see something like:

a3f8b1c HEAD@{0}: commit: Add payment processing
7e2d9f4 HEAD@{1}: checkout: moving from feature/payment to main
3c1a8b5 HEAD@{2}: commit: Add cart total calculation
b5f9e23 HEAD@{3}: commit: Initial cart setup

Every move. Every single one.

So when a junior dev runs git reset --hard HEAD~3 by accident and panics because three commits just vanished? The reflog has those commits. You can get them back:

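# Find the entry recorded just before the reset
git reflog

# Restore it. Right after the reset, HEAD@{1} is where HEAD pointed before it;
# use whichever entry your own reflog output shows
git checkout -b recovery-branch HEAD@{1}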

Or if you deleted a branch and realized you needed it, look for the last commit you made on it in the HEAD reflog and branch from that hash:

git reflog
git checkout -b restored-branch a3f8b1c

You almost cannot lose work in Git if you know about the reflog. The commits are still in .git/objects/. The reflog tells you where they are.
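Here is the accidental-reset scenario as a sketch you can run safely, assuming a throwaway repo with four made-up commits. Immediately after the reset, HEAD@{0} is the reset itself and HEAD@{1} is where HEAD was just before it.

```shell
set -eu
repo=$(mktemp -d)    # throwaway repo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"      # placeholder identity
git config user.email "dev@example.com"

for name in A B C D; do
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "$name"
done

lost_tip=$(git rev-parse HEAD)        # remember D only so we can verify the rescue
git reset -q --hard HEAD~3            # the "oops": B, C, and D vanish from main
after_reset=$(git rev-list --count HEAD)

git reflog                            # the reset is entry 0, the old tip is entry 1

git checkout -q -b recovery-branch 'HEAD@{1}'
recovered=$(git rev-parse HEAD)
recovered_count=$(git rev-list --count HEAD)
echo "main has $after_reset commit(s), recovery-branch has $recovered_count"
```

In a real repo you would not have lost_tip saved in a variable, which is exactly why you read the hash out of git reflog instead.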

The reflog is local. It does not sync to remote. And it does expire (default 90 days for reachable commits, 30 days for unreachable ones). But within that window, you are almost always safe.

Garbage Collection: The Cleanup Crew

Here is why "deleted" commits hang around long enough for reflog to save them.

When you delete a branch or run a hard reset, Git does not immediately delete the underlying objects. It just removes the references pointing to them. The blobs, trees, and commit objects stay in .git/objects/ until Git runs garbage collection.

Git periodically runs git gc automatically. This command:

  1. Scans all objects in .git/objects/

  2. Finds any objects with nothing pointing to them (no branch, no tag, no reflog entry, nothing in the index)

  3. Deletes those unreachable objects once they are older than a grace period (two weeks by default)

  4. Packs remaining objects into pack files for efficiency

You can trigger it manually:

git gc

But in practice you rarely need to. Git handles it automatically and conservatively.

The practical takeaway: if you deleted a branch five minutes ago, the objects are almost certainly still on disk. Use the reflog, find the hash, create a new branch pointing to it, and you are back.

This is why the right mental model for Git is not "deleting things removes them." It is more like: "deleting things removes the label. The actual data stays until the cleanup crew arrives."
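Here is a sketch of the cleanup crew at work, assuming a throwaway repo and a made-up branch called scratch. The last two commands wipe every reflog entry and prune immediately, which really does destroy the safety net, so only run them somewhere disposable.

```shell
set -eu
repo=$(mktemp -d)    # throwaway repo; do not try the final step on real work
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"      # placeholder identity
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

git checkout -q -b scratch
echo "idea" > idea.txt
git add idea.txt
git commit -q -m "Scratch idea"
lost=$(git rev-parse HEAD)

git checkout -q main
git branch -q -D scratch                 # remove the label, not the data

after_delete=$(git cat-file -t "$lost")  # the commit object is still readable
git gc -q                                # a normal gc keeps it: the HEAD reflog still mentions it
after_gc=$(git cat-file -t "$lost")

# Now remove the last references and skip the grace period
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$lost" 2>/dev/null; then gone="no"; else gone="yes"; fi
echo "after delete: $after_delete, after gc: $after_gc, pruned: $gone"
```

The ordinary git gc leaves the commit alone. It only disappears once the reflog entries are gone and the grace period is overridden.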

What You Now Know

Let's take a breath and look at how far you have come across both parts of this series.

From Part 1, you know:

  • Every file, directory snapshot, and commit is stored as an object in .git/objects/

  • Objects are identified by SHA-1 hashes of their content

  • git add writes blobs to the object store and updates the index

  • The four object types: blob, tree, commit, tag

From Part 2, you now know:

  • A commit is a pointer to a tree, a pointer to its parent (or parents), and some metadata

  • A branch is a file containing a 40-character commit hash

  • HEAD is either a symbolic ref to a branch or a direct commit hash (detached)

  • git commit builds objects, creates a commit, and moves the branch pointer forward

  • Fast-forward merge just slides a pointer. Three-way merge creates a new commit with two parents

  • Rebase replays commits on a new base, creating new commit objects with new SHAs

  • The reflog is a local log of every HEAD movement

  • Deleted objects are not immediately removed. Garbage collection handles that later

Git is not magic. It is four object types, some files in .git/refs/, a HEAD file, and a really elegant set of operations built on top of them. When something goes wrong, you now know exactly where to look.

Next time you see "detached HEAD state" or accidentally delete a branch or panic about a bad reset, stop and think: what objects exist? What is HEAD pointing to? What does the reflog say? The answer is almost always in there.

Go open a test repo and poke around. Run cat .git/HEAD. Run cat .git/refs/heads/main. Run git reflog. See the machine underneath the magic.

Once you see it, you cannot unsee it. And honestly, Git becomes a lot less scary after that. 🚀

Happy coding!
