How Git Works Internally: Behind the Curtain (Part 1)

You type git commit -m "fix bug" and it just... works. The files are saved, history is preserved, your team can pull your changes.

But what actually happened?

Most developers I know use Git every single day and have no idea what's going on under the hood. They know the commands. They know the workflow. But the internals? Total mystery. And honestly, that is fine. But that changes the moment something breaks and you have no mental model to debug with.

So let's fix that today. Together.

This is Part 1 of a two-part series. We are going to crack open Git and look at its insides. And I promise you, once you see it, you will never look at git status the same way again.

The .git Directory: Git's Entire Brain

Here is something most tutorials skip over completely. When you run git init in a folder, Git creates one hidden directory: .git/. That is it. That is Git. The entire version control system lives inside that one folder.

Let's look inside it:

.git/
├── HEAD
├── config
├── description
├── hooks/
├── index
├── info/
├── objects/
│   ├── info/
│   └── pack/
└── refs/
    ├── heads/
    └── tags/

Here is what matters:

objects/ is where Git stores every version of every file you have ever committed, plus commits themselves, plus directory snapshots. This is the heart of Git. We will spend most of this article here.

refs/ holds references, basically named pointers to commits. Your branches live here. Tags live here. We will get into this deeply in Part 2.

HEAD is a text file that tells Git which branch you are currently on. Open it right now and look:

cat .git/HEAD
# ref: refs/heads/main

Just a text file. One line. That is your "current branch."
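
If you want to see how little there is to it, here is a minimal Python sketch that classifies the contents of HEAD. The function name parse_head is made up for this example, and the "detached" case (HEAD holding a raw commit hash instead of a branch name) is something this article has not covered, so treat that branch of the code as an assumption about how you would want to handle it.

```python
def parse_head(text):
    """Classify the contents of a .git/HEAD file.

    Returns ("branch", name) for a symbolic ref such as "ref: refs/heads/main",
    or ("detached", commit_hash) when HEAD holds a raw commit hash.
    """
    line = text.strip()
    ref_prefix = "ref: "
    branch_prefix = "refs/heads/"
    if line.startswith(ref_prefix):
        ref = line[len(ref_prefix):]
        if ref.startswith(branch_prefix):
            ref = ref[len(branch_prefix):]
        return ("branch", ref)
    # Assumption: anything else is a commit hash (a "detached HEAD").
    return ("detached", line)


print(parse_head("ref: refs/heads/main\n"))  # ('branch', 'main')
```

In a real repo you would pass in the result of reading .git/HEAD as text.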

index is the staging area, the place where git add puts things before they become a commit. It is a binary file that maps file paths to object hashes.

config holds your repository-level configuration (remote URLs, user settings for this repo, etc.).

That is the whole brain. Pretty approachable when you lay it out like this, right?

Git is a Content-Addressable Filesystem

This phrase sounds technical but the idea behind it is dead simple.

Imagine a library where books are not named by title. Instead, every book is named by a unique code that is mathematically derived from the book's exact contents. If two books have identical text, they get the same code. Change a single comma and the code is completely different.

That is exactly how Git stores data.

Every piece of content you give Git, whether that is a file, a directory snapshot, or a commit, gets hashed using SHA-1 (more on this shortly). The resulting 40-character hex string becomes both the name and the address of that piece of content in Git's object store.

This means:

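The same content always produces the same hash. Always.
Different content produces a different hash, for all practical purposes (more on collisions below).
Stored content cannot change silently, because data is identified by what it is, not by where it lives. Alter a single byte and it is a different object with a different address.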

You can try this right now:

echo 'Hello, World!' | git hash-object --stdin
# 8ab686eafeb1f44702738c8b0f24f2567c36da6d

That 40-character string is the address of that content in Git's world. Run the same command on any machine, with any repo, and you will get the exact same hash, because the content is the same.

That is what "content-addressable" means. The address comes from the content itself.
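
To make the idea concrete, here is a toy content-addressable store in Python. It is a sketch of the concept only, not Git's actual format: the ContentStore class is hypothetical, it keeps everything in a dictionary, and it hashes the raw bytes directly, whereas Git adds a small header before hashing. That means the keys here will not match what git hash-object prints.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the key is the SHA-1 of the value."""

    def __init__(self):
        self._objects = {}

    def put(self, data):
        key = hashlib.sha1(data).hexdigest()
        self._objects[key] = data  # storing identical content again changes nothing
        return key

    def get(self, key):
        return self._objects[key]

    def __len__(self):
        return len(self._objects)


store = ContentStore()
first = store.put(b"same bytes")
second = store.put(b"same bytes")
third = store.put(b"same bytes!")

print(first == second)  # True
print(first == third)   # False
print(len(store))       # 2
```

Two puts of identical content collapse into one entry, which is the same deduplication you get for free in Git's object store.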

The Four Object Types

Git's object store has exactly four types of objects. Every single thing Git tracks is one of these four.

Blob: Just the Content, Nothing Else

A blob stores the raw content of a file. Not the filename. Not the path. Just the bytes.

So if you have two different files with identical contents, Git stores only one blob. They share it.

Let's see a blob in action. After staging a file, you can find and inspect the blob Git created:

echo 'Hello, World!' > hello.txt
git add hello.txt

# Find the hash of our staged file
git ls-files --stage
# 100644 8ab686eafeb1f44702738c8b0f24f2567c36da6d 0    hello.txt

# Inspect the object type
git cat-file -t 8ab686eafeb1f44702738c8b0f24f2567c36da6d
# blob

# Inspect the content
git cat-file -p 8ab686eafeb1f44702738c8b0f24f2567c36da6d
# Hello, World!

See that? The blob just contains Hello, World! and nothing else. No filename. No metadata. Pure content.

Tree: Git's Version of a Directory

A tree object maps filenames to blobs (and other trees for subdirectories). It is essentially a directory snapshot.

After your first commit, you can inspect the tree it points to:

# Get the commit hash
git log --oneline -1
# a1b2c3d Initial commit

# Look at the commit's tree
git cat-file -p a1b2c3d
# tree 92b8935ad25c3b4a11cd4beba1c3ac2a516a06a2
# author Dulitha Rajapaksha ...
# committer Dulitha Rajapaksha ...
#
# Initial commit

# Inspect the tree itself
git cat-file -p 92b8935ad25c3b4a11cd4beba1c3ac2a516a06a2
# 100644 blob 8ab686eafeb1f44702738c8b0f24f2567c36da6d    hello.txt

A tree entry looks like this:

[mode] [object-type] [hash]    [filename]
100644 blob           8ab686...  hello.txt
040000 tree           f3f1c8...  src/

The tree says: "In this directory, there is a file called hello.txt and its contents are the blob with this hash. And there is a subdirectory called src/ which is the tree with that hash."

Simple, elegant, recursive.
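
What git cat-file -p shows you is a pretty-printed view. On disk, each tree entry is more compact: the mode, a space, the name, a NUL byte, then the 20 raw bytes of the hash. Here is a rough Python sketch of that layout, under a couple of assumptions: the function names are made up, and the hashes in the usage lines are placeholders, not real objects. Note that the stored mode for a directory is 40000, without the leading zero you see in the pretty-printed output.

```python
import hashlib


def serialize_tree(entries):
    """Build the raw body of a tree object.

    entries: list of (mode, name, hex_hash) tuples, for example
    ("100644", "hello.txt", <40 hex chars>) or ("40000", "src", <40 hex chars>).
    """
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name, comparing directories as if they ended in "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_hash in sorted(entries, key=sort_key):
        # "<mode> <name>\0" followed by the 20 raw bytes of the hash.
        body += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(hex_hash)
    return body


def hash_tree(entries):
    """Hash a tree the way Git hashes every object: header first, then body."""
    body = serialize_tree(entries)
    header = b"tree " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# Placeholder hashes, purely for illustration.
example = [("40000", "src", "bb" * 20), ("100644", "hello.txt", "aa" * 20)]
print(len(serialize_tree(example)))  # 67
```

Because a subdirectory entry is just the hash of another tree, hashing the root tree indirectly covers every file beneath it.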

Commit: The Snapshot in Time

A commit object ties everything together. It contains:

A pointer to a tree (the root directory snapshot at that moment)
Zero or more parent commits (none for the very first commit, one for a normal commit, two or more for a merge), so Git knows the history
Author metadata (name, email, timestamp)
Committer metadata (can differ from author, for example when applying patches)
The commit message

Let's look at a real commit object:

git cat-file -p HEAD
# tree 92b8935ad25c3b4a11cd4beba1c3ac2a516a06a2
# parent f4e3d2c1b0a9...
# author Dulitha Rajapaksha <dulitha@example.com> 1708732800 +0530
# committer Dulitha Rajapaksha <dulitha@example.com> 1708732800 +0530
#
# Add hello.txt with greeting

Notice that a commit does not store a diff. It stores a full snapshot of every tracked file via the tree object. Git is not a delta-based system at its core. It is a snapshot-based system.

This is why git checkout to an old commit is fast. Git is not replaying changes backwards. It is just loading the snapshot that commit points to.
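
Since a commit is just a few header lines, a blank line, and a message, you can pick it apart with very little code. The sketch below is one way to do it in Python. The parse_commit function is hypothetical, and the sample text is the example commit from above, with the parent hash left truncated exactly as it was shown.

```python
def parse_commit(text):
    """Split the output of `git cat-file -p <commit>` into its fields."""
    header, _, message = text.partition("\n\n")
    commit = {"tree": None, "parents": [], "author": None,
              "committer": None, "message": message.strip()}
    for line in header.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # a merge commit has several
        elif key in ("tree", "author", "committer"):
            commit[key] = value
    return commit


sample = (
    "tree 92b8935ad25c3b4a11cd4beba1c3ac2a516a06a2\n"
    "parent f4e3d2c1b0a9...\n"
    "author Dulitha Rajapaksha <dulitha@example.com> 1708732800 +0530\n"
    "committer Dulitha Rajapaksha <dulitha@example.com> 1708732800 +0530\n"
    "\n"
    "Add hello.txt with greeting\n"
)
commit = parse_commit(sample)
print(commit["tree"])     # 92b8935ad25c3b4a11cd4beba1c3ac2a516a06a2
print(commit["message"])  # Add hello.txt with greeting
```

Notice there is no list of changed files anywhere in there. Everything about the files comes from following the tree hash.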

Tag: A Named Pointer to a Commit

An annotated tag (created with git tag -a) is its own object type. It points to another object, almost always a commit, and adds extra metadata: the tagger's name, a date, and a tag message.

git tag -a v1.0 -m "First release"
git cat-file -p v1.0
# object a1b2c3d4e5f6...
# type commit
# tag v1.0
# tagger Dulitha Rajapaksha <dulitha@example.com> 1708732800 +0530
#
# First release

Lightweight tags (created with just git tag v1.0) are not objects at all. They are just refs, plain text files pointing directly to a commit hash. Annotated tags are the "proper" ones with full provenance.

How Objects Are Stored on Disk

Now you know what the four object types are. But how does Git actually store them on your filesystem?

Go look inside .git/objects/ right now:

ls .git/objects/
# 8a/
# 92/
# a1/
# info/
# pack/

See those two-character folder names? Git takes the 40-character SHA-1 hash and splits it: the first 2 characters become the folder name, and the remaining 38 characters become the filename inside that folder.

So the blob with hash 8ab686eafeb1f44702738c8b0f24f2567c36da6d lives at:

.git/objects/8a/b686eafeb1f44702738c8b0f24f2567c36da6d

You can verify it:

ls .git/objects/8a/
# b686eafeb1f44702738c8b0f24f2567c36da6d

Why split it this way? Filesystem performance. If you have a project with 100,000 objects (entirely normal for a large codebase), putting them all in one folder would make directory operations painfully slow. Splitting by the first two hex characters gives you up to 256 possible subfolders, spreading the objects evenly.
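
The split is simple enough to express in a few lines of Python. This is only an illustration of the naming rule described above; loose_object_path is a made-up helper, and it assumes a full 40-character SHA-1 hash.

```python
from pathlib import PurePosixPath


def loose_object_path(hex_hash, git_dir=".git"):
    """Return where a loose object with this hash would live on disk."""
    if len(hex_hash) != 40:
        raise ValueError("expected a full 40-character SHA-1 hash")
    # First 2 characters: folder name. Remaining 38: file name.
    return PurePosixPath(git_dir, "objects", hex_hash[:2], hex_hash[2:])


print(loose_object_path("8ab686eafeb1f44702738c8b0f24f2567c36da6d"))
# .git/objects/8a/b686eafeb1f44702738c8b0f24f2567c36da6d
```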

The objects themselves are stored as zlib-compressed binary files. That is why you cannot just cat them directly. You need git cat-file to read them properly.
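
If you are curious what git cat-file is doing for you, here is a small Python sketch of reading a loose object. Once decompressed, the bytes start with a short header (the object type, a space, the content size, and a NUL byte) followed by the content itself. The parse_loose_object name is hypothetical, and to keep the example self-contained it builds the compressed bytes in memory instead of opening a real file under .git/objects/.

```python
import zlib


def parse_loose_object(compressed):
    """Decompress a loose object and split it into (type, content)."""
    raw = zlib.decompress(compressed)
    header, _, content = raw.partition(b"\x00")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type.decode(), content


# Build the bytes a loose blob file would contain, then read them back.
payload = b"Hello, World!\n"
on_disk = zlib.compress(b"blob " + str(len(payload)).encode() + b"\x00" + payload)
print(parse_loose_object(on_disk))  # ('blob', b'Hello, World!\n')
```

To try it on a real object, read one of the files under .git/objects/ in binary mode and pass its bytes in.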

As a repository grows, Git also packs objects into files under .git/objects/pack/ using delta compression. That is a whole other topic, but the core loose object model is what we covered here.

SHA-1 Hashing, Explained Simply

Everything in Git's object model depends on SHA-1 hashing, so let's make sure we actually understand what that means.

A hash function takes any input (a word, a file, a novel, a video) and produces a fixed-length output. For SHA-1, that output is always 40 hexadecimal characters.

A few key properties make this useful:

Deterministic: The same input always produces the same output. Always. You can hash the same file a million times and get the exact same 40 characters.

Avalanche effect: Change one single bit of input and the output is completely, unpredictably different. There is no way to look at two hashes and tell how similar the inputs were.

Practically collision-free: Two different inputs producing the same hash is theoretically possible but so astronomically unlikely that it has never happened accidentally in the wild for SHA-1. (Engineered collisions are a different story, which is why Git is moving to SHA-256.)

Here is the avalanche effect in action:

echo 'version 1' | git hash-object --stdin
# 83baae61804e65cc73a7201a7252750c76066a30

echo 'version 2' | git hash-object --stdin
# 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a

Just changing the 1 to a 2 produces a completely different hash. That is the avalanche effect at work.
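
You can put a number on "completely different" by counting how many of the 160 output bits flip. Here is a quick Python sketch. The differing_bits helper is made up for this example, and it hashes the raw bytes with plain SHA-1 (no Git header), so it measures the hash function itself, not Git object IDs.

```python
import hashlib


def differing_bits(a, b):
    """Count how many of the 160 SHA-1 output bits differ between two inputs."""
    x = int.from_bytes(hashlib.sha1(a).digest(), "big")
    y = int.from_bytes(hashlib.sha1(b).digest(), "big")
    return bin(x ^ y).count("1")


print(differing_bits(b"abc", b"abc"))  # 0
print(differing_bits(b"abc", b"abd"))
```

For a one-character change like the second line, you should expect a result somewhere around 80, roughly half of the bits, which is what you would get from two unrelated random values.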

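Git also uses the hash as a checksum. When you fetch from a remote, Git recomputes the hash of every object it receives instead of trusting a name supplied by the sender, so corrupted content cannot quietly pass as the object you asked for. And because every commit hash covers its tree and its parent hashes, changing anything in history changes every hash after it. That makes your history tamper-evident.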

One more thing worth mentioning: Git is gradually adding support for SHA-256. SHA-1 has known weaknesses against engineered collision attacks (though exploiting them against Git in practice is extremely hard). SHA-256 produces 64-character hashes, and recent Git versions can already create a SHA-256 repository with git init --object-format=sha256, although most repositories and hosting services still use SHA-1. For day-to-day usage, SHA-1 repos are perfectly fine.

What Actually Happens When You git add

Okay, we have built up enough context. Let's trace exactly what happens when you run git add.

Take a fresh file:

echo "Learning Git internals" > notes.txt

At this point, Git knows nothing about this file. The object store has nothing for it. The index does not reference it.

Now run:

git add notes.txt

Two things happen, in this order:

Step 1: A blob object is created. Git reads the content of notes.txt, prepends a header (blob [byte-length]\0), hashes the whole thing with SHA-1, compresses it with zlib, and writes it to .git/objects/[xx]/[remaining-hash].

You can compute that hash yourself with git hash-object (add -w if you also want it to write the object, which is what git add does for you):

git hash-object notes.txt
# prints the 40-character hash of the blob

Step 2: The index is updated. The staging area (.git/index) is updated to map notes.txt to the blob hash we just created.

That is it. No commit exists yet. No tree exists yet. Just a blob sitting in the object store, and the index pointing to it.
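
Step 1 is small enough to reproduce by hand. Here is a Python sketch of it, under some assumptions: write_blob is a made-up name, and it writes into a temporary directory standing in for .git/objects/ so it cannot touch a real repository. It only covers the blob, not the index update from Step 2.

```python
import hashlib
import os
import tempfile
import zlib


def write_blob(content, objects_dir):
    """Store content as a loose blob under objects_dir and return its hash."""
    # Header plus content is what gets hashed AND what gets compressed.
    store = b"blob " + str(len(content)).encode() + b"\x00" + content
    hex_hash = hashlib.sha1(store).hexdigest()
    folder = os.path.join(objects_dir, hex_hash[:2])
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, hex_hash[2:]), "wb") as f:
        f.write(zlib.compress(store))
    return hex_hash


with tempfile.TemporaryDirectory() as objects_dir:  # stand-in for .git/objects
    print(write_blob(b"", objects_dir))
    # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (Git's hash for an empty file)
```

If you feed it the bytes of notes.txt (including the trailing newline that echo adds), the result should match what git hash-object notes.txt prints.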

Let's verify the blob was created:

# Capture the blob hash so we can reuse it
hash=$(git hash-object notes.txt)

ls .git/objects/${hash:0:2}/
# lists a file named with the remaining 38 characters of the hash

git cat-file -t "$hash"
# blob

git cat-file -p "$hash"
# Learning Git internals

When you run git commit, Git reads the index, builds a tree object from it (creating tree objects recursively for subdirectories), then creates a commit object pointing to that root tree and the previous commit. But the blob was already there, created at git add time.
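
The commit object created at the end of that process can be sketched in Python too. This is a simplified illustration, not what Git literally runs: build_commit is a hypothetical helper, the identity string is a placeholder, and it leaves out extras such as signatures and encoding headers.

```python
import hashlib


def build_commit(tree, parents, author, committer, message):
    """Return (hash, raw_body) for a commit object with the given fields."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]  # empty for the very first commit
    lines += ["author " + author, "committer " + committer, "", message, ""]
    body = "\n".join(lines).encode()
    store = b"commit " + str(len(body)).encode() + b"\x00" + body
    return hashlib.sha1(store).hexdigest(), body


# Placeholder identity: name, email, Unix timestamp, timezone offset.
ident = "A U Thor <author@example.com> 1708732800 +0530"
tree = "92b8935ad25c3b4a11cd4beba1c3ac2a516a06a2"

first_hash, _ = build_commit(tree, [], ident, ident, "Initial commit")
second_hash, _ = build_commit(tree, [first_hash], ident, ident, "Second commit")
print(first_hash != second_hash)  # True
```

Because the parent hash is part of the bytes being hashed, every commit hash depends on the entire history before it.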

This is why git add can be slow for large files or many files. The real work, hashing and compressing content, happens at add time, not commit time.

What's Coming in Part 2

So now you know how Git stores things. You understand blobs, trees, commits, and tags. You can inspect any object with git cat-file. You know why the objects folder is split the way it is.

But here is where it gets really interesting.

In Part 2, we are going to talk about how commits chain together to form history. We will look at what a branch actually is under the hood, and I think it will surprise you how embarrassingly simple it is. We will also walk through what git merge and git rebase are actually doing to those objects behind the scenes.

The mental model you are building right now is exactly what makes those concepts click. Stay tuned! 🚀

Chao!
