Project Nayuki

Git library (Java)

This library reads and writes raw objects in a Git repository, giving the application developer a high degree of visibility and control. The library is implemented in pure Java with no dependencies on external tools or libraries, and is based on my own understanding of Git’s low-level storage format.
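To illustrate the raw object format, the sketch below computes a Git object ID. It is not this library's actual API. Git builds the ASCII header `<type> <length>`, appends a NUL byte and then the object's content, and hashes the result with SHA-1. The class `GitObjectHash` and its methods are hypothetical names chosen for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class GitObjectHash {
    // Builds the raw object bytes: ASCII "<type> <length>", a NUL byte, then the content.
    static byte[] rawObject(String type, byte[] content) {
        byte[] header = (type + " " + content.length + "\0").getBytes(StandardCharsets.US_ASCII);
        byte[] raw = new byte[header.length + content.length];
        System.arraycopy(header, 0, raw, 0, header.length);
        System.arraycopy(content, 0, raw, header.length, content.length);
        return raw;
    }

    // Returns the object ID: the SHA-1 of the raw object, as 40 lowercase hex digits.
    static String objectId(String type, byte[] content) {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("Java platforms are required to support SHA-1", e);
        }
        StringBuilder hex = new StringBuilder(40);
        for (byte b : sha1.digest(rawObject(type, content)))
            hex.append(String.format("%02x", b & 0xFF));
        return hex.toString();
    }

    public static void main(String[] args) {
        byte[] content = "hello world\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(objectId("blob", content));
    }
}
```

For instance, `objectId("blob", new byte[0])` yields `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the well-known ID of the empty blob.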

The library focuses on directly manipulating file data (blobs, trees) and history (commits). It does not deal with the working copy, index / staging area, or network communication.
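Of these formats, trees are the least obvious. The following rough sketch assumes the standard Git tree encoding. Each entry consists of the octal mode, a space, the name, a NUL byte, and the 20-byte binary object ID. Entries are sorted bytewise, with a subtree's name compared as if it ended in `/`. `TreeSerializer` and `Entry` are hypothetical names, and the code does not validate modes or names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TreeSerializer {
    // One tree entry. Mode is Git's octal string, e.g. "100644" (file), "100755" (executable), "40000" (subtree).
    static final class Entry {
        final String mode, name, hexId;
        Entry(String mode, String name, String hexId) {
            this.mode = mode;
            this.name = name;
            this.hexId = hexId;
        }
    }

    // Sort key: the name's bytes, with a trailing '/' appended for subtrees.
    private static byte[] sortKey(Entry e) {
        String key = e.mode.equals("40000") ? e.name + "/" : e.name;
        return key.getBytes(StandardCharsets.UTF_8);
    }

    // Concatenates "<mode> <name>\0<20-byte binary ID>" for each entry, in sorted order.
    static byte[] serialize(List<Entry> entries) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort((a, b) -> Arrays.compareUnsigned(sortKey(a), sortKey(b)));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Entry e : sorted) {
            byte[] prefix = (e.mode + " " + e.name + "\0").getBytes(StandardCharsets.UTF_8);
            byte[] id = hexToBytes(e.hexId);
            out.write(prefix, 0, prefix.length);
            out.write(id, 0, id.length);
        }
        return out.toByteArray();
    }

    // Decodes a 40-digit hex object ID into 20 raw bytes.
    static byte[] hexToBytes(String hex) {
        if (hex.length() != 40)
            throw new IllegalArgumentException("Expected 40 hex digits");
        byte[] result = new byte[20];
        for (int i = 0; i < result.length; i++) {
            int hi = Character.digit(hex.charAt(i * 2), 16);
            int lo = Character.digit(hex.charAt(i * 2 + 1), 16);
            if (hi < 0 || lo < 0)
                throw new IllegalArgumentException("Invalid hex digit");
            result[i] = (byte) (hi << 4 | lo);
        }
        return result;
    }
}
```

The trailing-slash rule matters for names sharing a prefix. A file `foo.txt` sorts before a subtree `foo`, because `.` (0x2E) is less than `/` (0x2F).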

Project source code at GitHub:

Supported features:

  • Immutable object IDs (SHA-1 hashes)
  • Mutable Git objects (blob, tree, commit, tag) with all fields
  • Parsing objects from byte arrays
  • Serializing objects to byte arrays
  • Reading and writing loose object files
  • Finding, reading, unchaining, and delta-decompressing objects in packfiles
  • Reading and writing loose reference files
  • Reading packed references
  • Searching object IDs by hexadecimal prefix
  • In-memory repositories with full functionality
  • Traversal of commit graph
  • Detailed Javadoc documentation comments
  • Defensive checks on input arguments and binary data
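Loose objects are stored one per file. Here is a minimal sketch that assumes the standard layout. The file lives at `objects/`, then the first two hex digits of the ID, then `/` and the remaining 38 digits. Its bytes are the zlib-compressed raw object. `LooseObjectCodec` and `Parsed` are hypothetical names, and this is not the library's implementation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class LooseObjectCodec {
    // A decoded loose object: its type ("blob", "tree", "commit", or "tag") and content bytes.
    static final class Parsed {
        final String type;
        final byte[] content;
        Parsed(String type, byte[] content) { this.type = type; this.content = content; }
    }

    // Location of a loose object: <gitDir>/objects/<first 2 hex digits>/<remaining 38 digits>.
    static Path objectPath(Path gitDir, String hexId) {
        return gitDir.resolve("objects").resolve(hexId.substring(0, 2)).resolve(hexId.substring(2));
    }

    // File contents = zlib-compressed("<type> <length>\0" + content).
    static byte[] encode(String type, byte[] content) {
        byte[] header = (type + " " + content.length + "\0").getBytes(StandardCharsets.US_ASCII);
        byte[] raw = Arrays.copyOf(header, header.length + content.length);
        System.arraycopy(content, 0, raw, header.length, content.length);
        Deflater deflater = new Deflater();  // default settings emit the zlib wrapper
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Inverse of encode(): decompresses, then checks and strips the header.
    static Parsed decode(byte[] fileBytes) {
        byte[] raw = inflate(fileBytes);
        int nul = 0;
        while (nul < raw.length && raw[nul] != 0) nul++;
        if (nul == raw.length) throw new IllegalArgumentException("Missing header terminator");
        String header = new String(raw, 0, nul, StandardCharsets.US_ASCII);
        int space = header.indexOf(' ');
        if (space < 0) throw new IllegalArgumentException("Malformed header");
        // Strict check: the decimal length must match the content size exactly.
        if (!header.substring(space + 1).equals(String.valueOf(raw.length - nul - 1)))
            throw new IllegalArgumentException("Declared length does not match content");
        return new Parsed(header.substring(0, space), Arrays.copyOfRange(raw, nul + 1, raw.length));
    }

    private static byte[] inflate(byte[] data) {
        Inflater inflater = new Inflater();  // expects the zlib wrapper
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                    throw new IllegalArgumentException("Truncated or unsupported zlib data");
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Corrupt zlib data", e);
        } finally {
            inflater.end();
        }
    }
}
```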

Unsupported features:

  • Reading packfile indexes
  • Writing packfiles and packfile indexes
  • Writing packed references
  • Reading from and writing to the index / staging area
  • Reading and writing the reflog
  • Reading and writing files and directories in the working copy
  • Reading and writing the list of remote repositories
  • Fetching from and pushing to remote repositories
  • Diffing and merging files, trees, and commits
  • Any convenient imitations of standard Git porcelain tools (git add/commit/status/checkout/reset/etc.)
  • Reading and writing cryptographic signatures (e.g. signed tags)

This list of unsupported features is not exhaustive, but it may shrink as features are implemented.



Starts at the given branches (such as master), scans the entire history of ancestor commits, and prints a number of statistics about the commit graph:

  • Number of root commits (which have zero parents)
  • Number of fork commits (which have more than one child)
  • Number of merge commits (which have more than one parent)
  • Longest chain of commits (the maximum number of commits encountered when starting at some commit and repeatedly following a parent pointer)
  • Total number of commits (number of commits reachable from the starting points)
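These statistics can be computed from a parent map in a few passes. The following is a sketch under stated assumptions, not the program's source. Commits are given as a hypothetical in-memory `Map` from commit ID to parent IDs, and only commits reachable from the starting points are counted. The history is assumed to be acyclic, as Git histories are. The longest chain is computed with memoized chain lengths and an explicit stack, because real histories can be deep enough to overflow the call stack under naive recursion.

```java
import java.util.*;

final class CommitGraphStats {
    final int roots, forks, merges, longestChain, total;

    private CommitGraphStats(int roots, int forks, int merges, int longestChain, int total) {
        this.roots = roots; this.forks = forks; this.merges = merges;
        this.longestChain = longestChain; this.total = total;
    }

    // parents: hypothetical map from commit ID to its parent IDs (empty list for a root commit).
    static CommitGraphStats compute(Map<String, List<String>> parents, Collection<String> starts) {
        // 1. Collect every commit reachable from the starting points.
        Set<String> reachable = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>(starts);
        while (!queue.isEmpty()) {
            String c = queue.remove();
            if (reachable.add(c))
                queue.addAll(parentsOf(parents, c));
        }

        // 2. Roots have no parents, merges have several; forks are commits with several children.
        int roots = 0, merges = 0, forks = 0;
        Map<String, Integer> childCount = new HashMap<>();
        for (String c : reachable) {
            List<String> ps = parentsOf(parents, c);
            if (ps.isEmpty()) roots++;
            if (ps.size() > 1) merges++;
            for (String p : ps) childCount.merge(p, 1, Integer::sum);
        }
        for (int n : childCount.values())
            if (n > 1) forks++;

        // 3. Longest chain: chain(c) = 1 + max(chain(p) over parents p), computed with an
        //    explicit stack so that long histories cannot overflow the call stack.
        Map<String, Integer> chain = new HashMap<>();
        Deque<String> stack = new ArrayDeque<>();
        for (String start : reachable) {
            stack.push(start);
            while (!stack.isEmpty()) {
                String c = stack.peek();
                if (chain.containsKey(c)) { stack.pop(); continue; }
                boolean ready = true;
                for (String p : parentsOf(parents, c)) {
                    if (!chain.containsKey(p)) { stack.push(p); ready = false; }
                }
                if (ready) {
                    int best = 0;
                    for (String p : parentsOf(parents, c)) best = Math.max(best, chain.get(p));
                    chain.put(c, best + 1);
                    stack.pop();
                }
            }
        }
        int longest = 0;
        for (int n : chain.values()) longest = Math.max(longest, n);

        return new CommitGraphStats(roots, forks, merges, longest, reachable.size());
    }

    private static List<String> parentsOf(Map<String, List<String>> parents, String c) {
        return parents.getOrDefault(c, Collections.emptyList());
    }
}
```

Consider a diamond-shaped history with root A, commits B and C both on A, and merge D of B and C. Started from D, this sketch reports 1 root, 1 fork, 1 merge, a longest chain of 3, and 4 commits in total.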

Starts at the master branch, scans the entire history of ancestor commits, scans the complete file tree of each commit, and prints a list representing the union of all paths ever used. For example, if a file was located at /foo/x.cpp in one version and was moved to /bar/y.cpp in the next version, then this program prints both paths.
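Here is a sketch of the path-union idea. It uses a hypothetical in-memory tree model (`PathUnion.Node`) in place of real tree objects, and takes one root tree per commit. It records file paths only. That is an assumption, since the description above does not say whether directory paths are included.

```java
import java.util.*;

final class PathUnion {
    // Hypothetical in-memory tree node: a directory has children; a file has children == null.
    static final class Node {
        final Map<String, Node> children;
        private Node(Map<String, Node> children) { this.children = children; }
        static Node file() { return new Node(null); }
        static Node dir(Map<String, Node> children) { return new Node(children); }
    }

    // Returns the sorted union of file paths (like "/foo/x.cpp") across all given root trees.
    static SortedSet<String> allFilePaths(Collection<Node> rootTrees) {
        SortedSet<String> result = new TreeSet<>();
        for (Node root : rootTrees) {
            // Depth-first walk; each stack item pairs a path prefix with the node at that path.
            Deque<Map.Entry<String, Node>> stack = new ArrayDeque<>();
            stack.push(new AbstractMap.SimpleEntry<>("", root));
            while (!stack.isEmpty()) {
                Map.Entry<String, Node> item = stack.pop();
                Node node = item.getValue();
                if (node.children == null) {
                    result.add(item.getKey());
                    continue;
                }
                for (Map.Entry<String, Node> child : node.children.entrySet())
                    stack.push(new AbstractMap.SimpleEntry<>(item.getKey() + "/" + child.getKey(), child.getValue()));
            }
        }
        return result;
    }
}
```

Consecutive commits usually share most of their subtrees. A real implementation could therefore skip a subtree already visited at the same path with the same object ID, rather than walking it again for every commit.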