Bazel and Remote Caching

Lately I have spent some time playing with remote caching of Bazel build artifacts. I am still exploring this, but here is a quick summary of what I’ve learned so far.

Caching

Incremental builds are one of the major advantages of Bazel. “Incremental” means that Bazel only rebuilds the parts of the codebase that are affected by a change. This is great for build performance since source code changes usually happen in small increments.
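To make the idea concrete, here is a minimal sketch of change detection by content hash. It is not how Bazel is implemented; the function names, target names and file names are all made up for illustration.

```python
import hashlib


def digest(content: bytes) -> str:
    """Return a content hash; a file is 'changed' when this value changes."""
    return hashlib.sha256(content).hexdigest()


def targets_to_rebuild(targets, old_digests, new_sources):
    """Return the targets that have at least one source whose digest changed.

    targets:     dict of target name -> list of source file names (hypothetical)
    old_digests: dict of source file name -> digest from the previous build
    new_sources: dict of source file name -> current file content (bytes)
    """
    stale = []
    for target, sources in targets.items():
        if any(old_digests.get(s) != digest(new_sources[s]) for s in sources):
            stale.append(target)
    return stale


# Hypothetical project: two targets, three source files.
targets = {"app": ["a.ts", "b.ts"], "lib": ["c.ts"]}
before = {"a.ts": b"one", "b.ts": b"two", "c.ts": b"three"}
old_digests = {name: digest(content) for name, content in before.items()}

# Only b.ts is edited, so only the target that depends on it is stale.
after = dict(before, **{"b.ts": b"two, edited"})
print(targets_to_rebuild(targets, old_digests, after))
```

In this toy model only "app" is rebuilt, because "lib" does not depend on the edited file.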

There is one notable exception.

When you pull the source for the first time, you have to build the entire codebase. Depending on the size of the project, the initial build can take a very long time.

This is where remote caching of build artifacts comes into play. By caching artifacts from past builds you can avoid rebuilding source code unnecessarily.

One potential setup is to cache builds on the CI server since it usually triggers a build after every commit anyway. This makes the CI server a good candidate for creating the Bazel remote cache.

Before doing a build on a local dev machine, Bazel will first check the remote cache to see whether it can skip all or part of the local build.
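The check-then-build flow can be sketched as follows. This is a simplified illustration, not Bazel's actual logic: the cache is a plain dictionary, and the key and the build function are placeholders I made up.

```python
def build_with_cache(key, cache, build):
    """Return (artifact, source), building only when the cache has no entry.

    key:   identifier for the build step (hypothetical; a real cache would
           use a hash derived from the step's inputs)
    cache: dict standing in for the remote cache
    build: zero-argument function that performs the expensive local build
    """
    if key in cache:
        return cache[key], "remote cache hit"
    artifact = build()
    cache[key] = artifact  # populate the cache for the next machine
    return artifact, "built locally"


remote_cache = {}

# First build (for example on the CI server): nothing is cached yet.
print(build_with_cache("bundle", remote_cache, lambda: b"compiled output"))

# Second build (for example on a dev machine): the artifact is reused.
print(build_with_cache("bundle", remote_cache, lambda: b"compiled output"))
```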

Caching Server

In my experiment I created the caching server using nginx with the WebDAV protocol enabled. WebDAV is an extension of HTTP for managing files on a server.

The process responsible for building the remote cache will add files to the cache through PUT requests.

Build nodes can then make GET requests to the caching server to pull cached artifacts.
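The PUT and GET exchange can be tried out with a small stand-in server. The sketch below is not the nginx setup from my experiment: it keeps entries in memory, and the key /cas/abc is a made-up example (Bazel's HTTP cache does use paths under /ac/ and /cas/, but real keys are hashes).

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

STORE = {}  # path -> bytes; stands in for the files nginx would keep on disk

# Ignore any proxy settings so requests go straight to the local server.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class CacheHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        # Store the uploaded artifact under the request path.
        length = int(self.headers.get("Content-Length", 0))
        STORE[self.path] = self.rfile.read(length)
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # 200 with the artifact is a cache hit, 404 is a cache miss.
        body = STORE.get(self.path)
        if body is None:
            self.send_response(404)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example output quiet


def start_server():
    """Start the toy cache on a free local port and return the server."""
    server = HTTPServer(("127.0.0.1", 0), CacheHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def put(base, key, data):
    request = urllib.request.Request(base + key, data=data, method="PUT")
    with OPENER.open(request) as response:
        return response.status


def get(base, key):
    try:
        with OPENER.open(base + key) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        return err.code, b""


if __name__ == "__main__":
    server = start_server()
    base = f"http://127.0.0.1:{server.server_port}"
    print(get(base, "/cas/abc"))               # miss: nothing uploaded yet
    print(put(base, "/cas/abc", b"artifact"))  # the cache-building node uploads
    print(get(base, "/cas/abc"))               # hit: a build node downloads
    server.shutdown()
```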

Nginx is fast, but you also need a fast network connection. Otherwise, network latency and transfer time will likely negate any benefit of the cache.

Demo Application

The demo application for this experiment is a decent-sized Angular application. During local testing the initial build time is around 5 minutes.

To make it easier to spin up the application I am hosting both the caching server and the web app in Docker.

If you are interested in trying it out you can pull the code from GitHub.

The code is in a branch called caching.

To spin up the entire application, run npm run start-server. This will build and launch the docker containers.

The initial build will take a while since Docker has to download a lot of dependencies and build the remote Bazel cache.

During the initial Bazel build you will see a cascade of PUT requests to the cache. This is just the initial build populating the cache.

Once the build completes you can load the application in a browser by going to http://localhost:9001.

Testing the Cache

To test the cache, add another airplane container to docker-compose.yml by duplicating the existing one and naming it airplane2.

Next, restart the server and watch the new container spin up.

You should now see a bunch of GET requests with status 200 as the new instance is being built. The 200 GET requests represent cache hits, which means Bazel won’t have to rebuild those artifacts.
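If you want numbers rather than eyeballing the log, the requests can be tallied with a short script. This sketch assumes the nginx access log uses the default "combined" format; the sample lines, IP address and paths are invented, not output from my run.

```python
import re
from collections import Counter

# Matches the quoted request and the status code that follows it, e.g.
# "GET /cas/abc HTTP/1.1" 200
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})')


def summarize(lines):
    """Count cache hits (GET 200), misses (GET 404) and uploads (PUT)."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if not match:
            continue
        method, status = match["method"], match["status"]
        if method == "GET" and status == "200":
            counts["hit"] += 1
        elif method == "GET" and status == "404":
            counts["miss"] += 1
        elif method == "PUT":
            counts["upload"] += 1
        else:
            counts["other"] += 1
    return counts


# Invented sample lines in nginx "combined" format.
sample = [
    '172.18.0.3 - - [01/Jan/2020:10:00:00 +0000] "GET /cas/aaa HTTP/1.1" 200 512 "-" "bazel"',
    '172.18.0.3 - - [01/Jan/2020:10:00:01 +0000] "GET /cas/bbb HTTP/1.1" 404 0 "-" "bazel"',
    '172.18.0.3 - - [01/Jan/2020:10:00:02 +0000] "PUT /cas/bbb HTTP/1.1" 201 0 "-" "bazel"',
    '172.18.0.3 - - [01/Jan/2020:10:00:03 +0000] "GET /ac/ccc HTTP/1.1" 200 128 "-" "bazel"',
]
print(summarize(sample))
```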

When I ran this experiment I noticed a build time of around 30 seconds, which is a huge improvement over the original 5-minute build time.
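Put in numbers, using the approximate timings above (both are rough observations from a single run, not a benchmark):

```python
cold_seconds = 5 * 60  # initial build with an empty cache, about 5 minutes
warm_seconds = 30      # build with a populated cache, about 30 seconds

speedup = cold_seconds / warm_seconds
saved = 1 - warm_seconds / cold_seconds

print(f"speedup: {speedup:.0f}x")   # speedup: 10x
print(f"time saved: {saved:.0%}")   # time saved: 90%
```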

After the build completes, the output also includes cache hit statistics showing how much of the build was served from the cache.

Challenges

Here are a few challenges I am still trying to figure out:

Since I am running in Docker, I have full control over the environment in terms of OS and folder structure. As I spin up new duplicate containers I see a near-perfect cache hit ratio.

However, I am struggling to get the same results when hitting a centralized cache from different standalone computers.

Each computer shares the same caching server, but I am not getting cache hits between the nodes.

Instead, each node builds an independent cache. This defeats much of the purpose since you don’t get the benefit of downloading a pre-built cache.

I am still trying to understand why builds on different machines miss the cache even though the same commit is being built.
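One possible explanation worth checking is that the cache key covers more than the source files. Bazel derives the key for each build step from the command line, the digests of the inputs and the environment variables the step sees, so a difference in something like PATH between machines would produce different keys. The sketch below only illustrates that effect with a made-up key function and made-up values. It is not Bazel's real key computation, and I have not confirmed that this is the cause in my setup.

```python
import hashlib
import json


def action_key(command, env, input_digests):
    """Hypothetical cache key: a hash over everything the build step depends on."""
    payload = json.dumps(
        {"command": command, "env": env, "inputs": input_digests},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same commit, so the command and the input digests are identical ...
command = ["tsc", "-p", "tsconfig.json"]
inputs = {"src/app.ts": "digest-of-app", "src/main.ts": "digest-of-main"}

# ... but the two machines have different PATH values (invented examples).
machine_a = {"PATH": "/usr/local/bin:/usr/bin"}
machine_b = {"PATH": "/home/dev/.nvm/bin:/usr/bin"}

key_a = action_key(command, machine_a, inputs)
key_b = action_key(command, machine_b, inputs)

print(key_a == key_b)                                  # False: cache miss
print(key_a == action_key(command, machine_a, inputs))  # True: cache hit
```

In the Docker setup every container has an identical environment, which would explain why duplicate containers hit the cache while standalone machines do not.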