
Near-Instant Dependency Restoration in GitHub Actions with SquashFS and FUSE

Iulian-Constantin Marcu

If you've ever watched a CI pipeline spend 30–60 seconds just extracting node_modules from a cache archive, you know the frustration. That's dead time on every single job, every single PR - and it adds up fast when your team runs hundreds of jobs per day.

In this post, I'll walk you through the solution I designed and implemented to cut dependency restoration from ~60 seconds down to ~3 seconds on self-hosted GitHub Actions runners. The core idea: stop extracting archives and start mounting them as read-only filesystem images, using SquashFS, FUSE, and a shared EFS volume on AWS EKS.

The problem

Our monorepo has around 772,000 files and 91,000 directories inside node_modules alone. The traditional caching approach - compress everything into a tar archive, store it somewhere shared, copy it back, extract it - had become a serious bottleneck.

Profiling the extraction step revealed that the time wasn't spent on decompression. It was spent on I/O and kernel overhead: creating hundreds of thousands of files and directories, setting permissions, writing metadata. Even on fast NVMe storage, the kernel's filesystem operations (mkdir, create, chmod) for 772K files consumed ~29 seconds of sys time that no amount of faster disk could eliminate.

This realization is what led us away from extraction entirely. If the bottleneck is creating files, then the answer is to not create them at all - mount a pre-built filesystem image and read directly from it.

The journey to the current solution

Getting here wasn't straightforward. The solution went through six iterations, and I think the failed attempts are just as instructive as the final result.

Attempt 1: tar + lz4 on EFS

The original approach. Dependencies were archived with tar -I lz4 and stored on an EFS volume. On restore, the archive was copied to local disk and extracted. Extraction took ~60 seconds. Profiling pointed to I/O as the bottleneck - writing all those files to EBS storage.

Lesson: Even with lz4 (one of the fastest compression algorithms), the massive volume of files left us I/O-bound. We needed faster disk.

Attempt 2: NVMe instance store

We tried m5d.8xlarge instances with 2x600 GB NVMe SSDs in RAID-0 (~1.1 TB) to eliminate the storage bottleneck. Extraction dropped from ~60s to ~30s - a meaningful improvement, but the remaining time was now syscall-bound. The kernel was spending 29 seconds just on filesystem metadata operations. Faster storage couldn't help anymore.

Lesson: Even the fastest NVMe SSDs only halved the time. The remaining 29s was pure kernel syscall overhead - no disk in the world could fix that. We needed to stop creating files altogether.

Attempt 3: erofs + kernel mounts

We mounted erofs images via mount -t erofs -o loop with kernel overlayfs on top. This achieved the ~5s restore target, but introduced two critical problems. First, loop device leaks - kernel loop devices persisted on the host after pod termination, and when runner pods crashed or were evicted, stale loop devices accumulated (up to 18 per node), eventually exhausting the pool. Cleaning up required terminating all EC2 instances in the node group. Second, Bottlerocket's SELinux blocked CAP_SYS_ADMIN, requiring full privileged: true for kernel mount syscalls.

Lesson: Mounting worked - 5s restores. But kernel loop devices leaked on pod crashes, accumulating until the entire node group had to be recycled. We needed userspace mounts.

Attempt 4: erofs + FUSE

The logical next step was userspace mounting with erofsfuse, but Ubuntu 22.04's erofs-utils package (v1.4) doesn't include it - that binary was added in v1.5+. Building from source introduced too much complexity for a CI optimization.

Lesson: erofs had no packaged FUSE support on our platform. squashfs did - mature, widely available, zero build-from-source hassle.

Attempt 5: squashfs + FUSE with partial cache hits

squashfs + FUSE worked great. We built it out with three restore paths - exact hit, partial hit, and full miss - to cover every scenario. The partial hit path did incremental caching: match on a restore-key prefix, run npm install on top, diff and re-squash. It worked, but added significant complexity for marginal benefit.

Lesson: The core approach (squashfs + FUSE) was right, but the partial-hit machinery added complexity out of proportion to its benefit. We needed to simplify.

Attempt 6: squashfs + FUSE, exact-or-miss (current)

Simplified to just two paths: exact cache hit or full miss. No partial hits, no incremental saves. The simplicity is the feature - lockfile changes are infrequent enough that the occasional full npm ci on a miss is a perfectly acceptable trade-off for dropping all the incremental overlay machinery.

Lesson: Dropping partial hits and incremental saves cut the codebase in half with no measurable impact on CI times. Simplicity won.

How it works

The core concept is simple: instead of extracting 772K files from an archive, mount a compressed filesystem image and layer a writable overlay on top. Reads come from the cached image, writes go to ephemeral local disk. The workspace looks and behaves exactly as if you ran npm ci - tools like Nx, TypeScript, and Vitest see a normal node_modules directory.

Only the cached directories (node_modules, .venv) are overlaid. Everything else - .git, source files, config files - stays on the native filesystem untouched.

Cache hit: mount and go (~3s)

When the cache key matches, the restore action:

  1. Copies the .squashfs image from EFS to local disk
  2. Mounts it read-only via squashfuse (FUSE - no kernel loop devices)
  3. Layers a writable overlay via fuse-overlayfs at each dependency directory's workspace path

That's it. The workspace is ready for builds and tests.
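The three steps above can be sketched as a script. The paths, image names, and directory list below are illustrative, not the article's exact implementation; `DRY_RUN=1` (the default here) prints each command instead of executing it, so the sequence can be traced without `/dev/fuse` or the FUSE tools installed:

```shell
#!/usr/bin/env bash
# Sketch of the cache-hit restore path. Set DRY_RUN=0 on a real runner.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

restore_cache() {
  local efs_image="$1" local_image="$2" mnt="$3" workspace="$4"; shift 4
  # 1. Copy the image from EFS to local disk (one sequential read)
  run cp "$efs_image" "$local_image"
  # 2. Mount read-only via squashfuse. RUNNER_TRACKING_ID="" keeps the
  #    runner from killing the daemon between steps; allow_other makes
  #    the mount visible to other processes.
  run env RUNNER_TRACKING_ID= squashfuse -o allow_other "$local_image" "$mnt"
  # 3. Writable overlay at each dependency directory
  local rel
  for rel in "$@"; do
    run mkdir -p "/local-cache/upper/$rel" "/local-cache/work/$rel" "$workspace/$rel"
    run env RUNNER_TRACKING_ID= fuse-overlayfs \
      -o "lowerdir=$mnt/$rel,upperdir=/local-cache/upper/$rel,workdir=/local-cache/work/$rel" \
      -o allow_other "$workspace/$rel"
  done
}

# Dry-run trace with hypothetical image name and directory list
TRACE="$(restore_cache \
  /cache/node_modules_cache/node-24.11.1-abc123.squashfs \
  /local-cache/node_modules_cache.squashfs \
  /local-cache/mnt-node_modules_cache \
  /tmp/workspace \
  node_modules apps/app/node_modules)"
echo "$TRACE"
```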

Cache miss: install and save

When there's no matching cache, the action falls back to a fresh npm ci. After a successful install, it saves a new squashfs image:

  1. Finds all node_modules directories in the workspace
  2. Creates a staging directory with sudo mount --bind for each directory, preserving workspace-relative paths (passing multiple source directories to mksquashfs directly would cause it to flatten and rename duplicates like node_modules_1, node_modules_2)
  3. Compresses with mksquashfs using lz4 and all available processors (~12s for 764K files)
  4. Uploads atomically to EFS with a run-unique .tmp.$GITHUB_RUN_ID suffix, then renames to prevent collisions from concurrent runners
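Step 4's atomic publish can be sketched as follows (the function name is illustrative; steps 1-3 are shown only as comments since they need root and squashfs-tools). The temp-suffix-then-rename pattern relies on `rename(2)` being atomic within one filesystem, so concurrent runners never observe a half-written image:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Steps 1-3 (require sudo + squashfs-tools), for context:
#   sudo mount --bind "$dir" "stage/$rel"   # per node_modules directory
#   mksquashfs stage image.squashfs -comp lz4 -processors "$(nproc)"

atomic_upload() {
  local src="$1" dst="$2" run_id="${GITHUB_RUN_ID:-$$}"
  local tmp="$dst.tmp.$run_id"   # run-unique temp name, no collisions
  cp "$src" "$tmp"               # the slow write goes to the temp file
  mv -f "$tmp" "$dst"            # atomic publish under the final name
}

# Demo on local temp files (on a runner, dst would live under /cache/...)
src="$(mktemp)"; printf 'squashfs-bytes' > "$src"
dst_dir="$(mktemp -d)"
atomic_upload "$src" "$dst_dir/node-24.11.1-abc123.squashfs"
```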

Architecture

Here's what the storage layout looks like on a runner pod:

$GITHUB_WORKSPACE (native filesystem - unchanged)
├── .git/                          ← native (no FUSE)
├── src/, apps/, packages/         ← native (no FUSE)
├── node_modules/                  ← fuse-overlayfs mount
│   lower: squashfs image (cached, read-only)
│   upper: /local-cache/upper/... (ephemeral writes)
├── apps/app/node_modules/         ← fuse-overlayfs mount
│   lower: squashfs image (cached, read-only)
│   upper: /local-cache/upper/... (ephemeral writes)
└── packages/shared/node_modules/  ← fuse-overlayfs mount
    ...

The backing storage has two layers:

  • /local-cache - an emptyDir volume on the pod. Holds the local .squashfs copy, squashfuse mount points, and overlay upper/work directories. Needs at least 10 GB capacity.
  • EFS (/cache) - a shared ReadWriteMany PVC accessible from all runners. Stores the canonical squashfs images with 7-day retention.

Cache key strategy

Cache keys are computed from lockfile content only - no branch name. This means identical dependencies share the same cache across all branches.

For Node.js, the key format is node-<NODE_VERSION>-<HASH>, where the hash is computed from all package-lock.json files with .version and .packages[].version fields stripped out. This makes the cache resilient to CD version bumps that only change workspace package versions without affecting actual dependencies. A real dependency add, remove, or upgrade still changes the hash and triggers a miss.
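A minimal sketch of that computation, assuming jq is installed (the file names and helper names are illustrative). Even with every version field stripped, a real dependency change still alters the lockfile's resolved/integrity entries, so the hash changes:

```shell
#!/usr/bin/env bash
set -euo pipefail

lockfile_hash() {
  # Strip the top-level .version and every .packages[].version, then
  # hash the normalized JSON of all lockfiles together.
  jq -S 'del(.version) | del(.packages[]?.version)' "$@" | sha256sum | cut -d' ' -f1
}

cache_key() {
  local node_version="$1"; shift
  echo "node-${node_version}-$(lockfile_hash "$@")"
}

# Demo with a toy lockfile:
tmp="$(mktemp -d)"
cat > "$tmp/a.json" <<'EOF'
{"version":"1.0.0","packages":{"":{"version":"1.0.0"},
 "node_modules/left-pad":{"version":"1.3.0","integrity":"sha512-aaa"}}}
EOF
# CD version bump: only workspace package versions change
sed 's/"1\.0\.0"/"1.0.1"/g' "$tmp/a.json" > "$tmp/b.json"
# Real dependency upgrade: version *and* integrity change
sed -e 's/1\.3\.0/1.4.0/' -e 's/sha512-aaa/sha512-bbb/' "$tmp/a.json" > "$tmp/c.json"
echo "key: $(cache_key 24.11.1 "$tmp/a.json")"
```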

For Python, the key is python-venvs-<HASH> computed from all poetry.lock files.

Why FUSE instead of kernel mounts

You might wonder why we use FUSE (squashfuse + fuse-overlayfs) rather than kernel mounts (mount -t squashfs + mount -t overlay), given that kernel mounts are generally faster. The answer comes down to operational safety in Kubernetes. FUSE mounts are process-owned - when a pod terminates (even on a crash), they're cleaned up automatically. Kernel loop devices, on the other hand, persist on the host. As we discovered with erofs (Attempt 3), crashed or evicted pods left stale loop devices that accumulated across nodes, eventually exhausting the pool and requiring full node group recycling. FUSE also avoids the kernel loop device pool entirely - squashfuse reads the image file directly. The privilege requirements are simpler too: FUSE only needs /dev/fuse, not full kernel mount capabilities.
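If you're debugging the kernel-mount failure mode described above, a small diagnostic like this (a hypothetical helper, not part of the original actions) counts loop devices whose backing file has been deleted - the signature of a leaked mount after a pod crash:

```shell
#!/usr/bin/env bash
set -euo pipefail

stale_loop_count() {
  # Degrade gracefully to 0 where losetup isn't available
  command -v losetup >/dev/null 2>&1 || { echo 0; return; }
  # Leaked loop devices show their backing file as "(deleted)"
  losetup -a 2>/dev/null | grep -c '(deleted)' || true
}

stale_loop_count
```

Alerting when this climbs above zero would have caught the Attempt 3 accumulation long before the device pool was exhausted.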

Making FUSE work with GitHub Actions

This was the trickiest part. FUSE mounts behave differently from regular mounts, and the GitHub Actions runner has its own opinions about background processes. There are three specific quirks you need to work around.

1. Orphan process cleanup

The GitHub Actions runner kills background processes between workflow steps. It tracks them using the RUNNER_TRACKING_ID environment variable. Since squashfuse and fuse-overlayfs are daemon processes that run in the background, the runner will kill them between steps - unmounting your dependencies mid-workflow.

The fix: clear RUNNER_TRACKING_ID before launching the FUSE daemons.

RUNNER_TRACKING_ID="" squashfuse -o allow_other /local-cache/image.squashfs /local-cache/mnt

This prevents the runner from tracking these processes, so they survive across steps.

2. Process-scoped FUSE mounts

By default, FUSE mounts are only visible to the process that created them. Each GitHub Actions step runs in a separate shell process, so subsequent steps wouldn't see the mounts.

The fix: use the allow_other mount option, and make sure /etc/fuse.conf contains user_allow_other.

# In the runner image Dockerfile or setup
echo "user_allow_other" >> /etc/fuse.conf

# When mounting
RUNNER_TRACKING_ID="" squashfuse -o allow_other image.squashfs /mountpoint
RUNNER_TRACKING_ID="" fuse-overlayfs \
  -o lowerdir=/local-cache/mnt/path,upperdir=/local-cache/upper/path,workdir=/local-cache/work/path \
  -o allow_other \
  $GITHUB_WORKSPACE/node_modules

3. Docker-in-Docker visibility

If your runners use a Docker-in-Docker sidecar (common for running service containers like Postgres), FUSE mounts created in the runner container are invisible to the dind container. This sounds like it would break Docker bind mounts (e.g., mounting init scripts into a Postgres container), but in practice it's not a problem - Docker bind mounts resolve through the shared work emptyDir filesystem, not through FUSE mount points. We originally added mount propagation (Bidirectional/HostToContainer) to solve this, but removed it after confirming all integration tests pass without it. No special configuration needed.

Infrastructure requirements

To implement this on your own EKS cluster, you need:

Runner pod configuration:

  • /dev/fuse access - required for squashfuse and fuse-overlayfs. The simplest approach is privileged: true on the runner container, but this can be narrowed to just /dev/fuse device access.
  • /local-cache emptyDir volume - local scratch space for squashfs copies, mount points, and overlay layers. At least 10 GB.
  • /cache PVC mount - EFS with ReadWriteMany access mode, shared across all runners.

Installed tools:

  • squashfuse - FUSE-based squashfs mounter
  • fuse-overlayfs - FUSE-based overlay filesystem
  • squashfs-tools - provides mksquashfs for creating images
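A hypothetical preflight check for the runner image can verify all of these requirements at once, reporting everything that's missing instead of failing on first use:

```shell
#!/usr/bin/env bash
set -euo pipefail

preflight() {
  local tool
  local -a missing=()
  for tool in squashfuse fuse-overlayfs mksquashfs; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  [ -e /dev/fuse ] || missing+=("/dev/fuse")
  grep -qs '^user_allow_other' /etc/fuse.conf \
    || missing+=("user_allow_other in /etc/fuse.conf")
  if [ "${#missing[@]}" -gt 0 ]; then
    printf 'missing: %s\n' "${missing[@]}"
    return 1
  fi
  echo ok
}

# On a correctly built runner image this prints "ok";
# anywhere else it lists what still needs installing.
preflight || true
```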

EFS configuration:

  • Accessible from all runner nodes
  • Sufficient throughput for concurrent reads (6+ jobs reading simultaneously is common)
  • The copy step retries up to 3 times with 2-second backoff to handle NFS stale file handles under concurrent access
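The retry behaviour can be sketched as a small wrapper (the function name is illustrative). NFS "stale file handle" errors under concurrent EFS access are transient, so a bounded retry with a short backoff is usually enough:

```shell
#!/usr/bin/env bash
set -euo pipefail

copy_with_retry() {
  local src="$1" dst="$2" attempts="${3:-3}" backoff="${4:-2}" i
  for ((i = 1; i <= attempts; i++)); do
    if cp "$src" "$dst"; then
      return 0
    fi
    echo "copy attempt $i/$attempts failed" >&2
    if [ "$i" -lt "$attempts" ]; then
      sleep "$backoff"   # give the NFS handle time to refresh
    fi
  done
  return 1
}

# Demo: a local copy succeeds on the first attempt.
tmpdir="$(mktemp -d)"
printf 'image-bytes' > "$tmpdir/src.squashfs"
copy_with_retry "$tmpdir/src.squashfs" "$tmpdir/dst.squashfs"
```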

Cache lifecycle and cleanup

Cache images on EFS are cleaned up automatically. A daily scheduled workflow runs at 03:00 UTC and deletes any file not accessed in the last 7 days. The restore action touches the EFS source file after every successful mount, so mtime reflects last successful use rather than creation time. Corrupt images that fail to mount are never touched and expire naturally.

For manual intervention, the same workflow accepts a custom age_days parameter (default: 7, 0 = delete all) and a dry_run flag to preview what would be deleted.

No per-branch cleanup on PR close is needed - since cache keys are lockfile-based and not branch-based, orphaned caches (from branches with unique lockfile hashes that have been merged or closed) expire through the daily purge.
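The purge logic described above can be sketched like this (the function and parameter names mirror the article's description; the actual workflow script isn't shown there). "Not accessed" means the restore action ran `touch` on the EFS file after its last successful mount:

```shell
#!/usr/bin/env bash
set -euo pipefail

purge_cache() {
  local root="$1" age_days="${2:-7}" dry_run="${3:-false}"
  local args=("$root" -type f -name '*.squashfs')
  if [ "$age_days" -gt 0 ]; then
    args+=(-mtime "+$age_days")   # older than age_days days
  fi                              # age_days=0: match everything
  if [ "$dry_run" = true ]; then
    find "${args[@]}" -print      # preview only
  else
    find "${args[@]}" -print -delete
  fi
}

# Dry-run example against an empty temp dir prints nothing to delete.
purge_cache "$(mktemp -d)" 7 true
```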

File layout

For reference, here's the full file layout:

.github/
├── actions/
│   ├── local-cache-restore/action.yml   # Copy squashfs from EFS + mount via squashfuse
│   ├── local-cache-save/action.yml      # Bind mount staging + mksquashfs + upload to EFS
│   └── setup-workspace/action.yml       # Orchestrate overlays + dependency install
└── workflows/
    └── purge-cache.yml                  # Daily + manual cache cleanup

EFS (/cache):
├── node_modules_cache/
│   └── node-24.11.1-<hash>.squashfs    # ~1.6 GB compressed
└── python_venv_cache/
    └── python-venvs-<hash>.squashfs    # ~1.2 GB compressed

Runner local (/local-cache):
├── node_modules_cache.squashfs          # Local copy from EFS
├── mnt-node_modules_cache/              # squashfuse mount point (read-only)
├── upper/                               # Per-directory overlay upper layers
│   ├── node_modules_cache/
│   │   ├── node_modules/
│   │   ├── apps/app/node_modules/
│   │   └── ...
│   └── python_venv_cache/
│       └── services/api/.venv/
├── work/                                # Overlay work directories
└── stage-*/                             # Temporary bind mount staging (during save)

Troubleshooting

If you decide to implement this, here are the issues you're most likely to run into.

"Failed to mount squashfs via FUSE" - Check that /dev/fuse exists in the container with ls -la /dev/fuse. If it's missing, the runner pod may not have privileged: true or a /dev/fuse device volume. Also verify that squashfuse is installed.

"Failed to copy cache file" / "Stale file handle" - An EFS NFS handle expired under concurrent access. The retry mechanism (3 attempts, 2s backoff) usually resolves this. If it's persistent, check your EFS throughput limits or burst credit balance.

"Module not found" errors after cache hit - The squashfs image may have stale content. Purge the cache and let it rebuild on the next run.

"Could not find Nx modules" - The FUSE mount isn't visible to subsequent steps. Verify that RUNNER_TRACKING_ID="" is set before launching the FUSE daemons, allow_other is in the mount options, and user_allow_other is in /etc/fuse.conf.

Cache miss on every run - Check the cache key computation. If your lockfile changes on every run (for example, Verdaccio URL rewrites not being reverted), the hash will never match. Make sure to restore the original lockfile content after install.

Docker bind mounts fail - Docker bind mounts resolve through the shared work emptyDir, not through FUSE mounts. If Docker containers can't see files that should be there, check that the work volume is correctly shared between the runner and dind containers in the Helm values.

Results

On cache hit, dependency restoration now takes ~3 seconds - down from 60 seconds with tar extraction, or ~30 seconds even on NVMe. For a team running hundreds of CI jobs daily, that's a significant amount of developer wait time eliminated - and the solution is completely transparent to the workflows that consume it.

The simplification to exact-or-miss (dropping partial hits) was the right call. The added complexity of incremental saves bought very little in practice, since lockfile changes are relatively infrequent and a full cache miss only adds the cost of one npm ci run.

If your self-hosted runners are spending tens of seconds extracting cached dependencies, consider whether mounting might be a better fit. The upfront investment in setting up FUSE and EFS pays for itself quickly at scale.
