Techouse Developers Blog

Techouse Developers Blog | A multi-product startup | Technical posts from our engineers | SaaS, job platforms, DX promotion

RubyKaigi 2026 - When Can You Skip a Test? Tracking Test Impact (Day1)


Introduction

Hi, I'm Puput, a web and app engineer at Techouse. As an engineer, waiting for CI is just part of my daily rhythm, but let me ask: how do you spend that time?

There is plenty you could be doing, sure, but what about when you are rushing through a small fix, or all you wanted was to add a one-line comment? Even then, you have to sit through the full CI run before you can merge. That kind of waiting can get genuinely frustrating.

I don't think this is about poorly written tests, or a lack of quality awareness. The cause is baked into the default itself: every test, every time. As a codebase grows, the test suite grows with it, and at some point "just run everything" stops being a reasonable answer.

This was exactly the problem Andrey Marchenko tackled at RubyKaigi 2026, in a talk called When Can You Skip a Test? Tracking Test Impact.

In this post, I'll share what stuck with me from his session on test impact analysis.

The Idea: Only Run the Tests Your Change Touches

The idea behind test impact analysis is not new, and the concept itself is simple. Record which source files each test touches. On each commit, run only the tests that touch the files that changed. Everything else can be skipped, as long as the record of what each test touches is accurate.
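As a rough illustration of the selection step, here is a minimal sketch in plain Ruby. The impact map contents, the file paths, and the tests_to_run helper are all made up for this example and are not taken from the talk or from any gem.

```ruby
require "set"

# Hypothetical impact map: test file => source files it touched on its last run.
IMPACT_MAP = {
  "spec/models/account_spec.rb" => Set["app/models/account.rb", "app/models/user.rb"],
  "spec/models/user_spec.rb"    => Set["app/models/user.rb"],
  "spec/lib/slugger_spec.rb"    => Set["lib/slugger.rb"]
}.freeze

# Returns the tests whose recorded files overlap with the changed files.
def tests_to_run(impact_map, changed_files)
  changed = changed_files.to_set
  impact_map.select { |_test, files| files.intersect?(changed) }.keys
end

selected = tests_to_run(IMPACT_MAP, ["app/models/user.rb"])
puts selected
```

Here a change to user.rb selects the two model specs and leaves slugger_spec.rb out. The selection itself is cheap; the cost sits in producing the map.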

The hard part is the implementation. If the "record what each test touched" step is itself expensive, you end up paying the same cost as running every test, or worse. Designing a low-overhead recording mechanism is what makes or breaks the whole approach.

Why a Native C Extension?

The Datadog team did not jump straight to a native extension. They started with the most accessible Ruby APIs, and the session walked through the actual numbers from those experiments:

  • A prototype using the built-in Coverage module (Ruby's standard line-coverage tracker, used by tools like SimpleCov): around 300% overhead
  • An approach using TracePoint, Ruby's built-in API for observing runtime events (method calls, line execution, raises, and so on): 200–400% overhead

With overhead like that, running the recorder on every commit is out of the question. The whole journey is also captured on Datadog's engineering blog, which Andrey shared on screen during the talk.

www.datadoghq.com
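To give a feel for what a TracePoint-based prototype might look like, here is a minimal sketch. It is not Datadog's code; FileRecorder and the add method are hypothetical names for this example. The hook runs a Ruby block for every executed line, which is presumably where much of the overhead comes from.

```ruby
require "set"

# Hypothetical recorder: collects every source file that executes a line
# while the given block runs.
class FileRecorder
  def self.record
    touched = Set.new
    trace = TracePoint.new(:line) { |tp| touched << tp.path }
    trace.enable { yield }
    touched
  end
end

def add(a, b)
  a + b
end

files = FileRecorder.record { add(1, 2) }
puts files.to_a
```

Wrapping each test in FileRecorder.record would yield one set of files per test, which is the raw material for an impact map.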

So the team went lower. The final implementation is a native C extension that hooks directly into RUBY_EVENT_LINE, the VM's per-line execution trigger. On top of that, smaller optimizations were stacked. For example, instead of comparing filenames character-by-character, the extension stashes each filename's memory address and compares addresses, a single integer comparison.

As a result, median overhead came down to around 25%, which is well within tolerance for running on every commit.
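The address-comparison trick lives in C, but the idea can be approximated in Ruby with object identity. The sketch below is only an analogy, with a hypothetical TouchedFiles class: equal? checks whether two references point at the same object, without looking at the string contents.

```ruby
require "set"

# Hypothetical collector: remembers the last path object it saw and skips
# the set insertion when the very same object shows up again.
class TouchedFiles
  attr_reader :files, :insertions

  def initialize
    @files = Set.new
    @last = nil
    @insertions = 0
  end

  def touch(path)
    # equal? compares object identity, not string contents.
    return if path.equal?(@last)

    @last = path
    @insertions += 1
    @files << path
  end
end

account = "app/models/account.rb".freeze
user    = "app/models/user.rb".freeze

collector = TouchedFiles.new
[account, account, account, user, user, account].each { |path| collector.touch(path) }
puts collector.insertions
```

Consecutive lines usually come from the same file, so most events exit at the identity check and never reach the set.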

From here on, I'll give you a general idea of how this is achieved. Feel free to check out the Datadog blog above for the full story.

Handling Rails' "Code-less" Classes

Hooking into RUBY_EVENT_LINE has a caveat. For a certain category of classes, line-level events alone do not capture the dependency between a test and the file.

Consider a typical Rails model:

class Account < ApplicationRecord
  belongs_to :user
end

Account inherits all of its real behavior from ApplicationRecord (and ultimately ActiveRecord::Base). The body of account.rb is run once, when the class is first loaded.

So when a test calls Account.new, RUBY_EVENT_LINE events fire for the parent files but not for account.rb itself. The dependency between the test and account.rb is real, but invisible to line-based tracking.

The consequence is exactly the failure mode a selective test runner cannot afford. If you rename :user to :owner in account.rb, the tests that exercise Account would catch the regression, but line coverage has recorded no dependency between those tests and account.rb, so the analyzer would skip them on this commit. The tests that should have run are exactly the ones that get left out.
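The blind spot is easy to reproduce with plain TracePoint, used here as a stand-in for the C-level line hook. In this sketch, BaseRecord is a hypothetical substitute for ApplicationRecord, and both files are written to a temporary directory so the example runs without Rails.

```ruby
require "set"
require "tmpdir"

touched = Set.new

Dir.mktmpdir do |dir|
  # Hypothetical parent class with real method bodies.
  File.write(File.join(dir, "base_record.rb"), <<~RUBY)
    class BaseRecord
      def self.belongs_to(name)
        (@associations ||= []) << name
      end

      def describe
        self.class.name
      end
    end
  RUBY
  # A model whose body only declares things.
  File.write(File.join(dir, "account.rb"), <<~RUBY)
    class Account < BaseRecord
      belongs_to :user
    end
  RUBY

  # Both files are loaded before the "test" starts, as in a booted app.
  load File.join(dir, "base_record.rb")
  load File.join(dir, "account.rb")

  trace = TracePoint.new(:line) { |tp| touched << File.basename(tp.path) }
  trace.enable { Account.new.describe }
end

puts touched.to_a.sort
```

The recorded set contains base_record.rb but not account.rb, even though the traced code clearly depends on Account.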

In the talk these were called "code-less classes." The pattern is not unique to Rails, but Rails leans on it heavily (models, mailers, jobs, concerns), so the limitation hits hardest there.

The fix is to layer in a second event, RUBY_INTERNAL_EVENT_NEWOBJ, which fires every time any object is created. Combined with Ruby's Module#const_source_location, the system can take any allocated Account instance and trace it back to account.rb. The dependency is recorded, even though no line in account.rb ever ran during the test.
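The lookup half of this can be tried from plain Ruby. The allocation hook itself is only reachable from C, so the sketch below starts from an object that already exists. The defining_file helper and the Invoice class are hypothetical names for this example.

```ruby
require "tmpdir"

# Hypothetical helper: map an object to the file that defines its class.
def defining_file(object)
  name = object.class.name
  return nil if name.nil? # anonymous classes have no constant to look up

  # Returns [file, line] for constants defined in Ruby code, [] for built-ins.
  file, _line = Object.const_source_location(name)
  file
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "invoice.rb")
  File.write(path, "class Invoice\nend\n")
  load path
end

file = defining_file(Invoice.new)
puts File.basename(file)
```

No line of invoice.rb runs when Invoice.new is called, yet the instance still leads back to the file. Built-in classes such as String have no Ruby source location, so the helper returns nil for them.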

The result is a multi-axis view: line coverage catches the imperative code paths, allocation tracking catches the declarative ones.

Keeping the Impact Map Fresh

When I heard about a selective test-execution mechanism, my first thought was the staleness question: what if the impact map gets out of date, and we end up skipping tests that should have run?

Andrey's answer was that the impact map is updated on every test run. The low observation overhead from the previous section is exactly what makes this practical. There is no need to regenerate on a schedule or keep stale snapshots around.

CI-Independent, but Not Standalone as a Gem

I had the chance to speak with Andrey directly after the session, and the rest of this post draws on what he shared in that conversation.

First, the collector, the part that gathers coverage, is CI-agnostic. GitHub Actions, CircleCI, Buildkite, a self-hosted setup: none of them require special integration.

The next part is the one I was most curious about. The OSS datadog-ci-rb gem is not self-contained. The "given this diff, which tests can we skip?" decision is designed to happen on Datadog's backend. In other words, dropping the gem in by itself does not give you selective test execution out of the box.

That said, he told me, this is not the end of the road.

Forking the gem and adapting it for your own project is entirely possible. The technically hardest and most valuable part, the coverage collection engine, is already contained in the gem. What is left is to implement the layer that decides where to store the collected data, and how to derive a "tests to run" set from a diff.
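To make the missing layer concrete, here is one possible shape for it, sketched under my own assumptions: a JSON file as storage and a list of changed paths such as the output of git diff --name-only as input. ImpactStore and its methods are hypothetical and are not part of datadog-ci-rb.

```ruby
require "json"
require "set"
require "tmpdir"

# Hypothetical storage layer: a JSON file mapping test id => touched files.
class ImpactStore
  def initialize(path)
    @path = path
    @map = File.exist?(path) ? JSON.parse(File.read(path)) : {}
  end

  # Overwrites the entry for a test with what its latest run touched.
  def record(test_id, files)
    @map[test_id] = files.to_a.sort
  end

  def save
    File.write(@path, JSON.pretty_generate(@map))
  end

  # changed_files would come from something like `git diff --name-only`.
  def tests_for(changed_files)
    changed = changed_files.to_set
    @map.select { |_test, files| files.any? { |f| changed.include?(f) } }.keys
  end
end

selected = Dir.mktmpdir do |dir|
  path = File.join(dir, "impact_map.json")

  store = ImpactStore.new(path)
  store.record("spec/models/account_spec.rb", ["app/models/account.rb", "app/models/user.rb"])
  store.record("spec/lib/slugger_spec.rb", ["lib/slugger.rb"])
  store.save

  # A later CI run reloads the map and feeds it the diff output.
  diff_output = "app/models/user.rb\nREADME.md\n"
  ImpactStore.new(path).tests_for(diff_output.split("\n"))
end

puts selected
```

This sketch ignores changed files that no test has ever touched, such as README.md here. A real implementation would need a policy for those, for example falling back to the full suite when a configuration file changes.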

He even said that he would be available for questions if we ran into trouble applying the gem to our own project. How nice is that? Kudos to Datadog too, for keeping the coverage collection engine open source.

What You Gain Even Without Forking

Even if you never fork the gem or build your own implementation, I think there is real value in this session as a deep dive into Ruby internals. Specifically:

  • How rb_profile_frames, a low-level Ruby C API that lets you read the current call stack without going through Ruby's regular profiling machinery, is put to use for tracking the current source file efficiently
  • How allocation profiling can fill the gaps that line coverage misses
  • What it takes to build an observation tool that is "always-on and still tolerable"

These topics are hard to reach without going down into the Ruby VM layer. Test impact analysis seems poised to become standard in large Ruby codebases within the next few years, so it is well worth understanding now.

Closing Thoughts

The most interesting part of this session, to me, was how Ruby VM event hooks were used to push observation overhead down to a level that is actually usable in practice, and how line coverage and allocation tracking together cover the kinds of dependencies a Rails test suite cares about.

The session itself stands on its own as an in-depth look at Ruby internals.

Finally, a sincere thank-you to Andrey Marchenko for generously sharing his time and his insights with me.

With Andrey Marchenko after the session


At Techouse, we are looking for engineers to join us in tackling social issues. We look forward to your application.

jp.techouse.com