sudori part 5
This is a blog post on sbt 2.x development, continuing from part 3, sbt 2.x remote cache, sbt 2.x remote cache with Bazel compatibility, sudori part 4, etc. I work on sbt 2.x in my own time, in collaboration with the Scala Center and other volunteers, like Billy at EngFlow. These posts are extended PR descriptions to share the features that will come to a future version of sbt.
introduction
I was on a late summer vacation in a beach town in Connecticut this week — checking out local skate spots, dipping into cool Long Island Sound, and comparing lobster rolls. In between skateboarding and beach going, I worked on remote test caching in sbt, and a few experiments related to the idea. I’d like to share some of my findings here, mostly as a development write-up.
In sudori part 4 we looked into remote caching of the compile task. While caching compile is useful, most CI (continuous integration) systems spend their time running tests, not just compiling code. Bazel achieves orders-of-magnitude faster CI in part due to the default test command being remote cached. In other words, using Bazel, if a test runs once on a CI machine, the test result will be cached until its input changes.
An advanced sbt user might note that sbt already includes the testQuick task for local incremental testing. The issue with testQuick is that its invalidation relies on timestamps, which are non-hermetic, and thus not reproducible across machines. In this post, we’ll discuss test caching for sbt 2.x that can be shared safely across machines. The corresponding pull request is sbt/sbt#7644.
the anatomy of sbt 1.x testQuick
As a reference, it would be useful to understand how testQuick works in sbt 1.x in a bit more detail. In sbt, there’s a mechanism called TestsListener where a build user can register a listener to handle test events, like a test completing. sbt 1.x implements a listener by default called TestStatusReporter:
private[sbt] class TestStatusReporter(f: File) extends TestsListener {
  private lazy val succeeded: concurrent.Map[String, Long] = TestStatus.read(f)
  def startGroup(name: String): Unit = { succeeded.remove(name); () }
  def endGroup(name: String, result: TestResult): Unit = {
    if (result == TestResult.Passed)
      succeeded(name) = System.currentTimeMillis
  }
  def doComplete(finalResult: TestResult): Unit = {
    TestStatus.write(succeeded, "Successful Tests", f)
  }
}
In short, every time a test passes, the current timestamp is recorded. At the end of test, TestStatus.write(...) writes a Java properties file named succeeded_tests in a known location under target/.
testQuick filters down the defined test suite candidates using testQuickFilter, which calculates the most-recent timestamp of the transitive dependencies of the test implementation:
val stamps = collection.mutable.Map.empty[String, Long]
def stamp(dep: String): Long = {
  val stamps = for (a <- ans) yield intlStamp(dep, a, Set.empty)
  if (stamps.isEmpty) Long.MinValue
  else stamps.max
}
def intlStamp(c: String, analysis: Analysis, s: Set[String]): Long =
  ....
def noSuccessYet(test: String) = succeeded.get(test) match {
  case None => true
  case Some(ts) => stamps.synchronized(stamp(test)) > ts
}
By transitive dependencies, I mean both class dependencies and library JARs.
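To make the comparison concrete, here is a minimal sketch of the same check with the Zinc lookup replaced by a plain function. The names TimestampFilter and latestDepStamp are hypothetical and exist only for this illustration; they are not part of sbt.

```scala
object TimestampFilter:
  /**
   * A suite must rerun if it has never passed, or if any transitive
   * dependency carries a timestamp newer than the recorded pass.
   *
   * @param succeeded      suite name -> time of the last recorded pass
   * @param latestDepStamp hypothetical lookup: newest timestamp among
   *                       the suite's transitive dependencies
   */
  def noSuccessYet(
      test: String,
      succeeded: Map[String, Long],
      latestDepStamp: String => Long,
  ): Boolean =
    succeeded.get(test) match
      case None     => true
      case Some(ts) => latestDepStamp(test) > ts
```

Note that the answer depends only on clock values: recompiling identical sources later, or on another machine, yields stamps newer than the recorded pass, so the suite reruns even though no content changed.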
what is an incremental test?
Rewording Build Systems à la Carte (Mokhov et al., 2018) a bit, we can define incremental testing as follows: a test system is minimal if it executes test suites at most once per run, and only if they transitively depend on inputs that changed since the previous run. We can define a test system to be incremental if it runs at least the minimal tests, and sometimes more.
At first it seems similar to incremental compilation, but there’s a critical difference between compiling and testing. With Scala, we can often ignore the body of methods during compilation. Bazel goes as far as to define ijar, which blanks out the implementation entirely for Java. So if class B calls A#foo in A, incremental compilation often does not invalidate B when the body of A#foo changes. For testing, changes in the method body matter.
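The difference can be sketched with a toy model. Everything here is hypothetical (the Method case class and both predicates are made up for illustration, not taken from Zinc or sbt), but it captures which kind of change triggers which kind of invalidation.

```scala
object Invalidation:
  // Hypothetical model of a method: its signature and its body.
  final case class Method(signature: String, body: String)

  /** Incremental compilation can often ignore body-only changes. */
  def invalidatesCompile(before: Method, after: Method): Boolean =
    before.signature != after.signature

  /** For testing, any change to the signature or the body matters. */
  def invalidatesTest(before: Method, after: Method): Boolean =
    before != after
```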
Let’s use some examples. Assume we have the following classes:
// Animal.scala
package example
trait Animal:
  def walk: Int = 0
end Animal

// Cow.scala
package example
class Cow extends Animal:
  def moo: Int = 0
end Cow

// Mineral.scala
package example
trait Mineral:
  def ok: Boolean = true
end Mineral
And the following JUnit 4 test:
package example
import org.junit.Test
import org.junit.Assert.assertEquals
class CowTest:
  @Test
  def testMoo: Unit =
    val cow = Cow()
    assertEquals(cow.moo, 0)
end CowTest
Based on the minimality:
- Changing CowTest itself should invalidate the previous test result
- Changing Cow should invalidate the previous test result
- Changing Animal should invalidate the previous test result
- Changing Mineral should not invalidate the previous test result
A remote cached test would perform the above operations across multiple machines.
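As a rough sketch, the four rules fall out of a reachability check over class-level dependencies. The dependency map below is hand-written for this example (in practice this information would come from the compiler's analysis), and the Minimality object is a hypothetical name.

```scala
object Minimality:
  // Hypothetical class-level dependency edges for the example classes.
  val deps: Map[String, Set[String]] = Map(
    "example.CowTest" -> Set("example.Cow"),
    "example.Cow"     -> Set("example.Animal"),
    "example.Animal"  -> Set.empty[String],
    "example.Mineral" -> Set.empty[String],
  )

  /** All classes reachable from `start`, including `start` itself. */
  def transitive(start: String): Set[String] =
    @annotation.tailrec
    def loop(todo: List[String], seen: Set[String]): Set[String] =
      todo match
        case Nil                  => seen
        case c :: rest if seen(c) => loop(rest, seen)
        case c :: rest =>
          loop(deps.getOrElse(c, Set.empty[String]).toList ::: rest, seen + c)
    loop(List(start), Set.empty)

  /** A change to `changed` invalidates `test` only if `test` transitively depends on it. */
  def invalidates(changed: String, test: String): Boolean =
    transitive(test).contains(changed)
```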
step 1: hermetic incremental testing
Before we get into remote caching, we need to fix the reliance on timestamps, which are non-hermetic. Bazel forms a dependency graph between subprojects, or targets, and the caching and invalidation happen at the target boundary. That doesn’t quite work for sbt, where one subproject contains many test suites.
We can, however, create a sub-JAR graph at the class granularity, which is what Zinc Analysis tracks for incremental compilation. The way I think about it is that incremental compilation forms a monoid in the category of *.class where the arrow goes out following the method calls only when the signature changes, whereas incremental testing forms a monoid in the category of *.class where the arrow goes out following the method calls anytime the bytecode changes.
| | object | arrow |
|---|---|---|
| bazel test | JAR (target) | JAR dependency |
| sbt compile | *.class | method call / API |
| sbt test | *.class | method call / bytecode |
sbt already has a task called definedTests to automatically detect test suites in a subproject. I’m introducing a new task called definedTestDigests, which is typed to Map[String, Digest]. Digest represents a combo of a cryptographic hash and the file size, but here, we are using it as a Merkle tree of digests.
class ClassStamper(
    classpath: Seq[Attributed[HashedVirtualFileRef]],
    converter: FileConverter,
):
  /**
   * Given a classpath and a class name, this tries to create a SHA-256 digest.
   * @param className className to stamp
   * @param extraHashes additional information to include into the returning digest
   */
  private[sbt] def transitiveStamp(
      className: String, extraHashes: Seq[Digest]): Option[Digest] =
    val digests = SortedSet(analyses.flatMap(internalStamp(className, _, Set.empty)): _*)
    if digests.nonEmpty then Some(Digest.sha256Hash(digests.toSeq ++ extraHashes: _*))
    else None
  private def internalStamp(
      className: String,
      analysis: Analysis,
      alreadySeen: Set[String],
  ): SortedSet[Digest] =
    ....
end ClassStamper
The internalStamp(...) method follows logic similar to testQuick, except it calculates the SHA-256 of the bytecode instead of using the compilation timestamp. We can trigger the definedTestDigests task immediately after compile to pre-calculate the Merkle tree of SHA-256 hashes for all discovered test suites.
// cache the test digests against the fullClasspath.
def definedTestDigestTask: Initialize[Task[Map[String, Digest]]] = Def.cachedTask {
  val cp = (Keys.test / fullClasspath).value
  val testNames = Keys.definedTests.value.map(_.name).toVector.distinct
  val converter = fileConverter.value
  val sv = Keys.scalaVersion.value
  val inputs = (Keys.compile / Keys.compileInputs).value
  // by default this captures JVM version
  val extraInc = Keys.extraIncOptions.value
  // throw in any information useful for runtime invalidation
  val salt = s"""$sv
${converter.toVirtualFile(inputs.options.classesDirectory)}
${extraInc.mkString(",")}
"""
  val extra = Vector(Digest.sha256Hash(salt.getBytes("UTF-8")))
  val stamper = ClassStamper(cp, converter)
  Map((testNames.flatMap: name =>
    stamper.transitiveStamp(name, extra) match
      case Some(ts) => Seq(name -> ts)
      case None => Nil
  ): _*)
}
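For readers who want to experiment outside of sbt, the idea of a transitive bytecode digest can be approximated with the JDK alone. This is a simplified sketch under stated assumptions: World is a hypothetical stand-in for what Zinc Analysis provides, digests are plain hex strings rather than sbt's Digest, and the result is a flattened, Merkle-style hash of per-class hashes rather than the PR's actual implementation.

```scala
import java.security.MessageDigest

object MerkleSketch:
  /** Hex-encoded SHA-256 of the given bytes. */
  def sha256(bytes: Array[Byte]): String =
    MessageDigest.getInstance("SHA-256").digest(bytes).map(b => f"$b%02x").mkString

  /**
   * Hypothetical stand-in for Zinc Analysis: the bytecode of each class
   * and its class-level dependencies.
   */
  final case class World(
      bytecode: Map[String, Array[Byte]],
      deps: Map[String, Set[String]],
  )

  /** Names of all classes reachable from `className`, including itself. */
  private def reachable(w: World, className: String, seen: Set[String]): Set[String] =
    if seen(className) then seen
    else
      w.deps
        .getOrElse(className, Set.empty[String])
        .foldLeft(seen + className)((acc, dep) => reachable(w, dep, acc))

  /** Digest over the sorted per-class hashes plus extra hashes; None for unknown classes. */
  def transitiveStamp(w: World, className: String, extraHashes: Seq[String]): Option[String] =
    if !w.bytecode.contains(className) then None
    else
      val leaves = reachable(w, className, Set.empty).toSeq.sorted
        .flatMap(n => w.bytecode.get(n).map(b => s"$n:${sha256(b)}"))
      Some(sha256((leaves ++ extraHashes).mkString("\n").getBytes("UTF-8")))
```

Sorting the leaves before hashing keeps the result independent of traversal order, which is one way to keep the digest stable across machines.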
Given that a class can be cross built across different JVM versions, Scala versions, and JVM vs JS vs Native, we capture additional information as a salt, which is hashed together with the class digests. This Digest can be used in succeeded_tests, replacing the timestamp. Given a test suite like CowTest, if the file contains the same Digest as the current Digest, that means the test can be skipped.
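A small sketch of the salting and the skip check, using hex strings in place of sbt's Digest. The SaltedDigest object, its parameters, and the choice of Scala version and platform as salt fields are assumptions made for illustration only.

```scala
import java.security.MessageDigest

object SaltedDigest:
  /** Hex-encoded SHA-256 of a UTF-8 string. */
  def sha256(s: String): String =
    MessageDigest.getInstance("SHA-256")
      .digest(s.getBytes("UTF-8"))
      .map(b => f"$b%02x")
      .mkString

  /** Combine a class digest with build-environment facts (hypothetical fields). */
  def salted(classDigest: String, scalaVersion: String, platform: String): String =
    sha256(s"$classDigest\n$scalaVersion\n$platform")

  /** A suite can be skipped only when its recorded digest equals the current one. */
  def canSkip(
      suite: String,
      recorded: Map[String, String],
      current: Map[String, String],
  ): Boolean =
    (recorded.get(suite), current.get(suite)) match
      case (Some(a), Some(b)) => a == b
      case _                  => false
```

With the salt mixed in, a success recorded for the JVM build does not cause the same suite to be skipped on another platform, even when the class digest happens to match.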
step 2: caching the test results
The nice thing about sbt 2.x’s caching system is that the interface for the local disk cache and the remote cache is unified. So we only have to integrate with the sbt 2.x caching system once, and that gives us both the disk cache and the remote cache.
Keeping with the Bazel tradition, a cacheable unit would be called an action cache. Since I’m going to only cache the successful tests, we can represent them with an integer value 0. The new TestStatusReporter can do this after each test suite succeeds:
private[sbt] class TestStatusReporter(
    digests: Map[String, Digest],
    cacheConfiguration: BuildWideCacheConfiguration,
) extends TestsListener:
  // int value to represent success
  private final val successfulTest = 0
  /**
   * If the test has succeeded, record the fact that it has
   * using its unique digest, so we can skip the test later.
   */
  def endGroup(name: String, result: TestResult): Unit =
    if result == TestResult.Passed then
      digests.get(name) match
        case Some(ts) =>
          // treat each test suite as a successful action that returns 0
          ActionCache.cache(
            key = (),
            codeContentHash = ts,
            extraHash = Digest.zero,
            tags = CacheLevelTag.all.toList,
            config = cacheConfiguration,
          ): (_) =>
            ActionCache.actionResult(successfulTest)
        case None => ()
    else ()
end TestStatusReporter
Next, in the testQuick filter, we can check if the action cache exists for the Merkle tree:
def hasSucceeded(className: String): Boolean = digests.get(className) match
  case None => false
  case Some(ts) => hasCachedSuccess(ts)
def hasCachedSuccess(ts: Digest): Boolean =
  val input = cacheInput(ts)
  ActionCache.exists(input._1, input._2, input._3, config)
def cacheInput(value: Digest): (Unit, Digest, Digest) =
  ((), value, Digest.zero)
If the action cache exists, we know it’s 0, so we don’t bother grabbing the value.
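The record-then-check flow can be mimicked with an in-memory map. This is only a toy: ToyCache, Store, and testsToRun are hypothetical names, and the real ActionCache talks to the disk cache or a remote backend rather than a TrieMap.

```scala
import scala.collection.concurrent.TrieMap

object ToyCache:
  /** Hypothetical in-memory stand-in for an action cache keyed by input digest. */
  final class Store:
    private val entries = TrieMap.empty[String, Int]
    // record a successful suite as an action whose result is 0
    def recordSuccess(digest: String): Unit = entries.update(digest, 0)
    // existence alone is enough, since only successes are ever stored
    def exists(digest: String): Boolean = entries.contains(digest)

  /** Suites that still need to run: no digest, or no cached success for the digest. */
  def testsToRun(
      digests: Map[String, String],
      defined: Seq[String],
      store: Store,
  ): Seq[String] =
    defined.filterNot(name => digests.get(name).exists(store.exists))
```

Because the key is derived purely from content, any machine that computes the same digest can find the entry, which is what allows the result to be shared.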
demo 1: passing information
See sbt 2.x remote cache with Bazel compatibility on how to configure remote cache with different backends. Let’s try testing using CowTest.
package example
import org.junit.Test
import org.junit.Assert.assertEquals
class CowTest:
  @Test
  def testMoo: Unit =
    val cow = Cow()
    assertEquals(cow.moo, 0)
end CowTest
directory 1:
First the test works normally:
$ sbt
[info] welcome to sbt 2.0.0-alpha11-SNAPSHOT (Azul Systems, Inc. Java 1.8.0_402)
....
sbt:inctest> testQuick
[info] Updating inctest_3
[info] Resolved inctest_3 dependencies
....
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
[success] elapsed time: 3 s, cache 0%, 6 onsite tasks
directory 2:
Next, copy the entire directory to another directory, wipe out the disk cache, and rerun the test:
$ rmtrash $HOME/Library/Caches/sbt/v2/ && rmtrash target && rmtrash project/target
$ sbt
[info] welcome to sbt 2.0.0-alpha11-SNAPSHOT (Azul Systems, Inc. Java 1.8.0_402)
sbt:inctest> testQuick
....
[info] Passed: Total 0, Failed 0, Errors 0, Passed 0
[info] No tests to run for Test / testQuick
[success] elapsed time: 2 s, cache 20%, 1 remote cache hit, 4 onsite tasks
Although both directories were on the same laptop, this shows that the test result from one directory carries over to another via the Bazel-compatible remote cache.
demo 2: invalidations
Let’s test the minimality criteria. First, change the test itself in some way to make sure it fails:
package example
import org.junit.Test
import org.junit.Assert.assertEquals
class CowTest:
  @Test
  def testMoo: Unit =
    val cow = Cow()
    assertEquals(cow.moo, 1)
end CowTest
sbt:inctest> testQuick
[error] Test example.CowTest.testMoo failed: java.lang.AssertionError: expected:<0> but was:<1>, took 0.002 sec
[error] at example.CowTest.testMoo(CowTest.scala:10)
[error] ...
[error] Failed: Total 1, Failed 1, Errors 0, Passed 0
[error] Failed tests:
[error] example.CowTest
[error] (Test / testQuick) sbt.TestsFailedException: Tests unsuccessful
[error] elapsed time: 0 s, cache 100%, 5 remote cache hits
This failed as expected. Next change Cow#moo to return 1:
package example
class Cow extends Animal:
  def moo: Int = 1
end Cow
Now it passes as expected. Next change the implementation of the Animal trait:
package example
trait Animal:
  def walk: Int = 0
  def swim: Int = 0
end Animal
sbt:inctest> testQuick
[info] compiling 1 Scala source to target/out/jvm/scala-3.4.2/inctest/backend ...
[info] compiling 1 Scala source to target/out/jvm/scala-3.4.2/inctest/backend ...
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
[success] elapsed time: 1 s, cache 16%, 1 remote cache hit, 5 onsite tasks
This reran the test as expected. Finally, let’s change the implementation of Mineral, which should not shake the CowTest:
package example
trait Mineral:
  def ok: Boolean = false
end Mineral
sbt:inctest> testQuick
[info] compiling 1 Scala source to target/out/jvm/scala-3.4.2/inctest/backend ...
[info] Passed: Total 0, Failed 0, Errors 0, Passed 0
[info] No tests to run for Test / testQuick
[success] elapsed time: 0 s, cache 20%, 1 remote cache hit, 4 onsite tasks
This worked as expected, since it compiled, but did not invalidate the CowTest result. There might be a lot of small details we can improve, but hopefully this demonstrates the idea of remote caching of the tests.
summary
In sbt/sbt#7644 I’ve implemented remote test caching for sbt 2.x that stores test success results across machine boundaries. testQuick, which can replace test, only runs the tests that have been invalidated by input changes. Unlike sbt 1.x, the PR uses a Merkle tree to perform the invalidation, which makes the results hermetic and safe to share across machines. Caching the test results can speed up CI jobs by orders of magnitude.
Appendix A: what is sudori?
Sweet and sour pork, often made with bell peppers and pineapple in Asia, is called 酢豚 (subuta) in Japan, or vinegar pork, and is one of the backronyms for sbt. 酢鶏 (sudori), or vinegar chicken, is a variant of subuta substituting pork with chicken, and it’s a codename I’ve been using to discuss sbt 2.x. The word ketchup is said to derive from the Hokkien word 膎汁 (kôe-chiap or kê-chiap) from southern coastal China, meaning fish sauce, which re-entered China from Vietnam in the 1700s. Through trade, fish sauce also became popular in Britain, where it eventually became a mushroom paste. In the 1800s, Americans started making it with tomatoes. In a sense, it’s interesting how a Cantonese dish like 咕嚕肉 (gūlōuyuhk), stomach-growling pork, incorporates American ketchup, coming full circle as sweet and sour pork.
Appendix B: why did you use JUnit 4?
It turns out that some test frameworks use non-hermetic macros. For example, when I changed the directory, munit produced different bytecode:
--- a/example/CowTest.class.asm
+++ b/example/CowTest.class.asm
@@ -56,7 +56,7 @@
]
NEW munit/Location
DUP
- LDC "/Users/xxx/inctest/src/test/scala/example/CowTest.scala"
+ LDC "/Users/xxx/inctest2/src/test/scala/example/CowTest.scala"
BIPUSH 6
INVOKESPECIAL munit/Location.<init> (Ljava/lang/String;I)V
INVOKEVIRTUAL example/CowTest.assert (Lscala/Function0;Lscala/Function0;Lmunit/Location;)V
Update: I sent scalameta/munit#823 to fix this.