sudori part 2
I’m hacking on a small project called sudori, an experimental sbt. The initial goal is to port the macro to Scala 3. It’s an exercise to take the macro apart and see if we can build it from the ground up. This an advanced area of Scala 2 and 3, and I’m finding my way around by trial and error. This is part 2.
Reference:
Instance
When we think of the build.sbt
macro, the first thing that comes to our mind is the Applicative do macro that it implements using .value
even though some may not use those terms exactly. The main driver for this imperative-to-functional is in the companion object for an oddly named Instance class:
/**
* The separate hierarchy from Applicative/Monad is for two reasons.
*
* 1. The type constructor is represented as an abstract type because a TypeTag cannot represent a type constructor directly.
* 2. The applicative interface is uncurried.
*/
trait Instance {
type M[x]
def app[K[L[x]], Z](in: K[M], f: K[Id] => Z)(implicit a: AList[K]): M[Z]
def map[S, T](in: M[S], f: S => T): M[T]
def pure[T](t: () => T): M[T]
}
trait MonadInstance extends Instance {
def flatten[T](in: M[M[T]]): M[T]
}
As noted in the Scaladoc, sbt also internally has its own Applicative[_]
typeclass. At this point, we’re not sure if the two reasons listed by Mark in 2012 (circa Scala 2.10.0-M6) are still relevant. The macro looks like this:
def contImpl[T, N[_]](
c: blackbox.Context,
i: Instance with Singleton,
convert: Convert,
builder: TupleBuilder,
linter: LinterDSL
)(
t: Either[c.Expr[T], c.Expr[i.M[T]]],
inner: Transform[c.type, N]
)(
implicit tt: c.WeakTypeTag[T],
nt: c.WeakTypeTag[N[T]],
it: c.TypeTag[i.type]
): c.Expr[i.M[N[T]]] = ....
To decode this signature, it might help to understand the internals of sbt.
settings and tasks
I mentioned this in Selective post, but two hallmarks of functional programming are that it uses immutable data structure instead of mutation, and that it gives attention to when and how effects are handled.
From this perspective, we can think of setting expressions and tasks to be those two things:
- Settings form an immutable graph in a build.
- Tasks represent effects.
Anonymous settings are represented using Initialize[A]
, which looks like this:
sealed trait Initialize[A] {
def dependencies: Seq[ScopedKey[_]]
def evaluate(build: BuildStructure): A // approx
....
}
sbt.Task
is can be seen as a wrapper around side effect function () => A
. However when we say “compile is a task.” The task in this context is represented using Initialize[Task[A]]
. They are settings of type Task[A]
. In other words, we have a nested data type of execution plan and effects. We can rewrite the macro signature as follows:
def contImpl[A, Effect[_]](
c: blackbox.Context,
i: Instance & scala.Singleton,
convert: Convert,
builder: TupleBuilder,
linter: LinterDSL
)(
t: Either[c.Expr[A], c.Expr[i.F[A]],
inner: Transform[c.type, Effect]
): c.Expr[i.F[Effect[A]]] = ....
So what’s t
? Why does it take an either? The answer is that it does Applicative-do when you pass Left[c.Expr[A]]
and Monadic-do when you pass Right[c.Expr[i.F[A]]]
. So it’s a weird implementation detail to share this code. If you remember from part 1, convert
here is a glorified partial function, which then can be used to search and replace some portions of a given abstract syntax tree.
pure
Consider the following setting expression:
someKey := { 1 }
Assuming someKey
is a setting, we need to construct Initialize[Int]
. contImpl
macro does this generically given an Applicative instance for Initialize
. Specifically in this case by calling i.pure(...)
.
// no inputs, so construct F[A] via Instance.pure or pure+flatten
def pure(body: Term): Expr[i.F[Effect[A]]] =
def pure0[A1: Type](body: Expr[A1]): Expr[i.F[A1]] =
'{
$i
.pure[A1] { () => $body }
.asInstanceOf[i.F[A1]]
}
eitherTree match
case Left(_) => pure0[Effect[A]](body.asExprOf[Effect[A]])
case Right(_) =>
flatten(pure0[i.F[Effect[A]]](body.asExprOf[i.F[Effect[A]]]))
This looks to be a straightforward code to call pure
function on i
. However, this will fail as:
[error] -- [E007] Type Mismatch Error: sudori/core-macros/src/main/scala-3/sbt/internal/util/appmacro/Cont.scala:119:13
[error] 119 | $i
[error] | ^
[error] | Found: (i : sbt.internal.util.appmacro.MonadInstance & Singleton)
[error] | Required: quoted.Expr[Any]
[error] one error found
Getting instance out of i.type
We need to splice i
into the quoted code. One way would be to inline i
, but then we won’t be able to use i
like i.F
so that won’t work. There’s a technique to summon i
here from a singleton type. A singleton type is a type with a single inhabitant written as p.type
where p
points to a term extending AnyRef
.
Let’s say we have i.type
. First thing we do is obtain Type[i.type]
by adding context bound A: Type
. Next we can get TypeRepr
for it as TypeRepr[A]
, which provides type information during macro time. We can’t just pattern match on TypeRepr[A]
however. There’s a new Scala 3 feature called TypeTest that can test types by using unapply
.
def extractSingleton[A: Type]: Expr[A] =
def termRef(r: TypeRepr)(using rtt: TypeTest[TypeRepr, TermRef]): Ref = r match
case rtt(ref) => Ref.term(ref)
case _ => sys.error(s"expected termRef but got $r")
termRef(TypeRepr.of[A]).asExprOf[A]
Back to pure
, this can be used like this:
// we can extract i out of i.type
val instance = extractSingleton[i.type]
// no inputs, so construct F[A] via Instance.pure or pure+flatten
def pure(body: Term): Expr[i.F[Effect[A]]] =
def pure0[A1: Type](body: Expr[A1]): Expr[i.F[A1]] =
'{
$instance
.pure[A1] { () => $body }
.asInstanceOf[i.F[A1]]
}
eitherTree match
case Left(_) => pure0[Effect[A]](body.asExprOf[Effect[A]])
case Right(_) =>
flatten(pure0[i.F[Effect[A]]](body.asExprOf[i.F[Effect[A]]]))
map
Consider the following setting expression:
someKey := { name.value + "!" }
This will expand to something like:
someKey <<= i.map(wrap(name), (q1: String) => { q1 + "!" })
lambda expression
The interesting part is generating the lambda expression (anonymous function) (q1) => { q1 + "!" }
. If we didn’t care about the symbol for the lambda expression, then there’s a shortcut function provided by Quote Reflection called Lambda(…):
def makeLambdaImpl(expr: Expr[Unit])(using qctx: Quotes) =
import qctx.reflect.*
Lambda(
owner = Symbol.spliceOwner,
tpe = MethodType(List("x"))(_ => List(TypeRepr.of[Boolean]), _ => TypeRepr.of[String]),
rhsFn = (sym, params0) => {
val param = params0.head
val toStr = Select.unique(Ref(param.symbol), "toString")
toStr.appliedToNone
}
).asExprOf[Boolean => String]
lambda expansion problem
This presents a problem because to create a lambda expression we’d have to know the list of parameters upfront listed in MethodType(...)
. In sbt 1.x macro, currently it uses transformWrappers(...)
and walk the tree to figure out whether we need to create a lambda expression, and if so how many parameters we need. The substitution function also needs to know the val
, which requires a function symbol to the lambda expression.
sbt 1.x goes through a bunch of casting to create an anonymous function value, creating Function(...)
tree, and assigning the symbol. As far as I know, none of these techniques would translate to Scala 3 unless we can somehow cast into the internal API. Two workarounds that I can think of are:
- Walk the tree twice
- Move the
val
out of lambda
Walking the tree twice would let us create lambda expressions immutably, but that could add some performance penalty. So for now, we can try the second way, which is to move the val
out of lambda, and turn it into var
. You might be repulsed by the use of var
, which is an instinct we develop as Scala programmers. var
is normally discouraged because shared mutable state makes the program difficult to reason about. In this case, we won’t expose the var
to outside, and it will be assigning exactly once before use, so we can think of it as a minor implemtation detail of the macro.
Here’s an example. In sbt 1.x let’s say we are generating something like:
someKey <<= i.map(wrap(name), (q1: String) => { q1 + "!" })
The workaround would be to generate:
someKey <<= {
// step 1: when name.value is found, declare a var and add it to input list
var q1: String = _
// step 3: because there's single input, map is chosen, which requires a lambda
// expression with a single parameter accepting p1: String
i.map(wrap(name), (p1: String) => {
// step 4: assign p1 value to q1 placeholder
q1 = p1
q1 + "!" // step 2: name.value is replaced with ref to q1
})
}
As long as the name q1
is unique within the scope this should be safe. So the implementation splits into the four steps outlined above as comments:
- when
name.value
is found, declare avar
and add it toinput
list name.value
is replaced with ref toq1
- because there’s single input,
map
is chosen, which then requires a lambda expression with a single parameter acceptingp1: String
- assign
p1
value toq1
placeholder
Here’s how we can perform steps 1 and 2:
def subToProxy(tpe: TypeRepr, qual: Term, selection: Term): Term =
val vd = freshValDef(Symbol.spliceOwner, tpe)
inputs = Input(tpe, qual, vd) :: inputs
tpe.asType match
case '[a] =>
Typed(Ref(vd.symbol), TypeTree.of[a])
def substitute(name: String, tpe: TypeRepr, qual: Term, replace: Term) =
convert[A](name, qual) transform { (tree: Term) =>
subToProxy(tpe, tree, replace)
}
val tx = transformWrappers(expr.asTerm, substitute)
Steps 3 and 4:
def genMap(body: Term, input: Input): Expr[i.F[Effect[A]]] =
def genMap0[A1: Type](body: Expr[A1], input: Input): Expr[i.F[A1]] =
input.tpe.asType match
case '[a] =>
val tpe = MethodType(List("$p0"))(_ => List(TypeRepr.of[a]), _ => TypeRepr.of[A1])
val lambda = Lambda(
owner = Symbol.spliceOwner,
tpe = tpe,
rhsFn = (sym, params) => {
val param = params.head.asInstanceOf[Term]
Block(
// $q1 = $p0
List(Assign(Ref(input.local.symbol), param)),
body.asTerm
)
}
).asExprOf[a => A1]
val expr = input.expr.asExprOf[i.F[a]]
Typed(
Block(
// this contains var $q1 = ...
List(input.local),
'{
val _i = $instance
_i
.map[a, A1]($expr.asInstanceOf[_i.F[a]], $lambda)
}.asTerm
),
TypeTree.of[i.F[A1]]
).asExprOf[i.F[A1]]
eitherTree match
case Left(_) =>
genMap0[Effect[A]](body.asExprOf[Effect[A]], input)
case Right(_) =>
flatten(genMap0[i.F[Effect[A]]](body.asExprOf[i.F[Effect[A]]], input))
The unit test looks like this:
package sbt.internal
import sbt.internal.util.appmacro.*
import verify.*
import ContTestMacro.*
object ContTest extends BasicTestSuite:
test("pure") {
assert(contMapNMacro[Int](12) == List(12))
}
test("getMap") {
assert(contMapNMacro[Int](ContTest.wrapInit(List(1)) + 1).toString == "List(2)")
}
// This compiles away
def wrapInit[A](a: List[A]): A = ???
end ContTest
Here I’m using List
datatype as the Functor
to test this.
freshValDef
There’s a detail I glossed over in the above, which is:
val vd = freshValDef(Symbol.spliceOwner, tpe)
Defining a val
or var
can be done like this:
def freshValDef(parent: Symbol, tpe: TypeRepr): ValDef =
tpe.asType match
case '[a] =>
val sym =
Symbol.newVal(parent, freshName("q"), tpe, Flags.Mutable, Symbol.noSymbol)
ValDef(sym, rhs = Option('{ 0 }.asTerm))
private var counter: Int = -1
def freshName(prefix: String): String =
counter = counter + 1
s"$$${prefix}${counter}"
The problem is that I’m hardcoding the right-hand side to be 0
.
Zero
There are probably multiple ways to initialize a var
with a zero-equivalent value, but probably a more straightforward way would be to define a typeclass called Zero[A]
:
package sbt.internal.util.appmacro
trait Zero[A]:
def zero: A
object Zero extends LowPriorityZero:
private[appmacro] def mk[A](a: A): Zero[A] = new Zero:
def zero: A = a
given Zero[Byte] = Zero.mk(0: Byte)
given Zero[Char] = Zero.mk(0: Char)
given Zero[Short] = Zero.mk(0: Short)
given Zero[Int] = Zero.mk(0)
given Zero[Long] = Zero.mk(0L)
given Zero[Float] = Zero.mk(0f)
given Zero[Double] = Zero.mk(0.0)
given Zero[Boolean] = Zero.mk(false)
given Zero[Unit] = Zero.mk((): Unit)
given Zero[String] = Zero.mk("")
class LowPriorityZero:
given [A]: Zero[A] = Zero.mk(null.asInstanceOf[A])
This will have a given instance for all types. There’s a convenient way to defer the summon invocation called summonInline[A]
.
/**
* Constructs a new, synthetic, local var with type `tpe`, a unique name, initialized to
* zero-equivalent (Zero[A]), and owned by `parent`.
*/
def freshValDef(parent: Symbol, tpe: TypeRepr): ValDef =
tpe.asType match
case '[a] =>
val sym =
Symbol.newVal(
parent,
freshName("q"),
tpe,
Flags.Mutable | Flags.Synthetic,
Symbol.noSymbol
)
ValDef(sym, rhs = Option('{ summonInline[Zero[a]].zero }.asTerm))
This would codegen:
var q1: String = summon[Zero[String]].zero
Summary
contImpl
macro implements both Applicative-do and Monadic-do. It does so by first scanning the abstract syntax tree forkey.value
. When there are none, it callspure(...)
andmap(...)
when there is exactly one.- To pass the Instance instance
i
to macro and back to the generated code, it usesSingleton
typei.type
, which internally holds on to the symbol ofi
. - Quotes API provides a convenient way to generating lambda expression, but it must know the exact type of the parameters. This means we can’t gradually expand the definition of the lambda expression as we walk the tree.
To workaround this, we will usevar
one scope outside of the lambda expression. - We defined
Zero
typeclass and summon a value for it to initialize thevar
s.