automate refactoring with Bazel + Scalafix
about Scalafix
As a code base grows, it’s useful to have language tooling that can perform automatic refactoring. Thankfully, in 2016 Scala Center created Scalafix. In the announcement blog post, Ólafur Geirsson wrote:
Scalafix takes care of easy, repetitive and tedious code transformations so you can focus on the changes that truly deserve your attention. In a nutshell, scalafix reads a source file, transforms usage of unsupported features into newer alternatives, and writes the final result back to the original source file.
This shows the original emphasis on Scala 3 migration.
Nowadays Scalafix is maintained by Brice Jaglin et al., and is often used as a general linting and refactoring tool, beyond Scala 3 migration. Yoshida-san (xuwei-k), for example, has written hundreds of Scalafix rules, and some of them are publicly available in his xuwei-k/scalafix-rules repo.
One interesting characteristic of Scalafix is that there are two kinds of rules: syntactic rules and semantic rules.
- A syntactic rule can run directly on source code without compilation. Syntactic rules are simple, but limited in the code analysis they can do.
- A semantic rule can do more advanced code analysis based on symbols and types, but it requires input sources to be compiled beforehand with the SemanticDB compiler plugin enabled.
For syntactic rules, all you need is the Scalafix CLI, and you don’t need Bazel integration. Semantic rules require more work, since you need to compile with the SemanticDB plugin and pass the resulting output to Scalafix on the classpath.
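To make the difference concrete, here is a minimal Bash sketch that assembles the arguments for the two kinds of rules. The scalafix_args function name and the file paths are made up for illustration; --rules, --files, and --classpath are the Scalafix CLI options that appear later in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Scalafix CLI arguments for one rule.
# usage: scalafix_args <rule> <source-file> [classpath-entry...]
scalafix_args() {
  local rule=$1 src=$2
  shift 2
  local args=(--rules "$rule" --files "$src")
  # A syntactic rule stops here. A semantic rule additionally needs the
  # compiled output that carries the SemanticDB files, one --classpath each.
  local entry
  for entry in "$@"; do
    args+=(--classpath "$entry")
  done
  printf '%s\n' "${args[*]}"
}
```

Calling scalafix_args ProcedureSyntax Foo.scala prints only the --rules and --files pair, whereas passing a jar compiled with SemanticDB as a third argument adds a --classpath entry.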
prior works on Bazel integration
- ianoc has a repo called ianoc/bazel-scalafix, but it’s a lot of Bash. In this post I’ll describe an approach using more Starlark, although some Bash will be necessary.
- See also cross build anything with Bazel
Bazel + Scalafix
The overview of the steps:
- Resolve ch.epfl.scala:scalafix-cli_<scalaVersion>:something
- Resolve org.scalameta:semanticdb-scalac_<scalaVersion>:4.8.4
- Define a scala_binary target wrapping Scalafix CLI
- Define a custom rule for scalafix(...), which calls Scalafix CLI with appropriate inputs
- Define a scala_library(...) macro that expands to upstream_scala_library(...), semanticdb(...), and scalafix(...)
- Define a custom toolchain to set scalacOptions
- Write a small Bash script
resolving 3rdparty dependencies
There are currently several ways of resolving 3rdparty dependencies, and you can use any of them. I’ve thus far used bazel-deps and MODULE.bazel, and they both work. The following is a snippet from a Bzlmod example:
MODULE.bazel
The exact versions of these artifacts depend on their availability for the Scala version(s) you’re using.
bazel_dep(name = "mod_scala_multiverse")
local_path_override(
    module_name="mod_scala_multiverse",
    path="tools/local_modules/default",
)
maven = use_extension("@mod_scala_multiverse//:extensions.bzl", "maven")
maven.install(
    artifacts = [
        "ch.epfl.scala:::scalafix-cli:0.11.0",
        "org.scalameta:::semanticdb-scalac:4.8.4",
        ....
    ],
)
use_repo(maven, "maven")
3rdparty/jvm/BUILD.bazel
To absorb the notational differences between the solutions, I am going to define aliases for those dependencies:
load("@scala_multiverse//:cross_scala_config.bzl", "maven_dep")
alias(
    name = "ch_epfl_scala__scalafix-cli",
    actual = maven_dep("ch.epfl.scala:::scalafix-cli"),
    visibility = ["//visibility:public"],
)
alias(
    name = "org_scalameta__semanticdb_scalac",
    actual = maven_dep("org.scalameta:::semanticdb-scalac"),
    visibility = ["//visibility:public"],
)
This way we can reference Scalafix as //3rdparty/jvm:ch_epfl_scala__scalafix-cli instead of maven_dep("ch.epfl.scala:::scalafix-cli").
write a shim for Scalafix
Since we can’t execute a JAR on its own, we can define a scala_binary shim.
tools/scalafix_app/BUILD.bazel
load("@io_bazel_rules_scala//scala:scala.bzl", "scala_binary")
scala_binary(
    name = "bin",
    srcs = glob(include = ["*.scala"]),
    main_class = "com.example.tools.scalafix_app.Main",
    visibility = ["//visibility:public"],
    deps = ["//3rdparty/jvm:ch_epfl_scala__scalafix-cli"],
)
tools/scalafix_app/Main.scala
package com.example.tools.scalafix_app
object Main extends App {
  scalafix.cli.Cli.main(args)
}
demo 1
We can use bazel run to try calling Scalafix CLI:
$ bazel run tools/scalafix_app:bin -- -version
INFO: Analyzed target //tools/scalafix_app:bin (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //tools/scalafix_app:bin up-to-date:
bazel-bin/tools/scalafix_app/bin.jar
bazel-bin/tools/scalafix_app/bin
INFO: Elapsed time: 0.155s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/tools/scalafix_app/bin -version
0.11.0
Note: bazel run runs inside of a sandbox by design, so this target as it is cannot be used to actually process *.scala files.
custom rule for scalafix(...)
In Bazel-speak, a rule is a special function whose invocation defines a rule target, which is equivalent to a subproject in build tools like sbt and Gradle.
Instead of defining tasks, in Bazel we introduce new rules to take different actions.
tools/rules/scalafix/BUILD.bazel
# blank file
tools/rules/scalafix/scalafix.bzl
The following defines a rule that generates a Bash script that calls Scalafix CLI with --files and --classpath options already passed in.
A notable trick I am using here is to call cd "$BUILD_WORKING_DIRECTORY", which allows this script to escape the sandbox and make changes on the workspace in situ.
def _scalafix_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".sh")
    tool = ctx.executable._scalafix_bin
    tool_rf = ctx.attr._scalafix_bin[DefaultInfo]
    srcs = ctx.files.srcs
    semanticdb = ctx.attr.semanticdb[DefaultInfo]
    deps = semanticdb.files.to_list()
    # Generated launcher: remember the starting directory, build the
    # --classpath flags from it, then cd to where bazel run was invoked.
    script = """SANDBOX_DIR=$(pwd)
JARS=""
JARS0=({jars})
for JAR in ${{JARS0[@]}}; do
  JARS="$JARS --classpath $SANDBOX_DIR/$JAR"
done
cd "$BUILD_WORKING_DIRECTORY"
exec "$SANDBOX_DIR/{tool}" --files {srcs} $JARS --scalac-options -Xlint "$@"
""".format(
        tool = tool.short_path,
        srcs = " --files ".join([src.short_path for src in srcs]),
        jars = " ".join([x.short_path for x in deps]),
    )
    ctx.actions.write(out, script, is_executable = True)
    # The sources, the SemanticDB output, and the tool's own runfiles must
    # all be present when the launcher runs.
    files = srcs + deps
    rf = ctx.runfiles(
        transitive_files = depset(files),
    ).merge(tool_rf.default_runfiles)
    return [DefaultInfo(
        executable = out,
        runfiles = rf,
    )]

scalafix = rule(
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".scala"],
        ),
        "semanticdb": attr.label(),
        "_scalafix_bin": attr.label(
            executable = True,
            # Note: newer Bazel releases deprecate "host" in favor of "exec".
            cfg = "host",
            allow_files = True,
            default = Label("//tools/scalafix_app:bin"),
        ),
    },
    doc = "Runs scalafix",
    executable = True,
    implementation = _scalafix_impl,
)
This rule expects three arguments, of which one has a default value supplied. In addition, all rules have name, and optionally visibility, tags, etc.
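The cd "$BUILD_WORKING_DIRECTORY" step can be tried in isolation. The following is a rough Bash sketch, not part of the rule: run_in_invocation_dir is a hypothetical helper name, and the fallback to the current directory is an assumption added so that it also works outside of bazel run. BUILD_WORKING_DIRECTORY itself is set by bazel run to the directory where the command was invoked.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command from the directory where
# `bazel run` was invoked. The body is a subshell, so the caller's
# own working directory is left untouched.
run_in_invocation_dir() (
  start_dir=$(pwd)
  # Fall back to the starting directory when not launched by `bazel run`
  # (an assumption for illustration only).
  cd "${BUILD_WORKING_DIRECTORY:-$start_dir}" || exit 1
  "$@"
)
```

Since the working directory changes, anything that lives next to the launcher has to be addressed by an absolute path captured beforehand, which is why the generated script stores SANDBOX_DIR before calling cd.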
custom scala_library macro
In Bazel-speak, a macro is a pure function that’s typically used to call other rules. Macros compile away, in the sense that their names will not be available to bazel query etc.
Normally we use the scala_library rule provided by rules_scala. We can redefine it as a macro so that scala_library(...) defines three targets instead of one.
tools/rules/scala/BUILD.bazel
# blank file
tools/rules/scala/scala.bzl
load(
    "@io_bazel_rules_scala//scala:scala.bzl",
    upstream_scala_library = "scala_library",
)
load(
    "@io_bazel_rules_scala//scala/unstable:defs.bzl",
    "make_scala_library",
)
load(
    "//tools/rules/scalafix:scalafix.bzl",
    "scalafix",
)

scala_semanticdb = make_scala_library()

def scala_library(
        name,
        srcs = [],
        deps = [],
        runtime_deps = [],
        plugins = [],
        data = [],
        resources = [],
        scalacopts = None,
        main_class = "",
        exports = [],
        resource_jars = [],
        visibility = None,
        javacopts = [],
        tags = []):
    # 1. The regular library target, under the original name.
    upstream_scala_library(
        name = name,
        srcs = srcs,
        deps = deps,
        runtime_deps = runtime_deps,
        plugins = plugins,
        resources = resources,
        scalacopts = scalacopts,
        main_class = main_class,
        exports = exports,
        resource_jars = resource_jars,
        visibility = visibility,
        javacopts = [],
        tags = tags,
    )

    # Drop -Xfatal-warnings for the SemanticDB compilation. Build a new list
    # instead of calling remove(), so the caller's list is not mutated.
    scalacopts_mod = scalacopts
    if scalacopts:
        scalacopts_mod = [opt for opt in scalacopts if opt != "-Xfatal-warnings"]

    # 2. The same sources compiled again with the SemanticDB compiler plugin.
    scala_semanticdb(
        name = "{}__semanticdb".format(name),
        srcs = srcs,
        deps = deps,
        runtime_deps = runtime_deps,
        plugins = plugins + ["//3rdparty/jvm:org_scalameta__semanticdb_scalac"],
        resources = resources,
        scalacopts = scalacopts_mod,
        main_class = main_class,
        exports = exports,
        resource_jars = resource_jars,
        visibility = visibility,
        javacopts = [],
        tags = tags + ["manual"],
    )

    # 3. The runnable Scalafix target that consumes the SemanticDB output.
    scalafix(
        name = "{}__scalafix".format(name),
        srcs = srcs,
        semanticdb = "{}__semanticdb".format(name),
        tags = tags + ["manual"],
    )
custom toolchain
We should customize the scala_toolchain so the compilation uses -deprecation and -Xlint by default. This is essential because some Scalafix rules rely on these compiler warnings.
toolchains/BUILD.bazel
load(
    "@io_bazel_rules_scala//scala:scala.bzl",
    "setup_scala_toolchain",
)
setup_scala_toolchain(
    name = "scala_toolchain",
    scala_compile_classpath = [
        "@maven//:org_scala_lang_scala_compiler",
        "@maven//:org_scala_lang_scala_library",
        "@maven//:org_scala_lang_scala_reflect",
    ],
    scala_library_classpath = [
        "@maven//:org_scala_lang_scala_library",
        "@maven//:org_scala_lang_scala_reflect",
    ],
    scala_macro_classpath = [
        "@maven//:org_scala_lang_scala_library",
        "@maven//:org_scala_lang_scala_reflect",
    ],
    scalacopts = [
        "-Yrangepos",
        "-deprecation",
        "-Xlint",
        "-feature",
        "-language:existentials",
        "-language:higherKinds",
    ],
    visibility = ["//visibility:public"],
)
WORKSPACE
# remove these
-load("@io_bazel_rules_scala//scala:toolchains.bzl", "scala_register_toolchains")
-scala_register_toolchains()
register_toolchains("//toolchains:scala_toolchain")
demo 2
$ bazel query common-test/src/main/scala/gigahorsetest/...
//common-test/src/main/scala/gigahorsetest:gigahorsetest
//common-test/src/main/scala/gigahorsetest:gigahorsetest__scalafix
//common-test/src/main/scala/gigahorsetest:gigahorsetest__semanticdb
Loading: 0 packages loaded
We now see three targets.
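Because the macro derives the extra target names from name, the Scalafix target for a given library can be computed from its label. Here is a small Bash sketch, where scalafix_target_for is a hypothetical helper and __scalafix is the suffix used by the macro above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a library label to its generated Scalafix target.
# Accepts both //pkg:name and the //pkg shorthand (name = last path segment).
scalafix_target_for() {
  local label=$1 pkg name
  if [[ "$label" == *:* ]]; then
    pkg=${label%%:*}
    name=${label##*:}
  else
    pkg=$label
    name=${label##*/}
  fi
  printf '%s:%s__scalafix\n' "$pkg" "$name"
}
```

For example, bazel run "$(scalafix_target_for //common-test/src/main/scala/gigahorsetest)" -- --rules RemoveUnused would run Scalafix for that one library only.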
convenience script
Here is a convenience script to call bazel run.
bin/scalafix
#!/usr/bin/env bash
set -o errexit # abort on nonzero exitstatus
set -o nounset # abort on unbound variable
set -o pipefail # don't hide errors within pipes
BUILD_TARGET_QUERY=${1:-}
if [[ -z "$BUILD_TARGET_QUERY" ]]; then
  echo "usage: $0 <query> --rules <scalafix rules>"
  exit 1
fi
shift
# Find every scalafix(...) target matching the query, then run them one by one.
SCALAFIX_TARGETS=$(bazel query "kind('scalafix', $BUILD_TARGET_QUERY)")
for TARGET in $SCALAFIX_TARGETS; do
  echo "processing $TARGET"
  bazel run "$TARGET" -- "$@"
done
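The loop can also be written so that it is easy to dry-run without Bazel. The following is only a sketch: run_scalafix_targets and the BAZEL override variable are hypothetical names introduced here, and the target list is read from standard input instead of coming from bazel query.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read target labels from stdin (one per line) and
# run each of them, forwarding the remaining arguments to Scalafix.
# Set BAZEL=echo to print the commands instead of executing them.
run_scalafix_targets() {
  local targets=() target
  # Collect the targets first, so the command run below cannot
  # accidentally consume the rest of stdin.
  while IFS= read -r target; do
    [[ -n "$target" ]] && targets+=("$target")
  done
  for target in "${targets[@]}"; do
    echo "processing $target"
    "${BAZEL:-bazel}" run "$target" -- "$@"
  done
}
```

A dry run would look like bazel query "kind('scalafix', //core/...)" | BAZEL=echo run_scalafix_targets --rules RemoveUnused, which prints one bazel run command line per target.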
demo 3
First, we can use Coursier to install scalafix on the system path to run syntactic rules. In Scala, procedure syntax:
def close()
is deprecated (I deprecated it from general use in #6325), and the ProcedureSyntax rule can rewrite it to:
def close(): Unit
$ cs install scalafix
$ scalafix core/src/main/scala/gigahorse/FullResponse.scala --rules ProcedureSyntax
$ rg -tscala 'def close' core
core/src/main/scala/gigahorse/FullResponse.scala
27: def close(): Unit
core/src/main/scala/gigahorse/HttpClient.scala
26: def close(): Unit
Since all you need to pass along is *.scala files, we do not need Bazel integration to run syntactic rules.
demo 4
Next, let’s demonstrate a semantic rule by removing unused imports.
package gigahorse
import java.nio.ByteBuffer
import scala.collection.mutable.Stack
abstract class FullResponse {
  def bodyAsByteBuffer: ByteBuffer
}
In the above, import scala.collection.mutable.Stack is unused.
$ bin/scalafix //core/... --rules RemoveUnused
This removes the unused import.
demo 5
Another often used built-in rule for Scalafix is OrganizeImports.
$ bin/scalafix ... --rules OrganizeImports
This expands all imports to one item per line, and alphabetically sorts them:
diff --git a/core/src/main/scala/gigahorse/HttpClient.scala b/core/src/main/scala/gigahorse/HttpClient.scala
index fa08735..891e07f 100644
--- a/core/src/main/scala/gigahorse/HttpClient.scala
+++ b/core/src/main/scala/gigahorse/HttpClient.scala
@@ -16,8 +16,9 @@
package gigahorse
-import scala.concurrent.{ Future, ExecutionContext }
import java.io.File
+import scala.concurrent.ExecutionContext
+import scala.concurrent.Future
Not sure if the result is better, but the good thing is that it can be automated, so it’s one less thing to bikeshed during code reviews.
enforcing Scalafix rules on CI
To enforce Scalafix on CI, list the rules in .scalafix.conf and call bin/scalafix ... --check.
.scalafix.conf
rules = [
  DisableSyntax,
  OrganizeImports,
  RemoveUnused,
]
demo 6
$ bin/scalafix ... --check
Here’s an output from an actual GitHub Actions log:
processing //common-test/src/main/scala/gigahorsetest:gigahorsetest__scalafix
Loading:
Loading: 0 packages loaded
Analyzing: target //common-test/src/main/scala/gigahorsetest:gigahorsetest__scalafix (0 packages loaded, 0 targets configured)
INFO: Analyzed target //common-test/src/main/scala/gigahorsetest:gigahorsetest__scalafix (0 packages loaded, 44 targets configured).
INFO: Found 1 target...
....
Target //common-test/src/main/scala/gigahorsetest:gigahorsetest__scalafix up-to-date:
bazel-bin/common-test/src/main/scala/gigahorsetest/gigahorsetest__scalafix.sh
INFO: Elapsed time: 13.906s, Critical Path: 7.07s
INFO: 46 processes: 9 internal, 35 linux-sandbox, 2 worker.
INFO: Build completed successfully, 46 total actions
INFO: Running command line: bazel-bin/common-test/src/main/scala/gigahorsetest/gigahorsetest__scalafix.sh --check
--- /home/runner/work/gigahorse/gigahorse/common-test/src/main/scala/gigahorsetest/BaseHttpClientSpec.scala
+++ <expected fix>
@@ -16,18 +16,20 @@
package gigahorsetest
+import gigahorse.HeaderNames
+import gigahorse.MimeTypes
+import gigahorse.SignatureCalculator
+import gigahorse.WebSocketEvent
import org.scalatest.Assertion
import org.scalatest.flatspec.AsyncFlatSpec
import org.scalatest.matchers.should.Matchers
+import unfiltered.netty.Server
Error: Process completed with exit code 32.
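On CI, it is the non-zero exit code that fails the job. As a rough sketch of how a CI step could wrap the check and print a hint, assuming a hypothetical check_scalafix helper and a hypothetical SCALAFIX_SCRIPT override that defaults to the bin/scalafix script from this post:

```shell
#!/usr/bin/env bash
# Hypothetical CI helper: run the convenience script in --check mode and
# pass its exit status through, adding a hint for the person reading the log.
check_scalafix() {
  local checker=${SCALAFIX_SCRIPT:-bin/scalafix}
  if "$checker" "$@" --check; then
    echo "scalafix: no changes needed"
  else
    local status=$?
    echo "scalafix: check failed (exit $status); run bin/scalafix locally and commit the result" >&2
    return "$status"
  fi
}
```

A CI step would then call check_scalafix ... where the argument is the same Bazel target pattern that is passed to bin/scalafix.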
working example
See https://github.com/eed3si9n/gigahorse/pull/86 for a working example.
summary
In the Scala ecosystem, Scalafix, developed originally at Scala Center, provides tooling to automate linting and refactoring. Users with a large code base can use built-in or community-maintained rules to rewrite the code efficiently.
With some customization, Scalafix semantic rules can be used on a Bazel monorepo.
license
To the extent possible under law, the author(s) have dedicated all copyright and related and neighboring rights to the code examples to the public domain worldwide. The code examples are distributed without any warranty. See http://creativecommons.org/publicdomain/zero/1.0/.