liberty, equality, and boxed primitive types
I want to understand how equality works in Scala. It’s a complicated topic that’s been going on for ten years.
Major concerns are:
null
- Unboxed number types
- Boxed number types
- Reference types
- Collections (especially of
F[+A]
)
Understanding equality means knowing how these combinations are compared.
Scala Language Specification
The language spec provides some hints, although it does not have the full information. Chapter 12 contains the definition of Any
as follows:
package scala
/** The universal root class */
abstract class Any {
/** Defined equality; abstract here */
def equals(that: Any): Boolean
/** Semantic equality between values */
final def == (that: Any): Boolean =
if (null eq this) null eq that else this equals that
....
}
First thing to note is that both equals
and ==
method are provided by Any
, encompassing both the value types and reference types. This is often called universal equality. In Scala 2.x, this allows comparison of two completely unrelated types such as
scala> 1 == "1"
^
warning: comparing values of types Int and String using `==` will always yield false
res0: Boolean = false
scala> Option(1) == Option("1")
res1: Boolean = false
Given that ==
is final, you might expect that the operator is strictly a symbolic alias of equals
method. However, later in the [numeric value types] section it says:
Comparison methods for equals (
==
), not-equals (!=
), less-than (<
), greater-than (>
), less-than-or-equals (<=
), greater-than-or-equals (>=
), which each exist in 7 overloaded alternatives. Each alternative takes a parameter of some numeric value type. Its result type is typeBoolean
. The operation is evaluated by converting the receiver and its argument to their operation type and performing the given comparison operation of that type.
package scala
abstract sealed class Int extends AnyVal {
def == (that: Double): Boolean // double equality
def == (that: Float): Boolean // float equality
def == (that: Long): Boolean // long equality
def == (that: Int): Boolean // int equality
def == (that: Short): Boolean // int equality
def == (that: Byte): Boolean // int equality
def == (that: Char): Boolean // int equality
....
}
This gives a glimpse at the fact that ==
is not just a symbolic alias of equals
since Scala can overload the operators.
2010 ‘spec for == and ##’
The best reference of Scala 2.x behavior I found was ‘spec for == and ##’ draft Paul Phillips sent to Martin Odersky and scala-internals list on April 13, 2010, then reposted to == and equals in 2010.
Paul wrote:
Here is a kind of off the top of my head attempt to spec out equality and hash codes. I don’t really speak spec-ese but this is written in the pidgin spec-ese within my grasp. Does this look approximately correct? (Anyone else feel free to chime in on that point.) What if anything would you like me to do with it?
Resolution of x == y
- Null values will not cause NPEs.
- Nothing is
==
to null except null.
- All objects must be
==
to themselves.The first three conditions are summarized in this initial expansion of ‘x == y’, which the compiler may or may not inline. All user-defined equals methods are responsible for preserving invariants 2 and 3.
if (x eq y) true else if (x eq null) false else // remainder of algorithm
- If the static type of the left hand side allows for the possibility that it is a boxed or unboxed primitive numeric type (any of
Byte
,Short
,Int
,Long
,Float
,Double
, orChar
) then: go to step 5.If the static type definitively excludes those types, then: the result is
x.equals(y)
.
- If the static types of both operands are primitive types, then: the result is that of the primitive comparison, exactly as performed in java.
If the static types are identical final types (for instance, both are
java.lang.Longs
) then the result isx.equals(y)
.In all other cases, both operands are boxed if necessary and a method in
BoxesRunTime
is called. (The method will be semantically equivalent toBoxesRunTime.equals
, but a different method may be chosen to avoid repeating the above tests.)
BoxesRuntime.equals
All of the preceding logic is preserved, and then it proceeds as follows, where ‘x’ remains the left hand side operand and ‘y’ the right.
- Runtime instance checks will be done to determine the types of the operands, with the following resolutions. (Resolutions represent the semantics, not necessarily the implementation.)
- 1a) If both sides of the comparison are boxed primitives, then they are unboxed and the primitive comparison is performed as in java.
- 1b) If ‘x’ is a class implementing the
scala.math.ScalaNumber
trait, then the result isx.equals(y)
.- 1c) If ‘x’ is a boxed primitive and ‘y’ is a class implementing the
scala.math.ScalaNumber
trait, then the result isy.equals(x)
.- 1d) Otherwise, the result is
x.equals(y)
.….
The rest of the draft includes details about hash codes.
The notable points are:
- Reflexivity is specified.
- Unboxed primitive type equality is delegated to Java’s primitive comparison.
- It tries to emulate unboxed primitive comparison even when boxed primitive types are given.
Java Language Specification
Java Language Specification 15.21.1 defines Numerical Equality Operators == and !=. JLS says that the numeric types are widened to double
if either lhs or rhs is a double
, otherwise float
, long
, or int
. Then it says it will follow IEEE 754 standard, including the fact that any comparison to NaN
returns false
.
Here’s an example of int
getting converted into float
:
In Java, numerical equality applies only to the unboxed primitive types.
cooperative equality
Scala emulates Java’s widening even with boxed primitive types:
scala> java.lang.Integer.valueOf(1) == java.lang.Float.valueOf(1.0f)
val res0: Boolean = true
I am not sure who coined the term, but this behavior is called cooperative equality.
In Java, whenever two values are equal
, hashCode
is required to return the same integer. Since we can’t change hashCode
for the boxed primitives ##
method was created. Here’s from Paul’s draft again:
The unification of primitives and boxed types in scala necessitates measures to preserve the equality contract: equal objects must have equal hash codes. To accomplish this a new method is introduced on
Any
:
def ##: Int
This method should be called in preference to
hashCode
by all scala software which consumes hashCodes.
Here’s a demonstration of hashCode
vs ##
:
scala> 1.hashCode
res1: Int = 1
scala> 1.##
res2: Int = 1
scala> java.lang.Float.valueOf(1.0F).hashCode
res3: Int = 1065353216
scala> java.lang.Float.valueOf(1.0F).##
res4: Int = 1
scala> 1.0F.##
res5: Int = 1
The conversion to boxed primitive types happens transparently in Scala when a numeric type is upcasted to Any
.
scala> (1: Any)
res6: Any = 1
scala> (1: Any).getClass
res7: Class[_] = class java.lang.Integer
scala> (1: Any) == (1.0F: Any)
res8: Boolean = true
This allows Int
and Float
to unify in Scala collections:
scala> Set(1, 1.0f, "foo")
val res9: Set[Any] = Set(1, foo)
However it won’t work for Java collection:
scala> import scala.jdk.CollectionConverters._
import scala.jdk.CollectionConverters._
scala> new java.util.HashSet(List(1, 1.0f, "foo").asJava)
res10: java.util.HashSet[Any] = [1.0, 1, foo]
narrowness of Float
The details of Float
type is described in the Wikipedia entry IEEE 754 single-precision binary floating-point format: binary32. The 32 bits in float
breaks down as follows:
- 1 bit for sign
- 8 bits for exponents ranging from -126 to 127 (all zeros and all ones reserved)
- 23 bits represents 24 bits of signicand ranging from 1 to 16777215
The resulting floating-point number is sign * 2^(exponent) * signicand
.
Note that int
stores 32 bits of integers (or 31 bit for positives) but the float can express 23 bits accurately. This could lead to a rounding error by 1 for any integer above 0xFFFFFF (16777215).
As the int
becomes larger, you can get more ridiculous results:
This will break the ##
contract for Scala as well.
scala> 16777217 == 16777217F
res7: Boolean = true
scala> 16777217.## == 16777217F.##
res8: Boolean = false
In my opinion, we should treat Float type as 24 bit integer, and Double as 53 bit integer when it comes to widening. I’ve reported this as Weak conformance to Float and Double are incorrect #10773. There’s also an open PR by Guillaume Deprecate numeric conversions that lose precision #7405.
NaN
Since this comes up in the discussion of equality, I should note that the comparison with java.lang.Double.NaN
would always return false
. An easy way to cause NaN is dividing 0.0
by 0
. The most surprising thing about NaN comparison is that the NaN itself does not ==
NaN:
scala> 0.0 / 0
res9: Double = NaN
scala> 1.0 == (0.0 / 0)
res10: Boolean = false
scala> (0.0 / 0) == (0.0 / 0)
res11: Boolean = false
In other words, Java or Scala’s ==
is not reflexive when NaN is involved.
Eq typeclass
Around 2010 was also the time when some of the Scala users started to adopt ===
operators introduced by Scalaz library. This bought in the concept of typeclass-based equality used in Haskell.
trait Equal[A] { self =>
def equal(a1: A, a2: A): Boolean
}
This was later copied by libraries like ScalaTest and Cats.
scala> 1 === 1
res4: Boolean = true
scala> 1 === "foo"
<console>:37: error: type mismatch;
found : String("foo")
required: Int
1 === "foo"
^
I personally think this is a significant improvement over the universal equality since it’s fairly common to miss the comparison of wrong types during refactoring. But the invariance also creates a fundamental issue with the way Scala 2.x defines data types through subclass inheritance. For example Some(1)
and None
would need to be upcasted to Option[Int]
.
2011 ‘Rethinking equality’
Martin Ordersky was well aware of ===
. In May 2011 he sent a proposal titled ‘Rethinking equality’:
Now that 2.9 is almost out the door, we have the luxury to think of what could come next. One thing I would like to address is equality. The current version did not age well; the longer one looks at it, the uglier it gets. In particular it is a great impediment for DSL design. Witness the recent popularity of
===
as an alternative equality operator. I previously thought we were stuck with Java-like universal equality for backwards compatibility reasons. But there might be a workable way out of that morass. It works in three steps, which would coincide with the next three major revisions of Scala (yes, sometimes progress has to be slow!)
- Step 1: Introduce invariant
areEqual
@inline def areEqual[T](x: T, y: T)(implicit eq: Equals[T]) = eq.eql(x, y)
.==
would useareEqual
if it typechecks, otherwise it falls back to universal equality. - Step 2:
x == y
usesareEqual
either lhsA1
or rhsA2
hasEquals
instance. - Step 3:
x == y
becomes equivalent toareEqual
.
This 2011 proposal went dormant quickly, but it’s noteworthy since Martin did eventually change equality for Scala 3.x (Dotty).
2016 ‘Multiversal equality for Scala’
In May of 2016 Martin proposed Multiversal equality for Scala with dotty#1246.
Here’s the definition of Eql:
/** A marker trait indicating that values of type `L` can be compared to values of type `R`. */
@implicitNotFound("Values of types ${L} and ${R} cannot be compared with == or !=")
sealed trait Eql[-L, -R]
object Eql {
/** A universal `Eql` instance. */
object derived extends Eql[Any, Any]
/** A fall-back instance to compare values of any types.
* Even though this method is not declared as given, the compiler will
* synthesize implicit arguments as solutions to `Eql[T, U]` queries if
* the rules of multiversal equality require it.
*/
def eqlAny[L, R]: Eql[L, R] = derived
// Instances of `Eql` for common Java types
implicit def eqlNumber : Eql[Number, Number] = derived
implicit def eqlString : Eql[String, String] = derived
// The next three definitions can go into the companion objects of classes
// Seq, Set, and Proxy. For now they are here in order not to have to touch the
// source code of these classes
implicit def eqlSeq[T, U](implicit eq: Eql[T, U]): Eql[GenSeq[T], GenSeq[U]] = derived
implicit def eqlSet[T, U](implicit eq: Eql[T, U]): Eql[Set[T], Set[U]] = derived
// true asymmetry, modeling the (somewhat problematic) nature of equals on Proxies
implicit def eqlProxy : Eql[Proxy, AnyRef] = derived
}
As noted in the comment as well as the Dotty documentation for Multiversal Equality:
Even though
eqlAny
is not declared a given instance, the compiler will still construct aneqlAny
instance as answer to an implicit search for the typeEql[L, R]
, unlessL
orR
haveEql
given instances defined on them, or the language featurestrictEquality
is enabled.
scala> class Box[A](a: A)
// defined class Box
scala> new Box(1) == new Box("1")
val res1: Boolean = false
scala> {
| import scala.language.strictEquality
| new Box(1) == new Box("1")
| }
3 | new Box(1) == new Box("1")
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
| Values of types Box[Int] and Box[String] cannot be compared with == or !=
The documentation for Multiversal Equality also shows how Eql
instances can be derived automatically!
scala> class Box[A](a: A) derives Eql
// defined class Box
reevaluation of cooperative equality
In September 2017, about a year after the multiversal equality proposal, Martin posted ‘Can we get rid of cooperative equality?’ on Contributors forum. In there he says that cooperative equality makes the data structure in Scala slower than the equivalent data structures in Java, and that the equality in Scala is complicated. The responses to the thread made by Jason Zaugg, Sébastien Doeraene and other major contributors to Scala makes this a interesting read into understanding the tradeoffs of cooperative equality.
Jason Zaugg wrote:
My intuition is the ability to use primitives as type arguments sends a signal that
Some[Long](x) == Some[Int](y)
is morally equivalent tox == y
. In Java, you’d have to explicitly use the box type as the type argument.
Sébastien Doeraene wrote:
If
1 == 1L
is true, then I strongly believe that(1: Any) == (1L: Any)
. However, nothing says that1 == 1L
needs to be true! We can instead make it false, or, even better, a compile error.
I thought Oliver Ruebenacker wrote the best summary:
Basically, since Java primitives behave differently from Java boxed numbers, we can’t have comparisons between different numeric types that satisfy all three of these:
(1) Scala unboxed numbers behave like Java primitives (2) Scala boxed numbers behave like Java boxed numbers (3) Scala unboxed numbers behave like Scala boxed numbers
It is difficult to have good JVM performance unless Scala numbers behave like Java numbers. Scala boxed and unboxed being different sounds insane.
The only sane and efficient option seems to be, as has been suggested, to deprecate comparisons between different numeric types and instead require conversion to larger types, like
Long
andDouble
. Since these days almost every platform is 64 bit,Long
andDouble
are natively efficient.
Requiring explicit conversion to Double
when comparing Int
and Double
sounds like a good tradeoff to me too, and it sounds consistent with the direction we are already taking with the multiversal equality.
warning for 1L equal 1
During 2.13.0 RC3, Kenji Yoshida reported #11551 showing that Set
was broken under ++
operation. He also sent a fix #8117, which was a one liner change from:
- if (originalHash == element0UnimprovedHash && element0.equals(element)) {
+ if (originalHash == element0UnimprovedHash && element0 == element) {
On Twitter he also suggested that we should warn about calling equals
or hashCode
on non-AnyRef. I’ve sent a PR #8120 so that the following would cause a warning:
[info] Running (fork) scala.tools.nsc.MainGenericRunner -usejavacp
Welcome to Scala 2.13.0-pre-db58db9 (OpenJDK 64-Bit Server VM, Java 1.8.0_232).
Type in expressions for evaluation. Or try :help.
scala> def check[A](a1: A, a2: A): Boolean = a1.equals(a2)
^
warning: comparing values of types A and A using `equals` is unsafe due to cooperative equality; use `==` instead
check: [A](a1: A, a2: A)Boolean
A few days ago I posted a poll:
Using Scala 2.13.1 or Dotty 0.21.0-RC1 what is the result of the following expression?
((1L == 1) == (1L equals 1)) -> (2147483584F == 2147483647)
- (false, false)
- (false, true)
- (true, false)
- (true, true)
Thanks to all 56 people who participated the poll. The results were
(false, false) 16.1%
(false, true) 30.4%
(true, false) 35.7%
(true, true) 17.9%
My intent of the poll was to survey the awareness of cooperative equality among the Twitter users. Before that I want to mention James Hoffmann’s video about high humidity coffee stroage. To test if high humidity affects the flavor of coffee, he conducts triangle test where samples X, Y, and Y are given blindly, and the first test it trying to pick out X, and then determine which one is better. If I simply posted a poll with true / false, just from the fact alone people would know there’s something suspicious. It’s not perfect, but I though adding one extra bit would make the poll better.
The first part is about cooperative equality, whereas the second part is about narrowness of 24 bit signicand. By summing the total we can say that 46.5% got the cooperative equality part right, and 48.3% got the Float rounding right. A random chance would get 50% so it’s hard to know what percentage of people knows for sure.
summary
Here is some summary.
- Scala 2.x uses universal equality which allows comparison of
Int
andString
. Dotty introduces “multiversal”strictEquality
that works similar toEq
typeclass. - Currently both Scala 2.x and Dotty use Java’s
==
to compare unboxed primitive types. This mixes comparison ofInt
withFloat
andDouble
etc. Float
is narrower thanInt
, andDouble
is narrower thanLong
.- Because Scala transparently boxes
Int
intojava.lang.Integer
as(1: Any)
, it implements cooperative equality for==
and##
, but not forequals
andhashCode
, which emulates widening for boxed primitive types. Many people are unaware of this behavior, and this could lead to surprising bugs if people believed thatequals
is same as==
. - We might be able to remove cooperative equality if we are willing to make unboxed primitive comparison of different types
1L == 1
an error.