The compilation cost of implicits
I've accidentally built a compilation benchmark harness for Scala 2 and 3 which can measure the cost (in compilation time) of having a codebase full of implicits that need to be resolved. Read on to see the results!
Background
Typo is a code-generation library for working with PostgreSQL in Scala. You can read more in the introduction.
It was built to replace huge swathes of boilerplate in applications, as well as to provide sorely needed type-safety to avoid having to test everything.
A crucial design goal was to fit into your system such as it is. This means it generates code in the shared subset between Scala 2.12, 2.13 and 3.x, and for three different database access libraries.
Typo output as a realistic compile speed test harness
In typical systems you may have a database layer, a business logic layer and a web layer, with each of them occupying about a third of the code base.
The structure of the database and web layers is often very similar - basically a bunch of case classes with type class instances and mapping code.
So let's say we take the database third of a typical system. We'll use the fact that Typo can generate it in its entirety to build a compile speed benchmark, where we contrast different combinations of Scala versions and database libraries.
I'll stress that the code it generates is basically the same code I've written again and again over the years, which should make the benchmark interesting since this is so close to real-world application code.
Let's see where it takes us!
The generated code
If you're curious about the generated code, here is some example code for an email address table in the AdventureWorks database, generated for doobie.
EmailaddressId.scala
/**
* File has been automatically generated by `typo`.
*
* IF YOU CHANGE THIS FILE YOUR CHANGES WILL BE OVERWRITTEN.
*/
package adventureworks
package person
package emailaddress
import adventureworks.person.businessentity.BusinessentityId
/** Type for the composite primary key of table `person.emailaddress` */
case class EmailaddressId(businessentityid: BusinessentityId, emailaddressid: Int)
object EmailaddressId {
implicit lazy val ordering: Ordering[EmailaddressId] = Ordering.by(x => (x.businessentityid, x.emailaddressid))
}
EmailaddressRepo.scala
/**
* File has been automatically generated by `typo`.
*
* IF YOU CHANGE THIS FILE YOUR CHANGES WILL BE OVERWRITTEN.
*/
package adventureworks
package person
package emailaddress
import doobie.free.connection.ConnectionIO
import fs2.Stream
import typo.dsl.DeleteBuilder
import typo.dsl.SelectBuilder
import typo.dsl.UpdateBuilder
trait EmailaddressRepo {
def delete(compositeId: EmailaddressId): ConnectionIO[Boolean]
def delete: DeleteBuilder[EmailaddressFields, EmailaddressRow]
def insert(unsaved: EmailaddressRow): ConnectionIO[EmailaddressRow]
def insert(unsaved: EmailaddressRowUnsaved): ConnectionIO[EmailaddressRow]
def select: SelectBuilder[EmailaddressFields, EmailaddressRow]
def selectAll: Stream[ConnectionIO, EmailaddressRow]
def selectById(compositeId: EmailaddressId): ConnectionIO[Option[EmailaddressRow]]
def update(row: EmailaddressRow): ConnectionIO[Boolean]
def update: UpdateBuilder[EmailaddressFields, EmailaddressRow]
def upsert(unsaved: EmailaddressRow): ConnectionIO[EmailaddressRow]
}
EmailaddressRepoImpl.scala
/**
* File has been automatically generated by `typo`.
*
* IF YOU CHANGE THIS FILE YOUR CHANGES WILL BE OVERWRITTEN.
*/
package adventureworks
package person
package emailaddress
import adventureworks.customtypes.Defaulted
import adventureworks.customtypes.TypoLocalDateTime
import adventureworks.customtypes.TypoUUID
import doobie.free.connection.ConnectionIO
import doobie.syntax.string.toSqlInterpolator
import doobie.util.fragment.Fragment
import fs2.Stream
import typo.dsl.DeleteBuilder
import typo.dsl.SelectBuilder
import typo.dsl.SelectBuilderSql
import typo.dsl.UpdateBuilder
class EmailaddressRepoImpl extends EmailaddressRepo {
override def delete(compositeId: EmailaddressId): ConnectionIO[Boolean] = {
sql"""delete from person.emailaddress where "businessentityid" = ${compositeId.businessentityid} AND "emailaddressid" = ${compositeId.emailaddressid}""".update.run.map(_ > 0)
}
override def delete: DeleteBuilder[EmailaddressFields, EmailaddressRow] = {
DeleteBuilder("person.emailaddress", EmailaddressFields)
}
override def insert(unsaved: EmailaddressRow): ConnectionIO[EmailaddressRow] = {
sql"""insert into person.emailaddress("businessentityid", "emailaddressid", "emailaddress", "rowguid", "modifieddate")
values (${unsaved.businessentityid}::int4, ${unsaved.emailaddressid}::int4, ${unsaved.emailaddress}, ${unsaved.rowguid}::uuid, ${unsaved.modifieddate}::timestamp)
returning "businessentityid", "emailaddressid", "emailaddress", "rowguid", "modifieddate"::text
""".query(EmailaddressRow.read).unique
}
override def insert(unsaved: EmailaddressRowUnsaved): ConnectionIO[EmailaddressRow] = {
val fs = List(
Some((Fragment.const0(s""""businessentityid""""), fr"${unsaved.businessentityid}::int4")),
Some((Fragment.const0(s""""emailaddress""""), fr"${unsaved.emailaddress}")),
unsaved.emailaddressid match {
case Defaulted.UseDefault => None
case Defaulted.Provided(value) => Some((Fragment.const0(s""""emailaddressid""""), fr"${value: Int}::int4"))
},
unsaved.rowguid match {
case Defaulted.UseDefault => None
case Defaulted.Provided(value) => Some((Fragment.const0(s""""rowguid""""), fr"${value: TypoUUID}::uuid"))
},
unsaved.modifieddate match {
case Defaulted.UseDefault => None
case Defaulted.Provided(value) => Some((Fragment.const0(s""""modifieddate""""), fr"${value: TypoLocalDateTime}::timestamp"))
}
).flatten
val q = if (fs.isEmpty) {
sql"""insert into person.emailaddress default values
returning "businessentityid", "emailaddressid", "emailaddress", "rowguid", "modifieddate"::text
"""
} else {
val CommaSeparate = Fragment.FragmentMonoid.intercalate(fr", ")
sql"""insert into person.emailaddress(${CommaSeparate.combineAllOption(fs.map { case (n, _) => n }).get})
values (${CommaSeparate.combineAllOption(fs.map { case (_, f) => f }).get})
returning "businessentityid", "emailaddressid", "emailaddress", "rowguid", "modifieddate"::text
"""
}
q.query(EmailaddressRow.read).unique
}
override def select: SelectBuilder[EmailaddressFields, EmailaddressRow] = {
SelectBuilderSql("person.emailaddress", EmailaddressFields, EmailaddressRow.read)
}
override def selectAll: Stream[ConnectionIO, EmailaddressRow] = {
sql"""select "businessentityid", "emailaddressid", "emailaddress", "rowguid", "modifieddate"::text from person.emailaddress""".query(EmailaddressRow.read).stream
}
override def selectById(compositeId: EmailaddressId): ConnectionIO[Option[EmailaddressRow]] = {
sql"""select "businessentityid", "emailaddressid", "emailaddress", "rowguid", "modifieddate"::text from person.emailaddress where "businessentityid" = ${compositeId.businessentityid} AND "emailaddressid" = ${compositeId.emailaddressid}""".query(EmailaddressRow.read).option
}
override def update(row: EmailaddressRow): ConnectionIO[Boolean] = {
val compositeId = row.compositeId
sql"""update person.emailaddress
set "emailaddress" = ${row.emailaddress},
"rowguid" = ${row.rowguid}::uuid,
"modifieddate" = ${row.modifieddate}::timestamp
where "businessentityid" = ${compositeId.businessentityid} AND "emailaddressid" = ${compositeId.emailaddressid}"""
.update
.run
.map(_ > 0)
}
override def update: UpdateBuilder[EmailaddressFields, EmailaddressRow] = {
UpdateBuilder("person.emailaddress", EmailaddressFields, EmailaddressRow.read)
}
override def upsert(unsaved: EmailaddressRow): ConnectionIO[EmailaddressRow] = {
sql"""insert into person.emailaddress("businessentityid", "emailaddressid", "emailaddress", "rowguid", "modifieddate")
values (
${unsaved.businessentityid}::int4,
${unsaved.emailaddressid}::int4,
${unsaved.emailaddress},
${unsaved.rowguid}::uuid,
${unsaved.modifieddate}::timestamp
)
on conflict ("businessentityid", "emailaddressid")
do update set
"emailaddress" = EXCLUDED."emailaddress",
"rowguid" = EXCLUDED."rowguid",
"modifieddate" = EXCLUDED."modifieddate"
returning "businessentityid", "emailaddressid", "emailaddress", "rowguid", "modifieddate"::text
""".query(EmailaddressRow.read).unique
}
}
EmailaddressRepoMock.scala
/**
* File has been automatically generated by `typo`.
*
* IF YOU CHANGE THIS FILE YOUR CHANGES WILL BE OVERWRITTEN.
*/
package adventureworks
package person
package emailaddress
import doobie.free.connection.ConnectionIO
import doobie.free.connection.delay
import fs2.Stream
import scala.annotation.nowarn
import typo.dsl.DeleteBuilder
import typo.dsl.DeleteBuilder.DeleteBuilderMock
import typo.dsl.DeleteParams
import typo.dsl.SelectBuilder
import typo.dsl.SelectBuilderMock
import typo.dsl.SelectParams
import typo.dsl.UpdateBuilder
import typo.dsl.UpdateBuilder.UpdateBuilderMock
import typo.dsl.UpdateParams
class EmailaddressRepoMock(toRow: Function1[EmailaddressRowUnsaved, EmailaddressRow],
map: scala.collection.mutable.Map[EmailaddressId, EmailaddressRow] = scala.collection.mutable.Map.empty) extends EmailaddressRepo {
override def delete(compositeId: EmailaddressId): ConnectionIO[Boolean] = {
delay(map.remove(compositeId).isDefined)
}
override def delete: DeleteBuilder[EmailaddressFields, EmailaddressRow] = {
DeleteBuilderMock(DeleteParams.empty, EmailaddressFields, map)
}
override def insert(unsaved: EmailaddressRow): ConnectionIO[EmailaddressRow] = {
delay {
val _ = if (map.contains(unsaved.compositeId))
sys.error(s"id ${unsaved.compositeId} already exists")
else
map.put(unsaved.compositeId, unsaved)
unsaved
}
}
override def insert(unsaved: EmailaddressRowUnsaved): ConnectionIO[EmailaddressRow] = {
insert(toRow(unsaved))
}
override def select: SelectBuilder[EmailaddressFields, EmailaddressRow] = {
SelectBuilderMock(EmailaddressFields, delay(map.values.toList), SelectParams.empty)
}
override def selectAll: Stream[ConnectionIO, EmailaddressRow] = {
Stream.emits(map.values.toList)
}
override def selectById(compositeId: EmailaddressId): ConnectionIO[Option[EmailaddressRow]] = {
delay(map.get(compositeId))
}
override def update(row: EmailaddressRow): ConnectionIO[Boolean] = {
delay {
map.get(row.compositeId) match {
case Some(`row`) => false
case Some(_) =>
map.put(row.compositeId, row): @nowarn
true
case None => false
}
}
}
override def update: UpdateBuilder[EmailaddressFields, EmailaddressRow] = {
UpdateBuilderMock(UpdateParams.empty, EmailaddressFields, map)
}
override def upsert(unsaved: EmailaddressRow): ConnectionIO[EmailaddressRow] = {
delay {
map.put(unsaved.compositeId, unsaved): @nowarn
unsaved
}
}
}
EmailaddressRow.scala
/**
* File has been automatically generated by `typo`.
*
* IF YOU CHANGE THIS FILE YOUR CHANGES WILL BE OVERWRITTEN.
*/
package adventureworks
package person
package emailaddress
import adventureworks.customtypes.TypoLocalDateTime
import adventureworks.customtypes.TypoUUID
import adventureworks.person.businessentity.BusinessentityId
import doobie.enumerated.Nullability
import doobie.util.Get
import doobie.util.Read
import java.sql.ResultSet
case class EmailaddressRow(
/** Primary key. Person associated with this email address. Foreign key to Person.BusinessEntityID
Points to [[person.PersonRow.businessentityid]] */
businessentityid: BusinessentityId,
/** Primary key. ID of this email address. */
emailaddressid: Int,
/** E-mail address for the person. */
emailaddress: Option[/* max 50 chars */ String],
rowguid: TypoUUID,
modifieddate: TypoLocalDateTime
){
val compositeId: EmailaddressId = EmailaddressId(businessentityid, emailaddressid)
}
object EmailaddressRow {
implicit lazy val read: Read[EmailaddressRow] =
new Read[EmailaddressRow](
gets = List(
(Get[BusinessentityId], Nullability.NoNulls),
(Get[Int], Nullability.NoNulls),
(Get[/* max 50 chars */ String], Nullability.Nullable),
(Get[TypoUUID], Nullability.NoNulls),
(Get[TypoLocalDateTime], Nullability.NoNulls)
),
unsafeGet = (rs: ResultSet, i: Int) => EmailaddressRow(
businessentityid = Get[BusinessentityId].unsafeGetNonNullable(rs, i + 0),
emailaddressid = Get[Int].unsafeGetNonNullable(rs, i + 1),
emailaddress = Get[/* max 50 chars */ String].unsafeGetNullable(rs, i + 2),
rowguid = Get[TypoUUID].unsafeGetNonNullable(rs, i + 3),
modifieddate = Get[TypoLocalDateTime].unsafeGetNonNullable(rs, i + 4)
)
)
}
EmailaddressRowUnsaved.scala
/**
* File has been automatically generated by `typo`.
*
* IF YOU CHANGE THIS FILE YOUR CHANGES WILL BE OVERWRITTEN.
*/
package adventureworks
package person
package emailaddress
import adventureworks.customtypes.Defaulted
import adventureworks.customtypes.TypoLocalDateTime
import adventureworks.customtypes.TypoUUID
import adventureworks.person.businessentity.BusinessentityId
/** This class corresponds to a row in table `person.emailaddress` which has not been persisted yet */
case class EmailaddressRowUnsaved(
/** Primary key. Person associated with this email address. Foreign key to Person.BusinessEntityID
* Points to [[person.PersonRow.businessentityid]]
*/
businessentityid: BusinessentityId,
/** E-mail address for the person. */
emailaddress: Option[/* max 50 chars */ String],
/** Default: nextval('person.emailaddress_emailaddressid_seq'::regclass)
* Primary key. ID of this email address.
*/
emailaddressid: Defaulted[Int] = Defaulted.UseDefault,
/** Default: uuid_generate_v1() */
rowguid: Defaulted[TypoUUID] = Defaulted.UseDefault,
/** Default: now() */
modifieddate: Defaulted[TypoLocalDateTime] = Defaulted.UseDefault
) {
def toRow(emailaddressidDefault: => Int, rowguidDefault: => TypoUUID, modifieddateDefault: => TypoLocalDateTime): EmailaddressRow =
EmailaddressRow(
businessentityid = businessentityid,
emailaddress = emailaddress,
emailaddressid = emailaddressid match {
case Defaulted.UseDefault => emailaddressidDefault
case Defaulted.Provided(value) => value
},
rowguid = rowguid match {
case Defaulted.UseDefault => rowguidDefault
case Defaulted.Provided(value) => value
},
modifieddate = modifieddate match {
case Defaulted.UseDefault => modifieddateDefault
case Defaulted.Provided(value) => value
}
)
}
In total it's about this much:
--------------------------------------------------------------------------------
 Language             Files        Lines        Blank      Comment         Code
--------------------------------------------------------------------------------
 Scala                 1052        47011         3215         7436        36360
--------------------------------------------------------------------------------
Initial comparison of compile times
Each benchmark is run three times, and in the graphs you can choose to see minimum or average compile times.
"baseline" is generating just case classes, no type class instances or repositories.
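To make "minimum or average" concrete, here is a minimal sketch of how a handful of timings can be reduced to those two numbers. It is not the code from the actual benchmark harness; `TimingSketch`, `Summary`, `summarize` and `timeRuns` are names made up for this illustration.

```scala
object TimingSketch {
  // Hypothetical summary of a series of wall-clock measurements, in milliseconds.
  final case class Summary(minMs: Long, avgMs: Double)

  // Reduce the raw timings to the two numbers of interest.
  def summarize(timesMs: Seq[Long]): Summary = {
    require(timesMs.nonEmpty, "need at least one measurement")
    Summary(timesMs.min, timesMs.sum.toDouble / timesMs.size)
  }

  // Run `body` `runs` times and record how long each run took.
  def timeRuns(runs: Int)(body: => Unit): Seq[Long] =
    (1 to runs).map { _ =>
      val start = System.nanoTime()
      body
      (System.nanoTime() - start) / 1000000L
    }
}
```

With something like this, one benchmark cell would be `TimingSketch.summarize(TimingSketch.timeRuns(3)(compileOnce()))`, where `compileOnce` stands for whatever triggers a clean compile and is also hypothetical.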
We can make some observations right away:
- Scala 3 is almost always faster than Scala 2.12 and 2.13; it is only beaten by Scala 2.12 for baseline (just case classes).
- doobie takes more than double the time to compile compared to anorm and zio-jdbc for Scala 2.x.
- zio-jdbc and anorm have similar compile times across Scala versions.
- It's interesting to see the "cost" of adding type class instances and repositories
Scala 3 is consistently fast! Great job Scala team!
The meat of this blog post will be to investigate why the code for doobie takes so long to compile for Scala 2.x.
So what's up with doobie with Scala 2.x?
There are several issues at play. Let's take the biggest one first: automatic derivation of type class instances!
Let's take something rather innocent:
case class A(v1: String, v2: String, v3: String, v4: String, v5: String, v6: String, v7: String)
sql"select 1,2,3,4,5,6,7".query[A].to[List]
sql"select 1,2,3,4,5,6,7".query[A].to[List]
This will compile and work, but an instance of Read[A] will be derived for each of the two queries.
No problem, we're taught that we can cache the Read[A] instance in the companion object.
object A {
implicit val read: Read[A] = Read.derived
}
The surprise is that (as far as I understand) this does not actually work in this case. Since the automatic derivation is put in implicit scope in the companion object of the type class, it will be found before our cached instance. We actually need to specify the instance explicitly:
sql"select 1,2,3,4,5,6,7".query(A.read).to[List]
sql"select 1,2,3,4,5,6,7".query(A.read).to[List]
And boom! We've solved the problem. I implemented this in Typo, and will refer to this as doobie with and without fix in subsequent tables.
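To illustrate the difference between letting every call site derive its own instance and passing one instance by hand, here is a small self-contained sketch. It uses a toy `Decoder` type class rather than doobie's `Read`, and all names in it (`DerivationSketch`, `Decoder`, `Person`, `query`, `demo`) are hypothetical. The runtime counter is only a stand-in for work the compiler repeats at each call site; it does not measure compile time.

```scala
object DerivationSketch {
  // Hypothetical stand-in for a row-decoding type class; this is not doobie's Read.
  trait Decoder[A] { def decode(cols: List[String]): A }

  object Decoder {
    // Counts how many times the "automatic" instance gets built.
    var derivations: Int = 0

    def instance[A](f: List[String] => A): Decoder[A] =
      new Decoder[A] { def decode(cols: List[String]): A = f(cols) }

    // Stand-in for automatic derivation. The compiler inserts a call to this
    // at every call site that needs a Decoder[Person] and is not given one.
    implicit def derivedPerson: Decoder[Person] = {
      derivations += 1
      instance[Person](cols => Person(cols(0), cols(1)))
    }
  }

  final case class Person(first: String, last: String)
  object Person {
    // Built once. Deliberately not implicit: it is passed by hand below.
    val decoder: Decoder[Person] =
      Decoder.instance[Person](cols => Person(cols(0), cols(1)))
  }

  def query[A](cols: List[String])(implicit d: Decoder[A]): A = d.decode(cols)

  // Returns (derivations caused by implicit call sites,
  //          derivations caused by explicit call sites).
  def demo(): (Int, Int) = {
    val row = List("Ada", "Lovelace")
    val start = Decoder.derivations
    query[Person](row)
    query[Person](row)
    val afterImplicit = Decoder.derivations
    query[Person](row)(Person.decoder)
    query[Person](row)(Person.decoder)
    (afterImplicit - start, Decoder.derivations - afterImplicit)
  }
}
```

Calling `DerivationSketch.demo()` yields `(2, 0)`: the two implicit call sites each build an instance, while the two explicit ones reuse `Person.decoder`.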
Results without accidental automatic type class derivation for doobie
Fantastic! We've cut the compile times almost in half for doobie for Scala 2.x - ten seconds is a long time if you suffer it often.
Automatic type class derivation is a bad idea, but only for Scala 2.x!
But doobie is still a lot slower, so let's dig a bit further!
Query interpolation woes.
So all of this extra time is spent in typer/resolving implicits. I wanted to see what could be done about it, and what causes it.
That's why I implemented an "inline implicits" mode for Typo. Here's a diff of what it does; hopefully it's clear that it hardcodes some of the implicit resolution:
+import doobie.syntax.SqlInterpolator.SingleFragment.fromWrite
+import doobie.util.Write
+import doobie.util.meta.Meta
class EmailaddressRepoImpl extends EmailaddressRepo {
override def update(row: EmailaddressRow): ConnectionIO[Boolean] = {
val compositeId = row.compositeId
sql"""update person.emailaddress
- set "emailaddress" = ${row.emailaddress},
- "rowguid" = ${row.rowguid}::uuid,
- "modifieddate" = ${row.modifieddate}::timestamp
- where "businessentityid" = ${compositeId.businessentityid} AND "emailaddressid" = ${compositeId.emailaddressid}"""
+ set "emailaddress" = ${fromWrite(row.emailaddress)(Write.fromPutOption(Meta.StringMeta.put))},
+ "rowguid" = ${fromWrite(row.rowguid)(Write.fromPut(TypoUUID.put))}::uuid,
+ "modifieddate" = ${fromWrite(row.modifieddate)(Write.fromPut(TypoLocalDateTime.put))}::timestamp
+ where "businessentityid" = ${fromWrite(compositeId.businessentityid)(Write.fromPut(BusinessentityId.put))} AND "emailaddressid" = ${fromWrite(compositeId.emailaddressid)(Write.fromPut(Meta.IntMeta.put))}"""
.update
.run
.map(_ > 0)
}
}
Here are the results with and without inlined implicits:
Observations
- This brings the compile time for doobie down to about the same as for anorm and zio-jdbc!
- We also shave a handsome amount of compile time off zio-jdbc and anorm, but it's less clear that the gain is enough to be worth inlining manually. That's actually a great conclusion: resolving implicits is fast as long as it's a straightforward process.
Great! So what is going on with doobie then?
I honestly haven't dug into all the details, but I have a guess which looks obvious. In order to interpolate values into an SQL query, doobie needs to resolve a chain of implicits instead of just one.
In order to interpolate in the emailaddress field, which has type Option[String], this thing needs to be resolved:
doobie.syntax.SqlInterpolator.SingleFragment.fromWrite(unsaved.emailaddress)(
doobie.util.Write.fromPutOption(
doobie.util.Put.metaProjectionWrite(
doobie.util.meta.Meta.StringMeta
)
)
)
I'm sure this can cause the compiler to spend a lot of time looking around the companion objects of SingleFragment, Write, Put, and Meta.
Make the compiler's job easy! It probably shouldn't need to go through so many layers at all.
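The shape of such a chain can be sketched in a few lines. The types below are heavily simplified stand-ins that only borrow the names from the expression above; they are not doobie's real definitions, and `ChainSketch` is a made-up wrapper. The point is that the one-argument call makes the compiler search three companion objects in turn, while the spelled-out call leaves nothing to search for.

```scala
object ChainSketch {
  // Simplified, hypothetical stand-ins mirroring the shape of the chain above.
  final case class Meta[A](sqlType: String)
  object Meta {
    implicit val StringMeta: Meta[String] = Meta[String]("varchar")
  }

  final case class Put[A](sqlType: String)
  object Put {
    // Layer 1: a Put is projected out of a Meta.
    implicit def metaProjectionWrite[A](implicit meta: Meta[A]): Put[A] =
      Put[A](meta.sqlType)
  }

  final case class Write[A](render: A => String)
  object Write {
    // Layer 2: a Write for Option[A] is built from a Put[A].
    implicit def fromPutOption[A](implicit put: Put[A]): Write[Option[A]] =
      Write[Option[A]]((oa: Option[A]) => oa.fold("NULL")(a => s"'$a'::${put.sqlType}"))
  }

  final case class SingleFragment(sql: String)
  object SingleFragment {
    // Layer 3: interpolating a value needs a Write for its type.
    def fromWrite[A](a: A)(implicit write: Write[A]): SingleFragment =
      SingleFragment(write.render(a))
  }

  val email: Option[String] = Some("a@example.com")

  // The compiler has to find Write -> Put -> Meta on its own.
  val resolved: SingleFragment = SingleFragment.fromWrite(email)

  // Every step is written out, so no implicit search is needed.
  val inlined: SingleFragment =
    SingleFragment.fromWrite(email)(
      Write.fromPutOption(Put.metaProjectionWrite(Meta.StringMeta))
    )
}
```

Both values are equal at runtime; only the amount of searching the compiler has to do differs.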
Scala 3 must have some caching of resolved implicits that Scala 2.x doesn't have.
Future work on benchmark
Here are other things I've done to speed up compilation of generated code, based on observations as I was developing. A future version of the benchmark could measure the effect of these changes as well. Reach out if you're interested in contributing towards this.
Avoiding anonymous classes
Let's have a look at the code for doobie.postgres.Text[A]; it's basically this:
trait Text[A] { outer =>
def unsafeEncode(a: A, sb: StringBuilder): Unit
}
object Text {
def instance[A](f: (A, StringBuilder) => Unit): Text[A] =
new Text[A] {
def unsafeEncode(a: A, sb: StringBuilder) = f(a, sb)
}
}
Using Text.instance instead of new Text[A] saves some compilation time because it generates less bytecode.
If you're a library author you should consider adding such a constructor method.
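For comparison, here is what the two styles look like at the definition site. This is a sketch that repeats the trait from above so it is self-contained; `TextSketch`, `intTextAnonymous`, `intTextViaInstance` and `encode` are names invented for the example.

```scala
object TextSketch {
  // Same shape as the trait shown above, repeated so the sketch stands alone.
  trait Text[A] {
    def unsafeEncode(a: A, sb: StringBuilder): Unit
  }
  object Text {
    def instance[A](f: (A, StringBuilder) => Unit): Text[A] =
      new Text[A] {
        def unsafeEncode(a: A, sb: StringBuilder): Unit = f(a, sb)
      }
  }

  // Variant 1: every definition written like this is its own anonymous class.
  val intTextAnonymous: Text[Int] =
    new Text[Int] {
      def unsafeEncode(a: Int, sb: StringBuilder): Unit = { sb.append(a); () }
    }

  // Variant 2: only a function literal is written here. The one anonymous
  // class lives inside Text.instance and is shared by all callers.
  val intTextViaInstance: Text[Int] =
    Text.instance[Int]((a, sb) => { sb.append(a); () })

  // Small helper to show that both variants behave the same.
  def encode[A](a: A, text: Text[A]): String = {
    val sb = new StringBuilder
    text.unsafeEncode(a, sb)
    sb.toString
  }
}
```

The two instances behave identically; the difference is only in how much the compiler has to emit for each definition.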
Avoiding automatic derivation of product type class instances
Typo reimplements the derivation of type class instances for product types, so the benchmark cannot measure the cost of this so far.
If you look at EmailaddressRow.scala above, you'll see that Read[EmailaddressRow] is always implemented as new Read(...).
It would be very interesting to measure the cost of deriving this automatically as well. We would just need to patch Typo to use Read.derived instead of new Read(...).
Avoiding type aliases
Doobie uses a pattern where doobie.util.Read is exposed as doobie.Read through a baroque mechanism:
package object doobie
extends Aliases
trait Aliases extends Types with Modules
trait Types {
/** @group Type Aliases - Core */ type Read[A] = doobie.util.Read[A]
}
Typo always generates the fully qualified name doobie.util.Read.
It would be interesting to measure if there is a cost associated with this as well.
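The two ways of naming the same type can be reproduced in miniature. The sketch below uses nested objects instead of packages and a package object so that it fits in one file; `AliasSketch`, `api`, `direct` and `viaAlias` are made-up names, and the toy `Read` has nothing to do with doobie's.

```scala
object AliasSketch {
  // Hypothetical miniature of the layout above.
  object util {
    trait Read[A] { def read(s: String): A }
  }
  trait Types { type Read[A] = util.Read[A] }
  trait Aliases extends Types
  object api extends Aliases

  // Fully qualified name: refers to the trait directly.
  val direct: util.Read[Int] =
    new util.Read[Int] { def read(s: String): Int = s.trim.toInt }

  // Alias: the name is found by looking through api -> Aliases -> Types,
  // and then has to be dealiased to util.Read.
  val viaAlias: api.Read[Int] = direct
}
```

Both names denote the same type, so any cost would come purely from the extra lookups during type checking. Whether that is measurable is exactly the open question.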
Final results and limitations
JSON libraries
This last graph includes compilation time for three JSON libraries as well, basically just generating type class instances for them. I excluded them from the text above since there was nothing interesting to say about the results. You can see how "inline implicits" mode also speeds up compilation of these JSON codecs.
Benchmark limitations
I think part of this improvement from "inlined implicits" is due to the fact that the compiler is a bit warmer since it will just have finished compiling without inlined implicits.
Note specifically that we get faster compiles for "baseline" with "inlined implicits" mode, although the generated code is the same.
I didn't bother improving the benchmark more, because the interesting things mentioned above were very visible and consistent already.
Final graph
Reproducing the results
You can clone the repo and run bleep compile-benchmarks to reproduce the results.
The benchmark code can be found in CompileBenchmark.scala