Separate IDs of Wasm things from their names. #134

sjrd · 2024-05-18T15:31:12Z

Previously, we eagerly generated string names for every Wasm thing, and used that name both as their identity and as their debug name.

Now, we clearly separate those concepts: identities are instances of abstract traits such as FunctionID, while debug names are OriginalNames. Original names are only stored at the definition site, while IDs are used to link definition and use sites.

IDs only need to have meaningful equals and hashCode. This allows to define them as case classes that directly embed ClassNames and MethodNames (among others), without having to decode them or serialize them.

Moreover, since OriginalNames are internally represented as UTF-8 strings, we also avoid the cost of decoding and reencoding them when writing binary .wasm files. The text writer gets more work to do, but it is not on the performance-sensitive path, so this is not really an issue.

For struct field IDs, this change also implies merging field names and field indices. Since references to fields can now use FieldIDs as well, it removes a bit of bookkeeping at the ClassInfo level. Both instance field indices and method table entry indices are late-bound in the binary emitter. We do not have to eagerly compute them.

tanishiking

LGTM, thank you very much for this huge work 💯

tanishiking · 2024-05-21T04:35:05Z

wasm/src/main/scala/org/scalajs/linker/backend/wasmemitter/FunctionEmitter.scala

-     * generated by `LibraryPatches.deriveBoxClass`, instead of explicitly
-     * calling the constructor. We can do this because we know that this is
+    /* We use a direct `StructNew` instead of the logical call to `newDefault`
+     * plus constructor call. We can do this because we know that this is


tanishiking · 2024-05-21T04:50:16Z

wasm/src/main/scala/org/scalajs/linker/backend/webassembly/BinaryWriter.scala

+  private val fieldIdxValues: Map[TypeID, Map[FieldID, Int]] = {
+    (for {
+      recType <- module.types
+      SubType(typeID, _, _, _, StructType(fields)) <- recType.subTypes
+    } yield {
+      typeID -> fields.map(_.id).zipWithIndex.toMap
+    }).toMap


Great, I like having name resolution for field access only in the binary writer (instead of getFieldIdx everywhere) looks much readable!

Since we are directly setting the field anyway, we can go further and use `struct.new` without any function call.

Previously, we eagerly generated string names for every Wasm thing, and used that name both as their identity and as their debug name. Now, we clearly separate those concepts: identities are instances of abstract traits such as `FunctionID`, while debug names are `OriginalName`s. Original names are only stored at the definition site, while IDs are used to link definition and use sites. IDs only need to have meaningful `equals` and `hashCode`. This allows to define them as case classes that directly embed `ClassName`s and `MethodName`s (among others), without having to decode them or serialize them. Moreover, since `OriginalName`s are internally represented as UTF-8 strings, we also avoid the cost of decoding and reencoding them when writing binary `.wasm` files. The text writer gets more work to do, but it is not on the performance-sensitive path, so this is not really an issue. For struct field IDs, this change also implies merging field *names* and field *indices*. Since references to fields can now use `FieldID`s as well, it removes a bit of bookkeeping at the `ClassInfo` level. Both instance field indices and method table entry indices are late-bound in the binary emitter. We do not have to eagerly compute them.

sjrd requested a review from tanishiking May 18, 2024 15:31

sjrd force-pushed the separate-ids-from-names branch from fdf764c to 97d73e3 Compare May 18, 2024 15:46

tanishiking approved these changes May 21, 2024

View reviewed changes

Simplify the code generated by genBox.

ca21827

Since we are directly setting the field anyway, we can go further and use `struct.new` without any function call.

sjrd force-pushed the separate-ids-from-names branch from 97d73e3 to 164b2cc Compare May 21, 2024 06:40

sjrd force-pushed the separate-ids-from-names branch from 164b2cc to 6344348 Compare May 21, 2024 07:11

sjrd merged commit fb3bcb2 into tanishiking:main May 21, 2024
1 check passed

sjrd deleted the separate-ids-from-names branch May 21, 2024 07:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Separate IDs of Wasm things from their names. #134

Separate IDs of Wasm things from their names. #134

sjrd commented May 18, 2024

tanishiking left a comment

tanishiking May 21, 2024

tanishiking May 21, 2024

Separate IDs of Wasm things from their names. #134

Separate IDs of Wasm things from their names. #134

Conversation

sjrd commented May 18, 2024

tanishiking left a comment

Choose a reason for hiding this comment

tanishiking May 21, 2024

Choose a reason for hiding this comment

tanishiking May 21, 2024

Choose a reason for hiding this comment