Skip to content
This repository has been archived by the owner on Jul 12, 2024. It is now read-only.

Separate IDs of Wasm things from their names. #134

Merged
merged 2 commits into from
May 21, 2024

Conversation

sjrd
Copy link
Collaborator

@sjrd sjrd commented May 18, 2024

Previously, we eagerly generated string names for every Wasm thing, and used that name both as their identity and as their debug name.

Now, we clearly separate those concepts: identities are instances of abstract traits such as FunctionID, while debug names are OriginalNames. Original names are only stored at the definition site, while IDs are used to link definition and use sites.

IDs only need to have meaningful equals and hashCode. This allows to define them as case classes that directly embed ClassNames and MethodNames (among others), without having to decode them or serialize them.

Moreover, since OriginalNames are internally represented as UTF-8 strings, we also avoid the cost of decoding and reencoding them when writing binary .wasm files. The text writer gets more work to do, but it is not on the performance-sensitive path, so this is not really an issue.

For struct field IDs, this change also implies merging field names and field indices. Since references to fields can now use FieldIDs as well, it removes a bit of bookkeeping at the ClassInfo level. Both instance field indices and method table entry indices are late-bound in the binary emitter. We do not have to eagerly compute them.

@sjrd sjrd requested a review from tanishiking May 18, 2024 15:31
@sjrd sjrd force-pushed the separate-ids-from-names branch from fdf764c to 97d73e3 Compare May 18, 2024 15:46
Copy link
Owner

@tanishiking tanishiking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you very much for this huge work 💯

* generated by `LibraryPatches.deriveBoxClass`, instead of explicitly
* calling the constructor. We can do this because we know that this is
/* We use a direct `StructNew` instead of the logical call to `newDefault`
* plus constructor call. We can do this because we know that this is
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines +46 to +52
private val fieldIdxValues: Map[TypeID, Map[FieldID, Int]] = {
(for {
recType <- module.types
SubType(typeID, _, _, _, StructType(fields)) <- recType.subTypes
} yield {
typeID -> fields.map(_.id).zipWithIndex.toMap
}).toMap
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I like having name resolution for field access only in the binary writer (instead of getFieldIdx everywhere) looks much readable!

Since we are directly setting the field anyway, we can go further
and use `struct.new` without any function call.
@sjrd sjrd force-pushed the separate-ids-from-names branch from 97d73e3 to 164b2cc Compare May 21, 2024 06:40
Previously, we eagerly generated string names for every Wasm thing,
and used that name both as their identity and as their debug name.

Now, we clearly separate those concepts: identities are instances
of abstract traits such as `FunctionID`, while debug names are
`OriginalName`s. Original names are only stored at the definition
site, while IDs are used to link definition and use sites.

IDs only need to have meaningful `equals` and `hashCode`. This
allows to define them as case classes that directly embed
`ClassName`s and `MethodName`s (among others), without having to
decode them or serialize them.

Moreover, since `OriginalName`s are internally represented as UTF-8
strings, we also avoid the cost of decoding and reencoding them
when writing binary `.wasm` files. The text writer gets more work
to do, but it is not on the performance-sensitive path, so this is
not really an issue.

For struct field IDs, this change also implies merging field *names*
and field *indices*. Since references to fields can now use
`FieldID`s as well, it removes a bit of bookkeeping at the
`ClassInfo` level. Both instance field indices and method table
entry indices are late-bound in the binary emitter. We do not have
to eagerly compute them.
@sjrd sjrd force-pushed the separate-ids-from-names branch from 164b2cc to 6344348 Compare May 21, 2024 07:11
@sjrd sjrd merged commit fb3bcb2 into tanishiking:main May 21, 2024
1 check passed
@sjrd sjrd deleted the separate-ids-from-names branch May 21, 2024 07:16
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants