Dynamic link & MultiThread Makes Initialization much slower #22739

Open
oatgnauh opened this issue Oct 15, 2024 · 4 comments

@oatgnauh

Version of emscripten/emsdk:
emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.54 (48216dc455f2ac2670ec4f8a32f293b62a730080)
clang version 19.0.0git (https://github.com/llvm/llvm-project e769fb8699e3fa8e40623764f7713bfc783b0330)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /home/compiler/emsdk/upstream/bin

Full link command and output with -v appended:

/home/emsdk/upstream/emscripten/em++  -s EXPORTED_RUNTIME_METHODS='[ "ccall","stringToUTF8","UTF8ToString"]'
-s EXPORTED_FUNCTIONS='["_malloc", "_free"]'
-s WASM=1
-s WASMFS=1
-s ENVIRONMENT=web,worker
-s USE_PTHREADS=1
-s PTHREAD_POOL_DELAY_LOAD=1
-s FULL_ES2=1
-s FULL_ES3=1
-s FORCE_FILESYSTEM=1
-s WARN_ON_UNDEFINED_SYMBOLS=1
-s WASM_MEM_MAX=1073741824
-s TOTAL_MEMORY=1073741824
-s INITIAL_MEMORY=67108864
-s TOTAL_STACK=1048576
-s ALLOW_MEMORY_GROWTH=1
-s FETCH=1
-s RESERVED_FUNCTION_POINTERS=10
-s MODULARIZE=1
-s EXPORT_ES6=1
-s EXPORT_NAME='createModule'
-s TEXTDECODER=0
-frtti
-fPIC
-msimd128 -mbulk-memory
-fwasm-exceptions -sSUPPORT_LONGJMP=wasm
-lopenal
-lembind
-D__linux__=1
-Wl,--no-check-features
-g0 -Oz
-DNDEBUG
-flto
-s OFFSCREENCANVAS_SUPPORT=1
-s OFFSCREENCANVASES_TO_PTHREAD='#canvas' 
-s ASSERTIONS=0
-sSAFE_HEAP=0
--emit-symbol-map
-sWASM_BIGINT
-sERROR_ON_WASM_CHANGES_AFTER_LINK
-sEXCEPTION_STACK_TRACES
-sMAIN_MODULE=2
-sAUTOLOAD_DYLIBS=1
pathto/libdynamic.wasm

For some reason, I have to split the original main.wasm into main.wasm + side.wasm, and I use -s USE_PTHREADS=1 for multithreading.

createModule

From what I observed, the dynamically linked build is 200ms+ slower than the non-dynamically linked build during createModule.
Dynamic link:
[screenshot: img_v3_02fm_2003c271-9126-416f-8667-bb8404d6c46g]

Normal (non-dynamic link):
[screenshot: img_v3_02fm_c2e1ec24-ff19-41ef-a91e-682fb786119g]

In my analysis, this is mainly because of LDSO.init() and loadDylibs(), which do the load-time linking: parsing the dynamic library and exporting the symbol map.

function createWasm() {
 var info = {
  "env": wasmImports,
  "wasi_snapshot_preview1": wasmImports,
  "GOT.mem": new Proxy(wasmImports, GOTHandler),
  "GOT.func": new Proxy(wasmImports, GOTHandler)
 };
 /** @param {WebAssembly.Module=} module*/ function receiveInstance(instance, module) {
  wasmExports = instance.exports;
  wasmExports = relocateExports(wasmExports, 1024);
  var metadata = getDylinkMetadata(module);
  if (metadata.neededDynlibs) {
   dynamicLibraries = metadata.neededDynlibs.concat(dynamicLibraries);
  }
  mergeLibSymbols(wasmExports, "main");
  LDSO.init();
  loadDylibs();
  registerTLSInit(wasmExports["_emscripten_tls_init"], instance.exports, metadata);
  addOnInit(wasmExports["__wasm_call_ctors"]);
  __RELOC_FUNCS__.push(wasmExports["__wasm_apply_data_relocs"]);
  wasmModule = module;
  removeRunDependency("wasm-instantiate");
  return wasmExports;
 }
 addRunDependency("wasm-instantiate");
 function receiveInstantiationResult(result) {
  receiveInstance(result["instance"], result["module"]);
 }
 if (Module["instantiateWasm"]) {
  try {
   return Module["instantiateWasm"](info, receiveInstance);
  } catch (e) {
   err(`Module.instantiateWasm callback failed with error: ${e}`);
   readyPromiseReject(e);
  }
 }
 instantiateAsync(wasmBinary, wasmBinaryFile, info, receiveInstantiationResult).catch(readyPromiseReject);
 return {};
}
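
For anyone trying to reproduce these numbers without a profiler, here is a minimal sketch of how the synchronous part of receiveInstance could be timed through the Module["instantiateWasm"] hook visible in the code above. withTiming and makeTimedInstantiateWasm are hypothetical helper names and wasmUrl is an assumed location of the main module; none of them are part of Emscripten. Side modules may finish loading asynchronously after loadDylibs() returns, so this likely captures only the synchronous portion of the work.

```javascript
// Hypothetical helper (not an Emscripten API): wraps `fn` so that the
// wall-clock duration of every call is appended to the `timings` array.
function withTiming(label, fn, timings) {
  return function (...args) {
    const start = performance.now();
    try {
      return fn.apply(this, args);
    } finally {
      timings.push({ label, ms: performance.now() - start });
    }
  };
}

// Hypothetical factory for an instantiateWasm hook. The hook receives the
// same (info, receiveInstance) pair that createWasm() passes in the code
// above, and times the receiveInstance(instance, module) call.
// `wasmUrl` is an assumption: wherever the main module is served from.
function makeTimedInstantiateWasm(wasmUrl, timings) {
  return function (info, receiveInstance) {
    const timedReceive = withTiming("receiveInstance", receiveInstance, timings);
    WebAssembly.instantiateStreaming(fetch(wasmUrl), info)
      .then((result) => timedReceive(result.instance, result.module))
      .catch((e) => console.error("instantiation failed:", e));
    return {}; // instantiation completes asynchronously
  };
}
```

Under these assumptions the hook would be passed to the factory as createModule({ instantiateWasm: makeTimedInstantiateWasm("main.wasm", timings) }), and timings could be inspected once the module is ready. Comparing the recorded value between the dynamic and non-dynamic builds should show how much of the 200ms+ is spent inside receiveInstance itself.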

start thread

At the same time, dynamic linking is 200ms+ slower than normal when each thread starts. I guess this is partly because each thread needs to synchronize function symbols from the main thread.
[screenshot: thread start, dynamic link]

Compared to non-dynamic link:
[screenshot: thread start, non-dynamic link]

Since these performance degradations are caused by the dynamic link mechanism, it is difficult for users to optimize it. Please give me some feasible suggestions. Thank you.

@oatgnauh
Author

I also noticed that __wasm_call_ctors takes more time with dynamic linking.
Non-dynamic link:
[screenshot]
Dynamic link:
[screenshot]

@sbc100
Collaborator

sbc100 commented Oct 16, 2024

While there could be some savings to be had here, dynamic linking does come at a cost, just like it does on native platforms.

@oatgnauh
Author

> While there could be some savings to be had here, dynamic linking does come at a cost, just like it does on native platforms.

Is there any suggested optimization direction? Thank you

@sbc100
Collaborator

sbc100 commented Oct 21, 2024

> > While there could be some savings to be had here, dynamic linking does come at a cost, just like it does on native platforms.
>
> Is there any suggested optimization direction? Thank you

One improvement would be to fix #12682. However, it's not a simple change; it requires a fair amount of refactoring.
