How to bundle pre-installed R packages with WebR's WASM for client-side usage without additional installation? #511

Open
zpinocchio opened this issue Jan 24, 2025 · 6 comments
Labels
question Further information is requested

Comments

@zpinocchio

Thanks for your work.
I would like to bundle pre-installed R packages, such as ggplot2, along with WebR's WASM so that they can be directly used on the client-side without requiring additional installation of R packages.

How can I package the already installed R packages along with WebR's WASM, ensuring that the client can use the R packages (like ggplot2) without the need for extra installation steps? Specifically, I am interested in embedding both the R environment and the necessary R packages in such a way that they are available immediately upon loading in the client's browser.

@dusadrian

I am trying to do almost exactly the same thing in an Electron application, as described in #510.
The principle is the same, however. At page load, JavaScript code will:

  • mount a VFS (virtual file system) containing the additional R packages at a custom location (see the relevant documentation)
  • add the custom location to .libPaths()
  • load the additional package(s) as usual, R finding them in the new location

I assume you just need to write the relevant JS code and run it automatically on page load; the documentation linked above contains an example you can follow.

In my case, although the VFS mounts successfully, loading the package ends with an error for a reason that I don't fully understand.

@georgestagg georgestagg added the question Further information is requested label Jan 28, 2025
@georgestagg
Member

@dusadrian is correct, the method he describes is the way to go, with documentation here.

First, on your machine use the rwasm R package locally to bundle the packages you are interested in as a VFS library. This can be done by running in R:

rwasm::add_pkg("ggplot2")
rwasm::make_vfs_library()

(Note that you will need to follow the Getting Started instructions before these R functions will work.)

The resulting files, written to the vfs subdirectory by default, should be served online as part of your application.

Then, in the client browser at load time, use the webR.evalR() function from JS to execute the following R code, which downloads the VFS library files, sets up the package library path, and loads the package:

webr::mount(mountpoint = "/my-library", source = "https://example.com/library.data")
.libPaths(c(.libPaths(), "/my-library"))
library(ggplot2)
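
On the JavaScript side, the step above could be sketched roughly as follows. This is only an illustration under stated assumptions: `buildMountScript` and `loadBundledLibrary` are hypothetical helper names, `webR` is assumed to be an already initialised WebR instance, and `evalRVoid()` is used in place of `evalR()` because the result of the R code is not needed.

```javascript
// Build the R code shown above as a single string.
// JSON.stringify produces a double-quoted literal, which R also accepts
// for simple paths and URLs.
function buildMountScript(mountpoint, sourceUrl, packages) {
  for (const pkg of packages) {
    // Only allow plain package names, since they are pasted into R code.
    if (!/^[A-Za-z][A-Za-z0-9.]*$/.test(pkg)) {
      throw new Error(`Unexpected package name: ${pkg}`);
    }
  }
  const lines = [
    `webr::mount(mountpoint = ${JSON.stringify(mountpoint)}, source = ${JSON.stringify(sourceUrl)})`,
    `.libPaths(c(.libPaths(), ${JSON.stringify(mountpoint)}))`,
    ...packages.map((pkg) => `library(${pkg})`),
  ];
  return lines.join("\n");
}

// `webR` is assumed to be a WebR instance on which `await webR.init()`
// has already completed.
async function loadBundledLibrary(webR, mountpoint, sourceUrl, packages) {
  const code = buildMountScript(mountpoint, sourceUrl, packages);
  await webR.evalRVoid(code);
}

// Possible usage at page load (the URL is the placeholder from the R code above):
// await loadBundledLibrary(webR, "/my-library", "https://example.com/library.data", ["ggplot2"]);
```

Keeping the string building separate from the call into webR makes the generated R code easy to inspect or log before it runs.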

@JeremyPasco

This is decreasing the whole bundle size and loading time a lot in my case!
I wonder if it could even be better:

  • rwasm::make_vfs_library() allows compression. In my case, the resulting VFS was 40% lighter. But I don't know how to mount it. Even with the 0.4.2 version fix I got some errors with webr::mount. Is there something to do to mount a compressed VFS?
  • a lot of unused stuff comes with packages in the webR context, like sample data or documentation, which can be pretty heavy with some packages. rwasm::add_pkg puts archives in the repo directory. Is there a way to set some rules to prune packages during the import process? (e.g. removing all .Rmd files)
  • rwasm::add_pkg() always fetches the latest package version. Is it possible to set a specific version for each package (and ideally even for dependencies)?
  • when loading a lot of packages, users tend to quit my app, wondering if it crashed. I fixed it by displaying a progress bar that increments after each successful webr.install. With a VFS image, it is now a single call. Do you see a way to monitor progress during the mount process?

If you want, I can split the points you find relevant into different issues.

@dusadrian

dusadrian commented Feb 8, 2025

That is interesting, I would also like to know.
I imagine it would still be possible to gzip the library.data file and, in JavaScript, wrap reading it in a try/catch. If reading fails, the file has to be unzipped first, which the local browser can do, and then read again.
In my case (an Electron app), the gain in gzip-ing that file is rather marginal (about 6% of the total app size), but I agree it could be interesting to read the compressed file directly.
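
A rough sketch of that idea, under stated assumptions: instead of try/catch, it checks for the gzip magic bytes (a swapped-in but equivalent way to tell whether decompression is needed), and it uses the standard `DecompressionStream` web API. The helper names `isGzip` and `maybeGunzip` are hypothetical, and how the resulting bytes would then be handed to webR is not shown.

```javascript
// Gzip data starts with the two magic bytes 0x1f 0x8b.
function isGzip(bytes) {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
}

// Return the bytes unchanged when they are not gzip-compressed,
// otherwise decompress them with the browser's DecompressionStream.
async function maybeGunzip(bytes) {
  if (!isGzip(bytes)) return bytes;
  const stream = new Blob([bytes])
    .stream()
    .pipeThrough(new DecompressionStream("gzip"));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}

// Possible usage (the URL is a placeholder):
// const response = await fetch("https://example.com/library.data.gz");
// const data = await maybeGunzip(new Uint8Array(await response.arrayBuffer()));
```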

For the second point, I guess that is not possible automatically. But you can re-create those packages from their original sources, pruning them yourself.

Related to that, you can fork those packages on GitHub and point add_pkg() at the fork, for example: rwasm::add_pkg("https://github.com/username/reponame")

If the packages are added to the VFS library, there should be no real bottleneck in loading them. What I do, for calculations that take significant time, is cover the screen with a div showing a message (in your case "Loading packages in the R environment...") and use await to remove that cover asynchronously when the work is done. It is pretty easy to do and not as fancy as a progress bar, but it does alert the user that something is happening in the browser.
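
A minimal sketch of that cover pattern, assuming a hypothetical helper `withLoadingCover` and a `cover` object with `show` and `hide` methods (neither is part of webR):

```javascript
// Show a cover while an async task runs, and always remove it afterwards,
// even if the task throws.
async function withLoadingCover(task, cover) {
  cover.show("Loading packages in the R environment...");
  try {
    return await task();
  } finally {
    cover.hide();
  }
}

// Example browser wiring (the element id "cover" is hypothetical):
// const el = document.getElementById("cover");
// const cover = {
//   show: (message) => { el.textContent = message; el.style.display = "flex"; },
//   hide: () => { el.style.display = "none"; },
// };
// await withLoadingCover(() => webR.evalRVoid("library(ggplot2)"), cover);
```

The `finally` block is the design point here: the cover is removed whether loading succeeds or fails, so a failed load does not leave the page blocked.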

@JeremyPasco

Since rwasm::make_vfs_library() and webr::mount() were developed to work together, I guess the browser should not have to intervene in the process.

For the second point, I hoped for a simpler way, but yeah, that's probably how I will end up doing it. I am even considering something more brutal to drastically reduce the app size. A lot of the packages I need require the whole tidyverse... But I know that only a tiny fraction of each tidyverse package is at work in my use cases. So I thought I could:

1-Create an exhaustive E2E test suite covering all use cases for my apps
2-Store every R stack trace during the test suite execution
3-Process the stack traces to identify the exact list of functions at work for each package
4-Download all the required packages
5-Prune them to keep only the functions identified at step 3
6-Make a VFS image of the result

I have no idea what benefit would result from such an approach. But I think that if we want to deploy smaller webR apps, we should have something similar to tree shaking for R. The language itself does not make it easy to implement, especially because of the (amazing) tidy eval feature. @georgestagg any thoughts on this? I would gladly explore the topic further, but maybe you already have a better plan for handling this?

PS: showing a message and a spinner was indeed my first approach to keep users waiting. But on a very slow network, people had no clue whether something was happening or the app was stuck, despite the message. Having a real progress bar was necessary (unless I can drastically reduce the app size :D)

@georgestagg
Member

rwasm::make_vfs_library() allows compression. In my case, the resulting VFS was 40% lighter. But I don't know how to mount it. Even with the 0.4.2 version fix I got some errors with webr::mount. Is there something to do to mount a compressed VFS?

Can you show me an example? webR v0.4.1+ should transparently mount compressed packages produced using rwasm::make_vfs_library(..., compress = TRUE). The files should be named [...].data.gz and [...].js.metadata and the metadata JSON should include the gzip: true property, but that should be handled by {rwasm}.

a lot of unused stuff comes with packages in the webR context, like sample data or documentation, which can be pretty heavy with some packages. rwasm::add_pkg puts archives in the repo directory. Is there a way to set some rules to prune packages during the import process? (e.g. removing all .Rmd files)

For certain {rwasm} functions, a strip argument can be provided, e.g.

rwasm::make_vfs_library(..., strip = c('doc', 'vignette', 'examples', 'html', ...))

to remove a list of named subdirectories of R packages as the package library is created. We do not have tree-shaking as of yet, but this is a crude way to reduce package size.

rwasm::add_pkg() always fetches the latest package version. Is it possible to set a specific version for each package (and ideally even for dependencies)?

A list of package references can be given in the remotes argument to rwasm::add_pkg(). For example, for R packages with releases on GitHub, previous tagged releases can be preferred with e.g.

rwasm::add_pkg("cli", remotes=c("r-lib/[email protected]"))

But I think that if we want to deploy smaller webr apps, we should have something similar to tree shaking for R. The language itself does not make it easy to implement, especially because of the (amazing) tidy eval feature. @georgestagg any thoughts on this?

Tree shaking is something I am interested in, but I don't yet have a clear idea of what the path would be to making this work well. I would say it's a goal, but a long-term one. For the moment, we're making do with the naive method shown above of directly stripping out large package directories.

I fixed it by displaying a progress bar incrementing after each webr.install success. With a VFS image, it is now a unique call. Do you see a way to monitor progress during the mount process?

The limiting factor with loading a combined package repository in the form of a single VFS image is the browser download, and there is currently no feedback provided by webR as the data is streamed in, sorry about that. In principle Emscripten can support this, and there are Wasm apps that listen to the events shared by Emscripten as files are downloaded and show a progress bar, so it is possible for a future improvement to webR.

For the moment, the only thing I can think of would be to build multiple VFS images, mount several package libraries, and show updates after each one is downloaded. It's a hack, but it might be worth the additional complexity if your package library is particularly large.
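
As a rough illustration of that workaround, the sketch below mounts several VFS images one after another and reports progress after each. The function name `mountLibrariesWithProgress`, the mount points, and the URLs are all hypothetical; `webR` is assumed to be an initialised WebR instance, and the R code is the same `webr::mount()` and `.libPaths()` pattern shown earlier in the thread.

```javascript
// Mount each VFS image in turn and report progress after each one.
// `libraries` is an array of { mountpoint, source } objects.
async function mountLibrariesWithProgress(webR, libraries, onProgress) {
  for (let i = 0; i < libraries.length; i++) {
    const { mountpoint, source } = libraries[i];
    const code =
      `webr::mount(mountpoint = ${JSON.stringify(mountpoint)}, source = ${JSON.stringify(source)})\n` +
      `.libPaths(c(.libPaths(), ${JSON.stringify(mountpoint)}))`;
    await webR.evalRVoid(code);
    // Progress is coarse: one step per image, not per byte downloaded.
    onProgress(i + 1, libraries.length);
  }
}

// Possible usage (paths and URLs are placeholders):
// await mountLibrariesWithProgress(
//   webR,
//   [
//     { mountpoint: "/lib-a", source: "https://example.com/lib-a.data" },
//     { mountpoint: "/lib-b", source: "https://example.com/lib-b.data" },
//   ],
//   (done, total) => { progressBar.value = done / total; }
// );
```

The images are mounted sequentially so that each progress update corresponds to one finished download; splitting the library into images of similar size would keep the bar moving at a steady pace.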
