Seemingly random crash/freeze #4844

Open
milenovic opened this issue Feb 14, 2025 · 10 comments
Labels
bug Something isn't working

Comments

@milenovic

Describe the bug

I am using the AppImage on Rocky Linux 8.8, and sometimes the JBrowse2 window just becomes totally white. Resizing, waiting, clicking: nothing can bring it back. The window stays up but remains blank. No errors get printed in the terminal window. Upon restarting, everything works fine and the session was saved. It occurs somewhat randomly, perhaps more likely when multiple tracks are loaded and I try to scroll left-right while some of the tracks are still loading, but it does not always happen.
Is there any way to get more diagnostic info?

Version:
It happened in v2.18.0 and is still present in v3.0.1. I have not tried v3.0.3 yet.

@milenovic milenovic added the bug Something isn't working label Feb 14, 2025
@cmdcolin
Collaborator

cmdcolin commented Feb 15, 2025

I believe you can press Ctrl+Shift+I to get a "developer console" in case it is a JavaScript bug, though I kind of doubt that it is

There was at least one similar bug reported before: #3979

You can see the video recording they made in that thread, which shows a similar 'white screen', but they didn't even get as far as browsing tracks (https://drive.google.com/file/d/1-e19laQ4YNc6grk9mPbE1bFCvEAWIGdW/view?usp=sharing)

There are various threads on Electron forums and elsewhere where people report similar bugs, and it's all quite vague, but sometimes they disable hardware acceleration. I made a build that adds the line app.disableHardwareAcceleration() to disable it

If you want to try it: https://drive.google.com/file/d/1F5S_FSsbBHFR5poyxzpKNsFvkN7VLcH2/view?usp=sharing (it also uses the Electron 33.2.0 discussed in the other thread)

@milenovic
Author

I just tested it, and the issue is unfortunately still there in v3.0.3 with app.disableHardwareAcceleration(). If I open the developer console, there are no errors printed, and I get "DevTools was disconnected from the page." I noticed that Electron is supposed to be more verbose when the environment variable ELECTRON_ENABLE_LOGGING=true is set, so I tried it. Now when the window goes white, only this is printed on the terminal:
Renderer process crashed - see https://www.electronjs.org/docs/tutorial/application-debugging for potential debugging information.
I suppose it's not very helpful.
Now that I have tried to trigger the error on purpose many times, I can say that there is no single reliable way to trigger it. I would load a few tracks, then just move left-right, zoom in-out, load saved sessions, and at some point during any of these actions the tracks will be in "Loading..." for a bit and then the renderer crashes.

Feel free to let me know if you know any way to debug it further. I mean, the crash is quite rare so it's not super disruptive, but just trying to help make this awesome tool even better.

Unrelated observations
I don't think any of these are related to the crash, but perhaps they are useful in some way:

During these tests, I also noticed that when I run the older version and click to decline the update, the terminal says:
[630182:0217/142122.077975:WARNING:browser_main_loop.cc(281)] GLib-GObject: gsignal.c:2642: instance '0x314002ca320' has no handler with id '1295'

Sometimes I noticed these errors, but no crash or anything:
[630182:0217/142242.559053:INFO:CONSOLE(197)] "Uncaught NotFoundError: Failed to execute 'removeChild' on 'Node': The node to be removed is not a child of this node.", source: file:///scratch/mmilenov/11648643/.mount_jbrows3BPRTS/resources/app.asar/build/static/js/main.1898d6b8.js (197)
[630182:0217/142242.674977:INFO:CONSOLE(197)] "Uncaught Error: Minified React error #418; visit https://react.dev/errors/418?args[]= for the full message or use the non-minified dev environment for full errors and additional helpful warnings.", source: file:///scratch/mmilenov/11648643/.mount_jbrows3BPRTS/resources/app.asar/build/static/js/main.1898d6b8.js (197)

Similarly, I saw this:
[630380:0217/142322.409141:ERROR:display.cc(191)] Frame latency is negative: -2.282 ms

And when I am zoomed in to, let's say, 1kb, and then click to a far region of the chromosome, I get many warnings (sometimes hundreds) that say:
[630182:0217/142738.781066:INFO:CONSOLE(14)] "Layout width limit exceeded, discarding old layout. Please be more careful about discarding unused blocks.", source: file:///scratch/mmilenov/11648643/.mount_jbrows3BPRTS/resources/app.asar/build/static/js/1834.9d68e38d.chunk.js (14)

@cmdcolin
Collaborator

cmdcolin commented Feb 18, 2025

[630182:0217/142738.781066:INFO:CONSOLE(14)] "Layout width limit exceeded, discarding old layout. Please be more careful about discarding unused blocks.", source: file:///scratch/mmilenov/11648643/.mount_jbrows3BPRTS/resources/app.asar/build/static/js/1834.9d68e38d.chunk.js (14)

would you be able to share the file w/ steps taken that is being used to trigger this error?

i know that "show soft clipping" on alignments tracks triggers this (#3471), but if it is happening in other cases we might want to investigate

it could be a sign of some actually large memory allocations that could crash the browser

I would like to refactor the layout at some point to fix this properly (there should be an easy algorithm to "stack features like bricks right?") but it's been tricky...my previous attempt made everything unacceptably slow

@milenovic
Author

Yes of course, that one is very easy to reproduce with minimal data and steps.
I loaded the reference, loaded the BAM file (PacBio HiFi read alignment), zoomed in to 1kb, clicked to navigate far away, and the warnings showed up. No soft clipping enabled or anything else; every setting was the default one. Please find the data here. I also included the saved session file at the point where the step 1 screenshot was taken. This was done on Rocky Linux with v3.0.3 (electron 33.2.0), but I tried on Windows with the official v3.0.3 and I got the same warning.
Step 1:
Image

Step 2:
Image

@cmdcolin
Collaborator

thanks for this detailed dataset and instructions. i actually can't reproduce the "layout width limit exceeded" warning with it, but I can see the potential. i can't say for certain, of course, but it could be something to look into for the renderer process crashing.

the way the layout works is that it uses a "bitmap". to lay out a particular feature, it marks a set of bits for a 'genomic region' of the bitmap as "occupied" (sets those bits to 1). then to lay out another feature, it checks a row in the bitmap to see if there is a collision (any 1s where the feature would go in that row), and if there is, it bumps the feature up in y coordinate until there is no collision.
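
A minimal sketch of the bitmap idea described above, for readers who want to see it concretely. This is not JBrowse's actual layout code: the `BitmapLayout` class, its `addFeature` method, and the use of one boolean array per row (instead of packed bits) are all assumptions made for illustration.

```typescript
// Hypothetical, simplified bitmap layout (not JBrowse's implementation).
// Each row is an array of flags; true means that column is occupied.
class BitmapLayout {
  private readonly width: number
  private rows: boolean[][] = []

  constructor(width: number) {
    this.width = width
  }

  // Returns true if any column in [start, end) of the row is occupied.
  private collides(row: boolean[], start: number, end: number): boolean {
    for (let x = start; x < end; x++) {
      if (row[x]) {
        return true
      }
    }
    return false
  }

  // Places a feature covering columns [start, end) and returns its row (y).
  addFeature(start: number, end: number): number {
    if (start < 0 || end > this.width || start >= end) {
      throw new RangeError(`invalid range ${start}-${end}`)
    }
    let y = 0
    // Bump the feature up one row at a time until there is no collision.
    while (true) {
      let row = this.rows[y]
      if (row === undefined) {
        row = new Array<boolean>(this.width).fill(false)
        this.rows[y] = row
      }
      if (!this.collides(row, start, end)) {
        // Mark the feature's columns as occupied in this row.
        for (let x = start; x < end; x++) {
          row[x] = true
        }
        return y
      }
      y++
    }
  }
}

const layout = new BitmapLayout(100)
const rowA = layout.addFeature(10, 50) // first feature, lands in row 0
const rowB = layout.addFeature(40, 80) // overlaps A, bumped to row 1
const rowC = layout.addFeature(60, 90) // overlaps only B, fits in row 0
console.log(rowA, rowB, rowC) // prints: 0 1 0
```

Note that every row in this sketch is allocated at the full layout width, so memory grows with both the number of rows and the width of the region the layout covers.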

this works pretty well, but jumping across the chromosome, it can sometimes retain the features from far away, and then this causes large bitmap memory allocations, and then you get "layout width exceeded"
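
To get a rough feel for why retained far-away features are expensive, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every row is stored densely across the whole span between the leftmost and rightmost retained feature, at one byte per column. The `estimateBitmapBytes` function, the row count of 50, and the coordinates are all made up; the real data structure and its limits may differ.

```typescript
// Hypothetical estimate of bitmap size when rows densely span the full
// range of retained features. These are assumptions, not JBrowse internals.
function estimateBitmapBytes(
  features: { start: number; end: number }[],
  rowCount: number,
  basesPerColumn: number,
  bytesPerColumn = 1,
): number {
  if (features.length === 0) {
    return 0
  }
  const min = Math.min(...features.map(f => f.start))
  const max = Math.max(...features.map(f => f.end))
  // Number of columns needed to cover everything from min to max.
  const columns = Math.ceil((max - min) / basesPerColumn)
  return columns * rowCount * bytesPerColumn
}

// One 15 kb read near 1 Mb, viewed on its own.
const near = [{ start: 1_000_000, end: 1_015_000 }]
// The same read retained after jumping to a region near 100 Mb.
const far = [...near, { start: 100_000_000, end: 100_015_000 }]

const localBytes = estimateBitmapBytes(near, 50, 1)
const retainedBytes = estimateBitmapBytes(far, 50, 1)
console.log(localBytes, retainedBytes) // prints: 750000 4950750000
```

Under these assumptions the local view needs about 0.75 MB, while retaining a single far-away feature pushes the estimate to roughly 4.95 GB, which is the kind of allocation a width limit would exist to prevent.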

I would like to move away from the bitmap algorithm if possible or otherwise optimize it so it doesn't have any error cases. i made a repo awhile ago for testing different layout algorithms for both speed and accuracy (the test repo tries to visually render the layouts to a png so you can see results) but it's probably just for my own reference

https://github.com/cmdcolin/track_layout_benchmark

@cmdcolin
Collaborator

(managed to get some layout discarded warnings now)

@milenovic
Author

I added crashReporter to jbrowse-components/products/jbrowse-desktop/electron/electron.ts to save crash dumps to a local directory, and then triggered the original renderer crash (repeated twice) which produced these files.
When parsed using "minidump_stackwalk -c" to get backtraces with symbolication of the crashed thread I get trace-2ea2865e-b17b-450e-ab75-5760be8da3f4.txt and trace-73202b33-1d97-4683-9054-9c1a57e4a346.txt for the two crashes. As I am not familiar with the inner workings of the jbrowse, they don't mean much to me, hopefully they are more meaningful to you.
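
For anyone who wants to repeat this, a minimal sketch of that kind of setup follows. It is not the exact change made to electron.ts: the `configureCrashDumps` helper, the `AppLike` and `CrashReporterLike` interfaces, and the dump directory are hypothetical. The interfaces only mirror the small part of Electron's `app` and `crashReporter` modules used here (`app.setPath('crashDumps', ...)`, `crashReporter.start({ uploadToServer: false })`, and the `render-process-gone` event), so the sketch type-checks without Electron installed.

```typescript
// Minimal stand-ins for the parts of Electron's `app` and `crashReporter`
// used below (assumed shapes, trimmed down for illustration).
interface RenderProcessGoneDetails {
  reason: string
  exitCode: number
}

interface AppLike {
  setPath(name: 'crashDumps', path: string): void
  on(
    event: 'render-process-gone',
    listener: (
      event: unknown,
      webContents: unknown,
      details: RenderProcessGoneDetails,
    ) => void,
  ): unknown
}

interface CrashReporterLike {
  start(options: { uploadToServer: boolean }): void
}

// Hypothetical helper: keep minidumps in a local directory instead of
// uploading them, and log the reason whenever a renderer process goes away.
function configureCrashDumps(
  app: AppLike,
  crashReporter: CrashReporterLike,
  dumpDir: string,
  log: (message: string) => void = console.error,
): void {
  // Tell Electron where to write crash dumps.
  app.setPath('crashDumps', dumpDir)
  // Start collecting dumps locally; nothing is sent to a server.
  crashReporter.start({ uploadToServer: false })
  // Record why the renderer went away (for example a crash or an OOM kill).
  app.on('render-process-gone', (_event, _webContents, details) => {
    log(`renderer gone: reason=${details.reason} exitCode=${details.exitCode}`)
  })
}
```

In a real main process, the idea would be to import `app` and `crashReporter` from `electron` and call `configureCrashDumps(app, crashReporter, '/path/to/dumps')` early during startup. The resulting `.dmp` files can then be passed to `minidump_stackwalk` as described above.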

I also tried to see what happens when I give jbrowse less RAM (3GB instead of my usual 32GB), then trigger the layout discarded warnings and see if it crashes. It actually did, very fast, but not in the same way: the screen went white, but it did not say "Renderer process crashed"; it instead said that it was killed (due to out of memory). Still, it might be related, but I suppose not that directly?

@cmdcolin
Collaborator

thanks again for the detailed analysis. will have to dig into that still. just curious, how are you controlling how much ram you give to the jbrowse desktop app?

@milenovic
Author

I run linux jbrowse desktop on our HPC cluster to which I connect via VNC. Jbrowse is running in an interactive SLURM job, therefore I decide how many cores and RAM I request.
I will test (maybe today) if I can get the crash without triggering the "layout width exceeded".

@cmdcolin
Collaborator

it can allocate large amounts of memory at the "sub-layout width exceeded" level, so it does not need to show "layout width exceeded" for it to crash. the "layout width exceeded" warning is it trying not to allocate a bunch of memory, and it is maybe even failing at that for some reason.

the layout algorithm was designed in the era of short reads, and I think for some reason long reads confuse it also. it goes back to the jbrowse 1 days... i'd say there's a pretty good chance the layout is the problem, so if I make any progress on that front I can let ya know and see if it helps at all with this issue :)
