2024‐06‐07 Meeting Notes
Paul Pham edited this page Jun 7, 2024
In attendance in CAL East:
Cassidy, Gavin, Dom, Paul
- Cassidy
- working on caching transcriptome to optimize the matching process, using the `bincode` crate
- saving / persisting suffix tree to disk would remove the need to build it every time.
- Gavin
- matching up to 40 Mbases of 500 Mbase transcriptome, less than 20%
- Rust memory usage decreases at some point during suffix tree construction / matching, and the drop is not triggered by our code
- Could be dropping references?
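A minimal sketch of the load-or-build pattern behind Cassidy's caching idea, using only the standard library so it runs as-is. Everything here is an assumption for illustration: a naive suffix array stands in for the real suffix tree, the hand-written `encode` / `decode` pair stands in for what `bincode` (with `serde`) would do, and the function names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Stand-in for the real index: a naive suffix array over the text.
/// (Assumption: the project's suffix tree would take its place.)
fn build_suffix_array(text: &[u8]) -> Vec<u32> {
    let mut sa: Vec<u32> = (0..text.len() as u32).collect();
    sa.sort_by(|&a, &b| text[a as usize..].cmp(&text[b as usize..]));
    sa
}

/// Hand-rolled encoding: each entry as 4 little-endian bytes.
/// A serialization crate would replace this and `decode`.
fn encode(sa: &[u32]) -> Vec<u8> {
    sa.iter().flat_map(|v| v.to_le_bytes()).collect()
}

fn decode(bytes: &[u8]) -> Vec<u32> {
    bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

/// Load the index from `cache` if the file exists, otherwise build it
/// and persist it. Returns the index and whether it came from the cache.
fn load_or_build(text: &[u8], cache: &Path) -> io::Result<(Vec<u32>, bool)> {
    if cache.exists() {
        return Ok((decode(&fs::read(cache)?), true));
    }
    let sa = build_suffix_array(text);
    fs::write(cache, encode(&sa))?;
    Ok((sa, false))
}
```

Note that this sketch never invalidates the cache: if the transcriptome changes, the cache file has to be deleted by hand (or keyed by a hash of the input).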
Some questions and considerations:
- How to parallelize our construction or use of suffix trees? For GPUs?
- We could construct separate suffix trees for disjoint parts of the transcriptome, for now, that fit into memory on one computer
- We could process multiple reads at once for the same transcriptome
- If we don't know the orientation (5' to 3') or complementarity (which strand) of each read, we would need to try 4 different combinations for each read, roughly quadrupling our running time
- Do we need to see the "N" (error) character in our reads to gain confidence that "junk" data (from hardware errors, lab technique errors, etc.) is being discarded?
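The read-handling questions above can be sketched together in a few functions. This is only an illustration under stated assumptions: reads are uppercase ASCII bytes, a naive substring search stands in for the suffix tree lookup, and all function names are hypothetical. It shows the 4 combinations per read, a simple "N" fraction, and processing several reads at once against one shared transcriptome using scoped threads.

```rust
/// Complement of one base; anything unrecognized (including 'N') maps to 'N'.
fn complement(base: u8) -> u8 {
    match base {
        b'A' => b'T',
        b'T' => b'A',
        b'C' => b'G',
        b'G' => b'C',
        _ => b'N',
    }
}

/// The four candidates to try when orientation and strand are both unknown:
/// the read as given, reversed, complemented, and reverse-complemented.
fn orientations(read: &[u8]) -> [Vec<u8>; 4] {
    let reversed: Vec<u8> = read.iter().rev().copied().collect();
    let complemented: Vec<u8> = read.iter().map(|&b| complement(b)).collect();
    let reverse_complemented: Vec<u8> = reversed.iter().map(|&b| complement(b)).collect();
    [read.to_vec(), reversed, complemented, reverse_complemented]
}

/// Fraction of bases in the read that are the 'N' character.
fn n_fraction(read: &[u8]) -> f64 {
    if read.is_empty() {
        return 0.0;
    }
    read.iter().filter(|&&b| b == b'N').count() as f64 / read.len() as f64
}

/// Naive substring test standing in for a suffix tree lookup.
fn contains(text: &[u8], pattern: &[u8]) -> bool {
    !pattern.is_empty() && text.windows(pattern.len()).any(|w| w == pattern)
}

/// For each read, whether any of its four orientations occurs in `text`.
/// Reads are split across up to `workers` threads that share `text` read-only.
fn match_reads_parallel(text: &[u8], reads: &[&[u8]], workers: usize) -> Vec<bool> {
    let workers = workers.max(1);
    let per_worker = ((reads.len() + workers - 1) / workers).max(1);
    std::thread::scope(|s| {
        // Spawn one thread per chunk of reads, keeping handles in order.
        let handles: Vec<_> = reads
            .chunks(per_worker)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|r| orientations(r).iter().any(|o| contains(text, o)))
                        .collect::<Vec<bool>>()
                })
            })
            .collect();
        // Joining in spawn order keeps results aligned with the input reads.
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect::<Vec<bool>>()
    })
}
```

Trying all four orientations does four lookups per read, which matches the rough 4x running time estimate; the threads only share the transcriptome for reading, so no locking is needed.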
32 GB of RAM on Gavin's laptop is our current maximum.
If we want to improve the run by moving to a different machine, that machine will need more RAM.
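One way to work within a fixed memory limit is to group transcripts into batches and build one suffix tree per batch. The sketch below is an assumption about how that could look, not something decided in the meeting: `batch_by_budget` is a hypothetical helper, and the budget is expressed in bases (for example, the roughly 40 Mbases that currently fit) instead of bytes.

```rust
/// Greedily group transcripts (by index) into batches whose total length
/// stays within `max_bases`. A transcript longer than the budget still
/// gets a batch of its own.
fn batch_by_budget(lengths: &[usize], max_bases: usize) -> Vec<Vec<usize>> {
    let mut batches = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let mut current_bases = 0usize;
    for (i, &len) in lengths.iter().enumerate() {
        // Close the current batch if adding this transcript would exceed the budget.
        if !current.is_empty() && current_bases + len > max_bases {
            batches.push(std::mem::take(&mut current));
            current_bases = 0;
        }
        current.push(i);
        current_bases += len;
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

Because each batch holds whole transcripts, a read that matches inside one transcript is never split across two trees; each read would still have to be checked against every batch.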
Conversations with Nancy Murray at the project fair yesterday:
- She and her student are collecting new data in their lab now
- "building a library" for genome research means collecting reads under a certain set of circumstances
- Paul is working on getting a physical room on campus for summer meetings, but we will also have a Zoom room.
- Project will continue through the summer and into the fall. Rain is our SURF researcher.