2024‐06‐07 Meeting Notes

Paul Pham edited this page Jun 7, 2024 · 1 revision

In attendance in CAL East:

Cassidy, Gavin, Dom, Paul

  • Cassidy
    • working on caching the transcriptome to optimize the matching process, using the bincode crate
    • saving / persisting suffix tree to disk would remove the need to build it every time.
  • Gavin
    • matching up to 40 Mbases of 500 Mbase transcriptome, less than 20%
    • Rust memory usage drops at some point during suffix tree construction / matching, and the drop is not triggered by our own code
      • Could be dropping references?
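
To make Cassidy's caching idea concrete, here is a minimal sketch of the "load the index from disk if it exists, otherwise build it and save it" pattern. It rests on several assumptions that are not from the meeting: it uses only the standard library with a hand-rolled little-endian format as a stand-in for bincode, it uses a naive suffix array as a stand-in for our suffix tree, and the names `build_suffix_array`, `save_index`, `load_index`, and `load_or_build` are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Naive suffix array: start offsets of all suffixes, sorted lexicographically.
/// Stand-in for the real suffix tree; far too slow for a full transcriptome.
fn build_suffix_array(text: &[u8]) -> Vec<u32> {
    let mut sa: Vec<u32> = (0..text.len() as u32).collect();
    sa.sort_by(|&a, &b| text[a as usize..].cmp(&text[b as usize..]));
    sa
}

/// Write the index as a flat run of little-endian u32 values
/// (hand-rolled format, assumed here in place of bincode).
fn save_index(path: &Path, sa: &[u32]) -> io::Result<()> {
    let mut bytes = Vec::with_capacity(sa.len() * 4);
    for &offset in sa {
        bytes.extend_from_slice(&offset.to_le_bytes());
    }
    fs::write(path, bytes)
}

/// Read the index back, rejecting files whose length is not a multiple of 4.
fn load_index(path: &Path) -> io::Result<Vec<u32>> {
    let bytes = fs::read(path)?;
    if bytes.len() % 4 != 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "truncated index file",
        ));
    }
    Ok(bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}

/// Reuse the cached index when present; otherwise build it once and persist it.
/// Note: this sketch does not check that the cache matches `text`.
fn load_or_build(path: &Path, text: &[u8]) -> io::Result<Vec<u32>> {
    if path.exists() {
        return load_index(path);
    }
    let sa = build_suffix_array(text);
    save_index(path, &sa)?;
    Ok(sa)
}
```

Calling `load_or_build` twice with the same path builds the index on the first call and only reads the file on the second. A real cache would also need to detect a changed transcriptome, for example by storing a hash of the input next to the index.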

Some questions and considerations:

  • How to parallelize our construction or use of suffix trees? For GPUs?
    • We could construct separate suffix trees for disjoint parts of the transcriptome, for now, that fit into memory on one computer
    • We could process multiple reads at once for the same transcriptome
  • If we don't know the orientation (5' to 3') or complementarity (which strand) of each read,
    • we would need to try 4 different combinations for each read, roughly quadrupling our running time
  • Do we need to see the "N" (error) character in our reads to gain confidence that "junk" data is being discarded?
    • e.g. from hardware errors, lab technique errors, etc.
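
The four combinations mentioned above (orientation times strand) can be sketched as follows. This is only an illustration under stated assumptions: reads are uppercase ASCII over `A`, `C`, `G`, `T`, with `N` passed through unchanged, the function names are hypothetical, and the naive window scan in `find_any_orientation` is a stand-in for our suffix tree lookup.

```rust
/// Watson-Crick complement of a single base; 'N' and any other byte pass through.
fn complement(base: u8) -> u8 {
    match base {
        b'A' => b'T',
        b'T' => b'A',
        b'C' => b'G',
        b'G' => b'C',
        other => other,
    }
}

/// The four candidates for one read, in this order:
/// forward, reverse, complement, reverse complement.
fn orientations(read: &[u8]) -> [Vec<u8>; 4] {
    let forward: Vec<u8> = read.to_vec();
    let reverse: Vec<u8> = read.iter().rev().copied().collect();
    let comp: Vec<u8> = read.iter().map(|&b| complement(b)).collect();
    let rev_comp: Vec<u8> = reverse.iter().map(|&b| complement(b)).collect();
    [forward, reverse, comp, rev_comp]
}

/// Try each candidate in turn and return (orientation index, match position)
/// for the first hit. Naive scan, used here only in place of the suffix tree.
fn find_any_orientation(transcriptome: &[u8], read: &[u8]) -> Option<(usize, usize)> {
    if read.is_empty() {
        return None; // `windows(0)` would panic
    }
    for (i, candidate) in orientations(read).iter().enumerate() {
        let hit = transcriptome
            .windows(candidate.len())
            .position(|w| w == candidate.as_slice());
        if let Some(pos) = hit {
            return Some((i, pos));
        }
    }
    None
}
```

Because a read that matches nowhere goes through all four lookups, the worst case costs about four times a single lookup, which is where the rough 4x estimate comes from.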

32 GB on Gavin's laptop is our maximum.

If we want to improve on this run, we'll need a different machine with more RAM.

Conversations with Nancy Murray at project fair yesterday:

  • She and her student are collecting new data in their lab now
    • "building a library" for genome research means collecting reads under a certain set of circumstances

Administrivia

  • Paul is working on getting a physical room on campus for summer meetings, but we will also have a Zoom room.
  • Project will continue through the summer and into the fall. Rain is our SURF researcher.