suffix trees for finding largest common substring #3

AbyssalRemark · 2024-04-05T18:39:08Z

Suffix trees are a clever way of storing strings that makes some string operations really fast, such as finding the longest common substring, among others. It might make sense to load our data into suffix trees so that we can construct the graph more efficiently.
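To make the idea concrete, here is a minimal sketch of longest-common-substring lookup. It is an assumption-heavy toy: it uses a naive, uncompressed suffix trie (quadratic time and memory) rather than a real linear-time suffix tree, and the names `Node`, `build_suffix_trie` and `longest_common_substring` are made up for illustration. The principle is the same though: insert every suffix of both strings, tag each node with the strings that reach it, and the deepest node reached by both spells the answer.

```python
class Node:
    """One trie node: outgoing edges plus the ids of the strings passing through it."""
    __slots__ = ("children", "owners")

    def __init__(self):
        self.children = {}
        self.owners = set()


def build_suffix_trie(strings):
    """Insert every suffix of every string, one character per edge (naive, O(n^2))."""
    root = Node()
    for idx, s in enumerate(strings):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                if ch not in node.children:
                    node.children[ch] = Node()
                node = node.children[ch]
                node.owners.add(idx)
    return root


def longest_common_substring(a, b):
    """Return the longest path from the root whose nodes are all shared by a and b."""
    root = build_suffix_trie([a, b])
    best = ""
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for ch, child in node.children.items():
            if len(child.owners) == 2:  # reached by suffixes of both strings
                candidate = prefix + ch
                if len(candidate) > len(best):
                    best = candidate
                stack.append((child, candidate))
    return best


print(longest_common_substring("ACGTTGCA", "TTGCAAC"))
```

A real suffix tree compresses chains of single-child nodes into one edge and can be built in linear time, which is what would matter at read-set scale. When several substrings tie for longest, this sketch returns one of them without any particular preference.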

Here is a picture from Wikipedia:

(let's see if that works)

I read a paper a while back about relaxing the definition of suffix trees to make them even faster, for the express purpose of mRNA-seq (it should be true for DNA too).

Will add that paper once I find it.

AbyssalRemark · 2024-04-05T20:03:21Z

This might be it.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929406/
This talks about de novo assembly, which is assembling without a scaffold to build from. That is in part what we're doing.

TW-Starbuyer · 2024-04-07T00:23:50Z

Are you proposing we use Suffix Trees for the assembly? It seems the main paper in the repo proposes using de Bruijn Graphs so I'm just curious what the pros/cons would be with each method.

https://www.pnas.org/doi/10.1073/pnas.171285098
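For comparison, a minimal sketch of the de Bruijn graph idea mentioned here. This is only an illustration under simple assumptions (error-free reads, a hypothetical `de_bruijn_graph` helper, no edge multiplicities collapsed), not the construction from the paper: each k-mer in a read becomes an edge from its (k-1)-length prefix to its (k-1)-length suffix.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k over the read; each window is one edge.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


print(de_bruijn_graph(["ACGT", "CGTA"], 3))
```

The practical difference is that this graph only ever compares fixed-length k-mers, while a suffix tree answers variable-length queries such as the longest common substring.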

AbyssalRemark · 2024-04-07T14:31:17Z

> Are you proposing we use Suffix Trees for the assembly? It seems the main paper in the repo proposes using de Bruijn Graphs so I'm just curious what the pros/cons would be with each method.
>
> https://www.pnas.org/doi/10.1073/pnas.171285098

So, yeah, that's what this paper is talking about. That's the relaxed part of things. It's an option, sure. But I'm more interested in them for their general ease of string operations.

If we're checking forwards, backwards, inversed, and inversed backwards, then the cost of getting our data into a suffix tree (I hope) makes the string operations less time-consuming. We pay an upfront construction and storage cost to save on the countless comparisons per data point.

But we should totally look into things more.
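A rough sketch of the four orientations mentioned above, reading "inversed" as the base-pair complement (that reading is an assumption, as is the helper name `orientations`). Each read would have to be compared in all four forms, which is where the repeated comparison cost comes from.

```python
# Translation table for DNA base pairing (assumes upper-case A, C, G, T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def orientations(read):
    """Return the four forms of a read that a strand-agnostic comparison must check."""
    complement = read.translate(COMPLEMENT)
    return {
        "forward": read,
        "reverse": read[::-1],
        "complement": complement,
        "reverse_complement": complement[::-1],
    }


for name, seq in orientations("GATTACA").items():
    print(name, seq)
```

With a suffix structure built once over the reference data, each of these four strings becomes a lookup rather than a fresh pairwise scan.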

AbyssalRemark · 2024-04-08T22:00:59Z

Update: according to an email Nancy sent out today,
our data should all be pointing in the same direction because of our extraction methods.
If we have DNA then we could still have the inverse (I think), but if we have RNA (which is only one strand) then we need only check once, in which case we get no benefit from suffix trees. (I still might make 'em for funzies because they're kinda cool.)

learner-long-life · 2024-04-19T16:24:05Z

Nice find for suffix trees, I'm learning about them now.
Is longest-common-substring (LCS) finding using them useful for both de novo assembly and read alignment? It seems so to me.

AbyssalRemark added the "question" label (Further information is requested) on Apr 5, 2024
