suffix trees for finding largest common substring #3
Comments
This might be it.
Are you proposing we use suffix trees for the assembly? It seems the main paper in the repo proposes using de Bruijn graphs, so I'm just curious what the pros and cons would be with each method.
So, yeah, that's what this paper is talking about; that's the relaxed part of things. It's an option, sure. But I'm more interested in suffix trees for their general ease of string operations. If we're checking forwards, backwards, inversed, and inversed backwards, then the upfront cost of getting our data into a suffix tree should (I hope) make the string operations less time-consuming: we pay a storage cost upfront to save on the countless comparisons per data point. But we should totally look into things more.
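For reference, here is a rough sketch of the four orientations mentioned above. It assumes "inversed" means the base-wise complement (A<->T, C<->G), which is my reading and not something confirmed in this thread; the function name `orientations` is made up for illustration.

```python
# Hypothetical helper: the four orientations of a DNA read.
# Assumption: "inversed" = complement, "inversed backwards" = reverse complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def orientations(read: str) -> dict[str, str]:
    """Return the forward, reverse, complement, and reverse-complement forms."""
    complement = read.translate(_COMPLEMENT)
    return {
        "forward": read,
        "reverse": read[::-1],
        "complement": complement,
        "reverse_complement": complement[::-1],
    }


if __name__ == "__main__":
    for name, seq in orientations("GATTACA").items():
        print(f"{name}: {seq}")
```

If each read has to be compared in all four forms, indexing the reads once and querying the index is where the upfront cost could pay off.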
Update. According to Nancy in an email she sent out today:
Nice find for suffix trees, I'm learning about them now.
Suffix trees are a clever way of holding strings that allows really fast string operations, such as longest common substring, among others. It might make sense to load our data into suffix trees so that we can construct the graph more efficiently.
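To make the idea concrete, here is a minimal sketch of longest common substring over two strings. Note that it uses a plain uncompressed suffix trie built the naive way (quadratic time and memory), not a real compressed suffix tree built with a linear-time algorithm such as Ukkonen's, so it is only meant to show the principle. All names here are made up for illustration.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Longest common substring via a naive generalized suffix trie.

    Every suffix of both strings is inserted character by character.
    Each node records which input strings pass through it; the deepest
    node reached by both strings spells a longest common substring.
    """
    root = {"children": {}, "owners": set()}

    # Insert every suffix of each string, tagging nodes with the string index.
    for owner, text in enumerate((a, b)):
        for start in range(len(text)):
            node = root
            for ch in text[start:]:
                node = node["children"].setdefault(
                    ch, {"children": {}, "owners": set()}
                )
                node["owners"].add(owner)

    # Walk only through nodes shared by both strings, keeping the longest path.
    best = ""
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        for ch, child in node["children"].items():
            if len(child["owners"]) == 2:
                candidate = path + ch
                if len(candidate) > len(best):
                    best = candidate
                stack.append((child, candidate))
    return best


if __name__ == "__main__":
    print(longest_common_substring("GATTACA", "TTACGG"))
```

A compressed suffix tree answers the same question with the same "deepest shared node" idea, but in time and space linear in the total input length, which is what would matter at read-set scale.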
Here is a picture from Wikipedia
(let's see if that works)
I read a paper a while back about relaxing the definition of suffix trees to make them even faster for the express purpose of mRNA-seq (it should hold for DNA too).
I will add that paper once I find it.