Virulence genes and plasmids #28
Comments
Hi Pablo, Yes absolutely! I am planning to add the option of making TORMES work with user-customized databases in the next version, but for now I can show you how to incorporate your sequences into the existing virulence and plasmid databases. Is that OK for you? If so, please just send me the output of Best, |
Hi Narciso, This is what I get after typing
What next? Cheers, |
Hi Pablo, Thanks for sharing. This way I can give you the exact paths. What we are going to do is add your sequences to the already available VFDB and PlasmidFinder databases.
For the
Once you have the headers of all genes from each fasta file already modified, type:
The next step is to format the new databases. Do:
And that's it. The next time you are running TORMES, the new genes will be included in the "Virulence genes" and the "Plasmid replicons" sections of the tormes-report. |
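For readers following along, one way to sanity-check the modified headers before formatting the databases is sketched below. It assumes that new entries should follow the `database~~~name~~~accession` header layout visible in the makeblastdb output quoted later in this thread. The exact instructions were not preserved here, so treat the pattern, the file name `sequences_demo.fasta`, and the example records as hypothetical.

```shell
# Hypothetical working copy; the real database file lives inside the TORMES conda environment.
workdir=$(mktemp -d)
db_fasta="$workdir/sequences_demo.fasta"
printf '>Plasmidfinder~~~pRi2659~~~EU186381 example plasmid\nACGT\n>my_plasmid\nACGT\n' > "$db_fasta"

# Collect headers that do NOT contain the two "~~~" separators (assumed layout).
bad_headers=$(grep '^>' "$db_fasta" | grep -v -E '^>[^~]+~~~[^~]+~~~[^~ ]+' || true)

# An empty result means every header matches the assumed layout.
echo "$bad_headers"
```

Any header printed by the last line would need to be edited before the database is formatted.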
Hi Narciso, I will definitely try this! Thank you for your clear explanation. However, I still have two questions: (1) The final nomenclature for the virulence genes should be
or
(2) The plasmid sequences should be provided in one multi-fasta file? Cheers, |
Hi Pablo, (1) Both headers will work (remember the ">" at the beginning of each header). The last (2) This depends on how many sequences you have. The software reads only one (multi-fasta) file. Therefore, if you have all sequences in the same fasta file, you can follow the steps I told you before. On the other hand, if you have one sequence per file, you can first concatenate those fasta files into a single one. For instance:
And then just follow the other steps. I hope this helps, |
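The concatenation step described above could look roughly like the following sketch. The directory, file names, and sequences are hypothetical stand-ins, not the ones from the original (unpreserved) command.

```shell
# Hypothetical demo directory with one sequence per fasta file.
workdir=$(mktemp -d)
printf '>plasmid_A\nACGT\n' > "$workdir/plasmid_A.fasta"
printf '>plasmid_B\nTTGA\n' > "$workdir/plasmid_B.fasta"

# Concatenate every single-sequence file into one multi-fasta.
# The output uses a different extension so the glob cannot pick it up as an input.
cat "$workdir"/*.fasta > "$workdir/all_plasmids.fa"

# Count the records in the merged file (one ">" header per sequence).
n_seqs=$(grep -c '^>' "$workdir/all_plasmids.fa")
echo "$n_seqs sequences"
```

Each input file should end with a newline. Otherwise the last sequence line of one file and the header of the next end up on the same line.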
Hi Narciso, Thanks a lot for the great explanation!!! I have tried following and then running the tormes pipeline but then I got an error :( `Quitting from lines 243-252 (tormes_report.Rmd) Execution halted` Any idea what is going on? Cheers, |
Hi Pablo, The report generation failed when performing the action between lines 243 and 252 of the tormes_report.Rmd. Did the other analyses run well? Best, |
Hi Pablo, Thanks for sharing the files. Additionally, I've seen that you don't have any output from Kraken2. Please perform this step before running Let me know if it worked! |
Hi Narciso, It worked perfectly!!!! Thanks a lot for your support!!! Cheers, |
Hi Pablo, Good that it worked! Best, |
Hi Narciso, I was trying to add a new set of plasmid sequences to the DB and I got this error when I tried to run this command:
Is it ok? Or is there something wrong? Cheers, |
Hi Pablo,
It looks like it is saying that in your sequences you have a bad character
that blast doesn't like. It is saying 'Bad char [0xCA] in string at byte 56'
What you can do is open your sequences file in Notepad++ and make sure that
the end of line is set to Unix format and the encoding to UTF-8, I think. The
end-of-line setting is under an option called EOL Conversion, I am pretty sure.
Save your file, then try again.
Hopefully it should fix your problem.
I don't know as much as Narciso so I may be wrong, but I suspect that maybe
you copied your files from a windows notepad file? Unix and windows do not
handle the invisible code that signifies the end of a line in a file the
same way.
Cheers
Brad
…On Wed, 14 Oct 2020, 6:35 am pavlo888, ***@***.***> wrote:
Hi Narciso,
I was trying to add a new set of plasmid sequences to the DB and I got
this error when I tried to run this command:
makeblastdb -in
/home/sam/anaconda3/envs/tormes-1.2.0/db/plasmidfinder/sequences -title
plasmidfinder -dbtype nucl -hash_index
Building a new DB, current time: 10/13/2020 22:03:26
New DB name: /home/sam/anaconda3/envs/tormes-1.2.0/db/plasmidfinder/sequences
New DB title: plasmidfinder
Sequence type: Nucleotide
Deleted existing Nucleotide BLAST database named /home/sam/anaconda3/envs/tormes-1.2.0/db/plasmidfinder/sequences
Keep MBits: T
Maximum file size: 1000000000B
Error: (803.7) [makeblastdb] Blast-def-line-set.E.title
Bad char [0xCA] in string at byte 56
Plasmidfinder~~~pRi2659~~~EU186381 Agrobacterium�rhizogenes plasmid pRi2659, completegenome [Agrobacterium rhizogenes]
Error: (803.7) [makeblastdb] Blast-def-line-set.E.title
Bad char [0xCA] in string at byte 67
Plasmidfinder~~~pRi1724~~~AP002086 Agrobacterium rhizogenes�plasmid pRi1724 DNA, complete sequence [Agrobacterium rhizogenes]
Error: (803.7) [makeblastdb] Blast-def-line-set.E.title
Bad char [0xCA] in string at byte 56
Plasmidfinder~~~pRi2659~~~EU186381 Agrobacterium�rhizogenes plasmid pRi2659, completegenome [Agrobacterium rhizogenes]
Error: (803.7) [makeblastdb] Blast-def-line-set.E.title
Bad char [0xCA] in string at byte 67
Plasmidfinder~~~pRi1724~~~AP002086 Agrobacterium rhizogenes�plasmid pRi1724 DNA, complete sequence [Agrobacterium rhizogenes]
Adding sequences from FASTA; added 464 sequences in 0.019649 seconds.
Is it ok? Or is there something wrong?
Cheers,
Pablo
|
Hi @pavlo888 As @biobrad said, it seems that you have some problematic characters that are making
Additionally, mind the capital letters when writing the database name. In your Please run this command to solve this issue: Best, |
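The command posted at this point was not preserved. As an independent sketch (not the maintainer's command), the stray `0xCA` byte reported by makeblastdb can be located and replaced with standard tools. The demo file below is hypothetical and simply reproduces one of the offending headers. Replacing the byte with a space is an assumption, based on it appearing where a space would be expected.

```shell
# Hypothetical demo file containing the 0xCA byte (octal 312) from the error message.
workdir=$(mktemp -d)
fasta="$workdir/plasmids_demo.fasta"
clean="$workdir/plasmids_demo.clean.fasta"
printf '>Plasmidfinder~~~pRi2659~~~EU186381 Agrobacterium\312rhizogenes plasmid\nACGT\n' > "$fasta"

# Count lines holding any byte outside printable ASCII (byte-wise matching via LC_ALL=C).
n_bad=$(LC_ALL=C grep -c '[^ -~]' "$fasta" || true)
echo "lines with stray bytes before cleaning: $n_bad"

# Replace every 0xCA byte with a plain space and write a cleaned copy.
LC_ALL=C tr '\312' ' ' < "$fasta" > "$clean"

# Re-check the cleaned copy.
n_left=$(LC_ALL=C grep -c '[^ -~]' "$clean" || true)
echo "lines with stray bytes after cleaning: $n_left"
```

Once no stray bytes remain, the cleaned file can take the place of the database fasta and the makeblastdb command quoted above can be run again.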
Hi @nmquijada and @biobrad, Thanks a lot for your input! Indeed there was a weird character in my sequences. I deleted it manually and it is now all running smoothly! Thank you for your help. Another quick question: is there a way to visualize the bootstrap values in the phylogenetic trees obtained as a result? Or is there any other way to statistically assess the topology of the tree? Cheers, |
Hi Pablo, Yes, there is! However, it requires some manipulation of the tormes_report.Rmd file contained in report_files.tgz (or doing it yourself in an R environment). Let's modify the visualization of your pangenome tree based on the presence/absence of accessory genes. If you open the tormes_report.Rmd file and navigate to the "Pangenome analysis" section, you should find something like this:
The last line contains the options passed to the ggtree command to visualize your tree. So, for instance, if you would like to include the tree scale (evolutionary distance), you can change that line to:
If you would like to add the branch support values, you can do:
After you have modified your tormes_report.Rmd file, you can easily re-run your report with the new changes by using the "render_report.sh" script as explained here: https://github.com/nmquijada/tormes#rendering-customized-reports Many further customizations are possible. I encourage you to go through the ggtree guide: http://yulab-smu.top/treedata-book/index.html Let me know if you have any questions! I think this is a useful discussion and other users may be interested in doing this too, so I will try to upload a small tutorial to the wiki pages. Best, |
Hi Pablo, We have just released the new TORMES version 1.3.0. I will close the issue now. |
Hi Narciso,
The new release looks really good!!! I can't wait to try it out. However, I was wondering if it would be possible to supply a (fasta) list of virulence genes of interest and plasmid sequences of interest for the pipeline to focus on?
Cheers,
Pablo