Virulence genes and plasmids #28

pavlo888 · 2020-09-09T17:51:45Z

Hi Narciso,

The new release looks really good!!! I can't wait to try it out. However I was wondering if it would be possible to supply a (fasta) list of virulence genes of interest and a plasmid sequences of interest for the pipeline to focus on?

Cheers,
Pablo

nmquijada · 2020-09-10T08:49:01Z

Hi Pablo,

Yes absolutely! I was am planning to add the option of making TORMES work with user-customized databases for the next version, but at this time I can show you how to incorporate your sequences to the already existing virulence and plasmid databases. Is it that OK for you?

If so, please just send me the output of which tormes (after activating the environment)

Best,
Narciso

pavlo888 · 2020-09-10T20:54:45Z

Hi Narciso,

This is what I get after typing

which tormes

/home/sam/anaconda3/envs/tormes-1.2.0/bin/tormes

What next?

Cheers,
Pablo

nmquijada · 2020-09-11T16:11:57Z

Hi Pablo,

Thanks for sharing. This way I can write to you the exact paths.
For the genes screening, TORMES uses Abricate, that has instructions in how to create a custom database in its github page: https://github.com/tseemann/abricate#making-your-own-database
As we are going to make it compatible with TORMES, I will guide you through this process:

What we are going to do is to add your sequences to the already available VFDB and PlasmidFinder databases.
So, imagine you have your new genes in two different files, new-virulence.fasta and new-plasmids.fasta, for virulence and plasmid genes, respectively. The first thing that has to be done is to modify the header of each gene to match the following pattern co it can be recognized by Abricate:

>DBname~~~gene~~~gene_accession~~ description of the gene

For the new-virulence.fasta, modify "DBname" by "vfdb" and for the new-plasmids.fasta, modify "DBname" by "plasmidfinder".
In case you don't have a gene accession number, please write "NA" instead of "gene_accession".
The description of the gene can be an entire sentence. Write "NA" if you don't have a description. As you are including new genes in the original VFDB and PlasmidFinder, maybe some description might make it easier to identify. Please mind the space between the last "~" and the beginning of the description.
As an example, this is how a gene header must look like:

>vfdb~~~aafB~~~AAD27809 (aafB) invasin homolog AafB [AAFs (VF0214)] [Escherichia coli str. 042]

Once you have the headers of all genes from each fasta file already modified, type:

cat new-virulence.fasta >> /home/sam/anaconda3/envs/tormes-1.2.0/db/vfdb/sequences
cat new-plasmids.fasta >> /home/sam/anaconda3/envs/tormes-1.2.0/db/plasmidfinder/sequences

The next step is to format the new databases. Do:

makeblastdb -in /home/sam/anaconda3/envs/tormes-1.2.0/db/vfdb/sequences -title vfdb -dbtype nucl -hash_index
makeblastdb -in /home/sam/anaconda3/envs/tormes-1.2.0/db/plasmidfinder/sequences -title plasmidfinder -dbtype nucl -hash_index

And that's it. The next time you are running TORMES, the new genes will be included in the "Virulence genes" and the "Plasmid replicons" sections of the tormes-report.
Let me know if you have any issue.
Narciso

pavlo888 · 2020-09-15T14:27:51Z

Hi Narciso,

I will definitely try this! Thank you for your clear explanation. However, I still have two questions:

(1) The final nomenclature for the virulence genes should be

vfdb~~~aafB~~~AAD27809 (aafB) invasin homolog AafB [AAFs (VF0214)] [Escherichia coli str. 042]

or

vfdb~~~aafB~~~AAD27809~~ (aafB) invasin homolog AafB [AAFs (VF0214)] [Escherichia coli str. 042]?

(2) The plasmid sequences should be provided in one multi-fasta file?

Cheers,
Pablo

nmquijada · 2020-09-16T08:52:55Z

Hi Pablo,

(1) Both headers will work (remember the ">" at the beginning of each header). The last ~~ look for the "resistance" in antimicrobial resistance datasets, but that field is not included in virulence or plasmid genes in tormes-report.

(2) This depends on how many sequences do you have. The software is going to read only one file (multi-fasta). Therefore, if you have all sequences in the same fasta file you can follow the steps I told you before. On the other hand, if you have one sequence per file you can concatenate those fasta files into a single one first. For instance:

cat file1.fasta file2.fasta file3.fasta >> new-plasmids.fasta

And then just follow the other steps.

I hope this helps,
Narciso

pavlo888 · 2020-09-22T21:58:15Z

Hi Narciso,

Thanks a lot for the great explanation!!!

I have tried following and then running the tormes pipeline but then I got an error :(

`Quitting from lines 243-252 (tormes_report.Rmd)
Error in names(x) <- value :
'names' attribute [3] must be the same length as the vector [2]
Calls: ... withCallingHandlers -> withVisible -> eval -> eval -> colnames<-

Execution halted
`

Any idea what is going on?

Cheers,
Pablo

nmquijada · 2020-09-23T06:37:36Z

Hi Pablo,

The report generation failed when performing the action between lines 243 and 252 of the tormes_report.Rmd. Did the other analyses run well?
I might need the "report_files.tgz" to inspect the error. Can you please share that file with me?
If you think you have sensitive information, send it to my email: nmartinquijada at gmail dot com

Best,
Narciso

nmquijada · 2020-09-24T07:16:19Z

Hi Pablo,

Thanks for sharing the files.
The report failed beacuse your bacteria don't have an MLST scheme and so that analysis "failed". You can solve it by running again tormes by including the --no_mlst option, as explained here: #27 (comment).

Additionally, I've see that you don't have any output from Kraken2. Please perform this step before running tormes again:
#30 (comment)

Let me know if it worked!
Narciso

pavlo888 · 2020-09-26T20:05:02Z

Hi Narciso,

It worked perfectly!!!! Thanks a lot for your support!!!

Cheers,
Pablo

nmquijada · 2020-09-28T06:38:48Z

Hi Pablo,

Good that it worked!

Best,
Narciso

pavlo888 · 2020-10-13T20:05:14Z

Hi Narciso,

I was trying to add a new set of plasmid sequences to the DB and I got this error when I tried to run this command:

makeblastdb -in /home/sam/anaconda3/envs/tormes-1.2.0/db/plasmidfinder/sequences -title plasmidfinder -dbtype nucl -hash_index

Building a new DB, current time: 10/13/2020 22:03:26
New DB name:   /home/sam/anaconda3/envs/tormes-1.2.0/db/plasmidfinder/sequences
New DB title:  plasmidfinder
Sequence type: Nucleotide
Deleted existing Nucleotide BLAST database named /home/sam/anaconda3/envs/tormes-1.2.0/db/plasmidfinder/sequences
Keep MBits: T
Maximum file size: 1000000000B
Error: (803.7) [makeblastdb] Blast-def-line-set.E.title
Bad char [0xCA] in string at byte 56
Plasmidfinder~~~pRi2659~~~EU186381 Agrobacterium�rhizogenes plasmid pRi2659, completegenome [Agrobacterium rhizogenes]
Error: (803.7) [makeblastdb] Blast-def-line-set.E.title
Bad char [0xCA] in string at byte 67
Plasmidfinder~~~pRi1724~~~AP002086 Agrobacterium rhizogenes�plasmid pRi1724 DNA, complete sequence  [Agrobacterium rhizogenes]
Error: (803.7) [makeblastdb] Blast-def-line-set.E.title
Bad char [0xCA] in string at byte 56
Plasmidfinder~~~pRi2659~~~EU186381 Agrobacterium�rhizogenes plasmid pRi2659, completegenome [Agrobacterium rhizogenes]
Error: (803.7) [makeblastdb] Blast-def-line-set.E.title
Bad char [0xCA] in string at byte 67
Plasmidfinder~~~pRi1724~~~AP002086 Agrobacterium rhizogenes�plasmid pRi1724 DNA, complete sequence  [Agrobacterium rhizogenes]
Adding sequences from FASTA; added 464 sequences in 0.019649 seconds.

Is it ok? Or is there something wrong?

Cheers,
Pablo

biobrad · 2020-10-13T22:04:31Z

Hi Pablo, It looks like it is saying that in your sequences you have a bad character that blast doesn't like. It is saying 'Bad char [0xCA] in string at byte 56' What you can do, open your sequences file in notepad++ and make sure that end of line is set to Unix format 'utf-8' I think. This is under an option called EOL conversion I am pretty sure. Save your file, then try again. Hopefully it should fix your problem. I don't know as much as Narciso so I may be wrong, but i suspect that maybe you copied your files from a windows notepad file? Unix and windows do not handle the invisible code that signifies the end of a line in a file the same way. Cheers Brad

…

On Wed, 14 Oct 2020, 6:35 am pavlo888, ***@***.***> wrote: Hi Narciso, I was trying to add a new set of plasmid sequences to the DB and I got this error when I tried to run this command: makeblastdb -in /home/sam/anaconda3/envs/tormes-1.2.0/db/plasmidfinder/sequences -title plasmidfinder -dbtype nucl -hash_index Building a new DB, current time: 10/13/2020 22:03:26 New DB name: /home/sam/anaconda3/envs/tormes-1.2.0/db/plasmidfinder/sequences New DB title: plasmidfinder Sequence type: Nucleotide Deleted existing Nucleotide BLAST database named /home/sam/anaconda3/envs/tormes-1.2.0/db/plasmidfinder/sequences Keep MBits: T Maximum file size: 1000000000B Error: (803.7) [makeblastdb] Blast-def-line-set.E.title Bad char [0xCA] in string at byte 56 Plasmidfinder~~~pRi2659~~~EU186381 Agrobacterium�rhizogenes plasmid pRi2659, completegenome [Agrobacterium rhizogenes] Error: (803.7) [makeblastdb] Blast-def-line-set.E.title Bad char [0xCA] in string at byte 67 Plasmidfinder~~~pRi1724~~~AP002086 Agrobacterium rhizogenes�plasmid pRi1724 DNA, complete sequence [Agrobacterium rhizogenes] Error: (803.7) [makeblastdb] Blast-def-line-set.E.title Bad char [0xCA] in string at byte 56 Plasmidfinder~~~pRi2659~~~EU186381 Agrobacterium�rhizogenes plasmid pRi2659, completegenome [Agrobacterium rhizogenes] Error: (803.7) [makeblastdb] Blast-def-line-set.E.title Bad char [0xCA] in string at byte 67 Plasmidfinder~~~pRi1724~~~AP002086 Agrobacterium rhizogenes�plasmid pRi1724 DNA, complete sequence [Agrobacterium rhizogenes] Adding sequences from FASTA; added 464 sequences in 0.019649 seconds. Is it ok? Or is there something wrong? Cheers, Pablo — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#28 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ANITBGTVNEJFVNB45R5T2YDSKSXIVANCNFSM4RC3T5KQ> .

nmquijada · 2020-10-15T10:41:18Z

Hi @pavlo888

As @biobrad said, it seems that you have some problematic characters that are making makeblastdb to fail.
Try to remove then manually or either try:

sed -i "s/Agrobacterium.*pRi2659/Agrobacterium rhizogenes plasmid pRi2659/" /home/sam/anaconda3/envs/tormes-1.2.0/db/plasmidfinder/sequences
sed -i "s/Agrobacterium.*pRi1724/Agrobacterium rhizogenes plasmid pRi1724/" /home/sam/anaconda3/envs/tormes-1.2.0/db/plasmidfinder/sequences

Additionally, mind the capital characters when writing the database name. In your makeblastdb command, the database is called plasmidfinder whereas in the fasta file the headers are called Plasmidfinder.

Please run this command for solving this issue:
sed -i "s/^Plasmidfinder/plasmidfinder/" /home/sam/anaconda3/envs/tormes-1.2.0/db/plasmidfinder/sequences

Best,
Narciso

pavlo888 · 2020-11-06T22:37:57Z

Hi @nmquijada and @biobrad,

Thanks a lot for your input! Indeed there was a weird character in my sequences. But I deleted it manually and it is now all running smoothly! thank you for your help.

Another quick question. Is there a way to visualize the bootstrap values in the phylogenetic trees obtained as a result? Or is there any other way to check statistically for the topology of the tree?

Cheers,
Pablo

nmquijada · 2020-11-09T08:02:27Z

Hi Pablo,

Yes, there are! However, it requires some manipulation of the tormes_report.Rmd file (or doing it by yourself in R environment) contained in report_files.tgz.
For the visualization of the phylogenetic trees, TORMES uses the ggtree (https://bioconductor.org/packages/release/bioc/html/ggtree.html) and treeio (https://bioconductor.org/packages/release/bioc/html/ggtree.html) R packages and by default, an unrooted tree is generated. However, this tree can be subjected to further modifications in order to change its visualization.

Let's modify the visualization of your pangenome tree based on the presence/absence of accessory genes. If you open the tormes_report.Rmd file and navigate to the "Pangenome analysis" section, you might find something like this:

#### Pangenome tree based on presence/absence of accesory genes representation {.tabset .tabset-fade .tabset-pills}
##### Rectangular (phylogram)
{r, echo=FALSE, message=FALSE, warning=FALSE, fig.height= 15}
tree=read.newick("accessory_binary_genes.fa.newick")
treelim=20*mean(tree$edge.length)
ggtree(tree) + geom_tiplab(size=3) + xlim_tree(treelim)

The last line contains the options to be parsed to the ggtree command to visualize your tree. So, for instance, if you would like to include the tree scale (evolution distance), you can modify that line by:

ggtree(tree) + geom_tiplab(size=3) + xlim_tree(treelim) + geom_treescale()

If you would like to add the branch support values, you can do:

ggtree(tree) + geom_tiplab(size=3) + xlim_tree(treelim) + geom_treescale() + geom_nodelab(aes(x=branch), vjust=-0.3, size=2)

After you modified your tormes_report.Rmd file, you can easily re-run you report with the new changes by using the "render_report.sh" script as explained here: https://github.com/nmquijada/tormes#rendering-customized-reports

There's lot of further customizations possible. I encourage you to go through the ggtree guide: http://yulab-smu.top/treedata-book/index.html

Let me know if you have any question!

I think this is a useful discussion and maybe other users would be interested in doing so too. Thus I would try to upload a small tutorial to the wiki pages.

Best,
Narciso

nmquijada · 2021-06-08T13:45:44Z

Hi Pablo,

We have just released the new TORMES version 1.3.0.
This new version allows the possibility to create custom nucleotide databases by using the --custom_genes_db option.
Please visit the Wiki for further instructions on how to create and use your own databases!

I will close the issue now.
Thanks a lot,
Narciso

shlomobl · 2022-06-16T07:48:43Z

Hi,
Is it possible to use a custom aa database?
Thanks!

nmquijada · 2022-06-17T07:31:37Z

Hi @shlomobl

I just replied here: #56

Cheers!
Narciso

nmquijada added enhancement New feature or request custom databases labels Sep 10, 2020

nmquijada closed this as completed Jun 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Virulence genes and plasmids #28

Virulence genes and plasmids #28

pavlo888 commented Sep 9, 2020

nmquijada commented Sep 10, 2020

pavlo888 commented Sep 10, 2020

nmquijada commented Sep 11, 2020

pavlo888 commented Sep 15, 2020 •

edited

Loading

nmquijada commented Sep 16, 2020

pavlo888 commented Sep 22, 2020

nmquijada commented Sep 23, 2020

nmquijada commented Sep 24, 2020

pavlo888 commented Sep 26, 2020

nmquijada commented Sep 28, 2020

pavlo888 commented Oct 13, 2020

biobrad commented Oct 13, 2020 via email

nmquijada commented Oct 15, 2020

pavlo888 commented Nov 6, 2020

nmquijada commented Nov 9, 2020

nmquijada commented Jun 8, 2021

shlomobl commented Jun 16, 2022

nmquijada commented Jun 17, 2022

Virulence genes and plasmids #28

Virulence genes and plasmids #28

Comments

pavlo888 commented Sep 9, 2020

nmquijada commented Sep 10, 2020

pavlo888 commented Sep 10, 2020

nmquijada commented Sep 11, 2020

pavlo888 commented Sep 15, 2020 • edited Loading

nmquijada commented Sep 16, 2020

pavlo888 commented Sep 22, 2020

nmquijada commented Sep 23, 2020

nmquijada commented Sep 24, 2020

pavlo888 commented Sep 26, 2020

nmquijada commented Sep 28, 2020

pavlo888 commented Oct 13, 2020

biobrad commented Oct 13, 2020 via email

nmquijada commented Oct 15, 2020

pavlo888 commented Nov 6, 2020

nmquijada commented Nov 9, 2020

nmquijada commented Jun 8, 2021

shlomobl commented Jun 16, 2022

nmquijada commented Jun 17, 2022

pavlo888 commented Sep 15, 2020 •

edited

Loading