This isn't really the intended use case for rear_only_barcodes. The flag means we expect the barcode to be only at the 3' end of a read (on either strand).
I think you can still make this work if you swap and reverse-complement (RC) the masks and barcodes in your scenario 3 and remove the rear_only_barcodes line, i.e. treat the 3' end as if it were mask2 and the 5' end as if it were mask1. Dorado will look for barcodes at either end as long as you do not use the --barcode-both-ends flag.
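A toy model helps show why the 5' end matters here. If the barcode and its flanks sit at the 3' end of a forward-strand read, then the read from the complementary strand carries the reverse complement of that region (RC of the rear flank, then RC of the barcode, then RC of the front flank) at its 5' end. The sketch below assumes each read spans the whole molecule and uses hypothetical placeholder sequences (FRONT_FLANK, BARCODE, REAR_FLANK, INSERT) and a hypothetical locate helper. It models sequence orientation only, not how dorado searches or scores barcodes.

```python
# Toy model: where a 3'-end barcode context lands on each strand.
# All sequences are hypothetical placeholders; each read is assumed to
# span the whole molecule. This models orientation only, not dorado.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def locate(read: str, context: str) -> str:
    """Report whether `context` occurs at the 5' end, the 3' end, or inside `read`."""
    pos = read.find(context)
    if pos == -1:
        return "not found"
    if pos == 0:
        return "5' end"
    if pos + len(context) == len(read):
        return "3' end"
    return f"internal (offset {pos})"


FRONT_FLANK = "GGTTAC"      # hypothetical: sequence just before the barcode (forward strand)
BARCODE = "AAGGCCTA"        # hypothetical barcode, chosen so it is not its own RC
REAR_FLANK = "CCAGTA"       # hypothetical: sequence just after the barcode (forward strand)
INSERT = "GATTACAGATTACA"   # hypothetical insert

forward_context = FRONT_FLANK + BARCODE + REAR_FLANK
forward_read = INSERT + forward_context   # barcode region at the 3' end
reverse_read = revcomp(forward_read)      # complementary strand, read 5'->3'

if __name__ == "__main__":
    print("forward read, forward context:", locate(forward_read, forward_context))
    print("reverse read, RC context:     ", locate(reverse_read, revcomp(forward_context)))
    print("reverse read, forward context:", locate(reverse_read, forward_context))
```

Running it reports the forward context at the 3' end of the forward read, its reverse complement at the 5' end of the reverse read, and no forward-orientation match anywhere in the reverse read.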
Hello,
I am demultiplexing with dorado v0.9.1 using custom barcodes. I have rear-only barcodes.
Here is my DNA read structure:
Scenario 1:
Based on my understanding of the documentation, I used the sequence directly preceding my barcode on the forward strand as mask1_front and the sequence directly following my barcode on the forward strand as mask1_rear. I did not set any mask2* values. In my barcode FASTA file, I included my four barcodes only as they appear on the forward strand, not on the reverse strand. With this setup only ~50% of my reads are demultiplexed, and the classified reads are only those that match the forward-strand structure.
Relevant part of arrangement.toml file:
barcodes.fasta file:
Scenario 2:
If I instead set mask1_front to the reverse complement of the sequence following the barcode on the forward strand (i.e. the RC of scenario 1's mask1_rear), set mask1_rear to the reverse complement of the sequence preceding the barcode on the forward strand (i.e. the RC of scenario 1's mask1_front), and include the 4 barcodes as well as each of their reverse complements in the FASTA file, I still get only ~50% demultiplexed reads. However, now only reverse-strand reads are classified, and the barcodes identified are the reverse-complement entries (BC_02, BC_04, BC_06, BC_08).
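The scenario 2 inputs can be derived mechanically from scenario 1: the two flanks are swapped and reverse-complemented, and each barcode is written next to its reverse complement. The sketch below does this under stated assumptions: the sequences are hypothetical placeholders, the helpers scenario2_masks and interleaved_fasta are invented for illustration, and the interleaved naming (odd IDs forward, even IDs reverse complement) is inferred from the barcode IDs reported in this scenario rather than taken from the actual barcodes.fasta.

```python
# Sketch: derive scenario 2 inputs from scenario 1 inputs.
# Sequences are hypothetical placeholders; the interleaved naming
# (odd IDs = forward barcode, even IDs = its reverse complement) is an assumption.
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def scenario2_masks(front: str, rear: str) -> dict[str, str]:
    """Swap and reverse-complement the scenario 1 mask1_front / mask1_rear."""
    return {"mask1_front": revcomp(rear), "mask1_rear": revcomp(front)}


def interleaved_fasta(barcodes: list[str]) -> str:
    """Write each barcode followed by its reverse complement as FASTA records."""
    lines = []
    index = 1
    for bc in barcodes:
        for seq in (bc, revcomp(bc)):
            lines.append(f">BC_{index:02d}")
            lines.append(seq)
            index += 1
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    front, rear = "GGTTAC", "CCAGTA"  # hypothetical scenario 1 mask1_front / mask1_rear
    barcodes = ["AAGGCCTA", "CTTGACGA", "GACCTTAG", "TCAGGATC"]  # hypothetical
    print(scenario2_masks(front, rear))
    print(interleaved_fasta(barcodes), end="")
```

With four barcodes this yields eight records, BC_01 to BC_08, where BC_02, BC_04, BC_06 and BC_08 are the reverse complements.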
Relevant part of arrangement.toml file:
barcodes.fasta file:
This suggests that dorado is not considering both DNA strands: it does not automatically search for the reverse complement of custom barcodes, or for the swapped, reverse-complemented flanking sequences that would be expected on the complementary strand.
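The expectation described here, that a barcode context should be accepted in either orientation, can be made concrete with a toy exact-match classifier. This is a sketch under simplifying assumptions (exact matching, hypothetical sequences, and a classify helper invented for illustration); it is not how dorado scores barcodes. It shows that a forward-only context classifies only one strand's reads, about half of a balanced mix, while also trying the reverse complement recovers both strands and reports reverse-strand hits under the forward barcode's name instead of a separate RC entry.

```python
# Toy strand-aware barcode search: try the forward flank+barcode context,
# then its reverse complement, and report the forward barcode name either way.
# Illustrative only; this is NOT dorado's algorithm. Sequences are hypothetical.
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def classify(read: str, front: str, rear: str, barcodes: dict[str, str],
             both_strands: bool = True) -> tuple[str, str] | None:
    """Return (barcode_name, strand) for the first exact flank+barcode match.

    The forward context is front + barcode + rear. With both_strands=True the
    reverse complement of that context is also tried, and a hit on it is
    reported under the same barcode name with strand "-".
    """
    for name, bc in barcodes.items():
        context = front + bc + rear
        if context in read:
            return name, "+"
        if both_strands and revcomp(context) in read:
            return name, "-"
    return None


if __name__ == "__main__":
    front, rear = "GGTTAC", "CCAGTA"                     # hypothetical flanks
    barcodes = {"bc1": "AAGGCCTA", "bc2": "CTTGACGA"}    # hypothetical barcodes
    insert = "GATTACAGATTACA"                            # hypothetical insert

    reads = []
    for bc in barcodes.values():
        forward_read = insert + front + bc + rear
        reads += [forward_read, revcomp(forward_read)]   # one read per strand

    for both in (False, True):
        hits = [classify(r, front, rear, barcodes, both) for r in reads]
        classified = sum(hit is not None for hit in hits)
        print(f"both_strands={both}: {classified}/{len(reads)} classified -> {hits}")
```

With these inputs the forward-only search classifies 2 of the 4 reads (both from the forward strand), while the strand-aware search classifies all 4.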
Scenario 3:
If I use the forward sequences from scenario 1 as mask1_front and mask1_rear and the reverse-complement sequences from scenario 2 as mask2_front and mask2_rear, I have to add barcode2_pattern to the TOML file, and I believe this configuration means dorado will expect double-ended barcodes. This setup also produces a BAM file with no reads.
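For reference, the scenario 3 mask values follow directly from the two earlier scenarios: mask1_* keep the forward-strand flanks, and mask2_* take the swapped, reverse-complemented flanks used in scenario 2. The sketch below only renders those four `key = "value"` lines from hypothetical flank sequences using helpers invented for illustration (scenario3_masks, render_mask_lines); the remaining arrangement.toml settings, including barcode1_pattern and barcode2_pattern, are left out, and the code says nothing about how dorado interprets the result.

```python
# Sketch: render the four mask lines for the scenario 3 arrangement.
# mask1_* = forward-strand flanks; mask2_* = swapped reverse complements.
# Flank sequences are hypothetical; other arrangement.toml keys are omitted.
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def scenario3_masks(front: str, rear: str) -> dict[str, str]:
    """Combine the scenario 1 (forward) and scenario 2 (RC) flanks."""
    return {
        "mask1_front": front,
        "mask1_rear": rear,
        "mask2_front": revcomp(rear),
        "mask2_rear": revcomp(front),
    }


def render_mask_lines(masks: dict[str, str]) -> str:
    """Format the masks as TOML-style key = "value" lines, in insertion order."""
    return "\n".join(f'{key} = "{value}"' for key, value in masks.items())


if __name__ == "__main__":
    print(render_mask_lines(scenario3_masks("GGTTAC", "CCAGTA")))
```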
Relevant part of arrangement.toml file:
barcodes.fasta file:
Could you please help me understand how to design the arrangement.toml and barcodes.fasta files so that reads from both the forward and reverse strands are demultiplexed with rear-only barcodes?