Demultiplex both forward and reverse strands with rear end only barcodes #1249

eden528 · 2025-02-11T16:25:45Z

Hello,
I am demultiplexing with dorado v0.9.1 using custom barcodes. My barcodes are rear-only.

Here is my DNA read structure:

Forward strand:
5' --- adapter --- read --- polyA --- mask1_front --- barcode --- mask1_rear --- 3'

Reverse strand:
5' --- adapter --- rc_mask1_rear --- rc_barcode --- rc_mask1_front --- polyT --- rc_read --- 3'
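The reverse-strand layout above is the reverse complement of the forward-strand layout. The sketch below checks that using the Scenario 1 flanks and the AT_BC01 sequence. It makes three assumptions: the adapter is omitted, the read body is a made-up placeholder, and the polyA length is arbitrary. Under those assumptions, the barcode region on the reverse strand reads rc(mask1_rear), then rc(barcode), then rc(mask1_front).

```python
# Complement table for uppercase DNA bases (assumes A/C/G/T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Flanks and barcode taken from Scenario 1.
mask1_front = "AAAAAAACCG"
mask1_rear = "CTCTGCGTTG"
barcode = "GTGTTACCGTGGGAATGAATCCTT"  # AT_BC01
insert = "ACGTACGTACGT"               # hypothetical read sequence
poly_a = "A" * 10                     # hypothetical polyA length

# Forward strand (adapter omitted): read - polyA - mask1_front - barcode - mask1_rear
forward = insert + poly_a + mask1_front + barcode + mask1_rear
reverse = revcomp(forward)

# On the reverse strand the barcode region comes first (right after the adapter)
# and reads rc(mask1_rear) - rc(barcode) - rc(mask1_front).
expected_5p = revcomp(mask1_rear) + revcomp(barcode) + revcomp(mask1_front)
print(reverse.startswith(expected_5p))            # True
print(revcomp(mask1_rear), revcomp(mask1_front))  # CAACGCAGAG CGGTTTTTTT
```

The two printed flanks are the mask1_front and mask1_rear values used in Scenario 2 below.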

Scenario 1:
Based on my understanding of the documentation, I used the sequence directly preceding my barcode on the forward strand as mask1_front and the sequence directly following my barcode on the forward strand as mask1_rear. I did not set mask2*. In my barcodes FASTA file, I included my four barcodes only in their forward-strand orientation, not their reverse-strand orientation. However, with this setup only ~50% of my reads are demultiplexed, and the classified reads are only those that match the forward-strand structure.

Relevant part of arrangement.toml file:

mask1_front = "AAAAAAACCG"
mask1_rear = "CTCTGCGTTG"

barcode1_pattern = "AT_BC%02i"
first_index = 1
last_index = 4
rear_only_barcodes = true

barcodes.fasta file:

>AT_BC01
GTGTTACCGTGGGAATGAATCCTT
>AT_BC02
TTCAGGGAACAAACCAAGTTACGT
>AT_BC03
AACTAGGCACAGCGAGTCTTGGTT
>AT_BC04
AAGCGTTGAAACCTTTGTCCTCTC

Scenario 2:
I then set mask1_front to the reverse complement of the sequence following the barcode on the forward strand (i.e. the RC of the Scenario 1 mask1_rear). I set mask1_rear to the reverse complement of the sequence preceding the barcode on the forward strand (i.e. the RC of the Scenario 1 mask1_front). I also included the 4 barcodes plus each of their reverse complements in the FASTA file. I still get only ~50% of reads demultiplexed. However, now only reverse-strand reads are classified, and the barcodes identified are the reverse-complement entries (AT_BC02, AT_BC04, AT_BC06, AT_BC08).

Relevant part of arrangement.toml file:

mask1_front = "CAACGCAGAG"
mask1_rear = "CGGTTTTTTT"

# Barcode sequences
barcode1_pattern = "AT_BC%02i"
first_index = 1
last_index = 8
rear_only_barcodes = true

barcodes.fasta file:

>AT_BC01
GTGTTACCGTGGGAATGAATCCTT
>AT_BC02
AAGGATTCATTCCCACGGTAACAC
>AT_BC03
TTCAGGGAACAAACCAAGTTACGT
>AT_BC04
ACGTAACTTGGTTTGTTCCCTGAA
>AT_BC05
AACTAGGCACAGCGAGTCTTGGTT
>AT_BC06
AACCAAGACTCGCTGTGCCTAGTT
>AT_BC07
AAGCGTTGAAACCTTTGTCCTCTC
>AT_BC08
GAGAGGACAAAGGTTTCAACGCTT
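The eight entries above interleave each barcode with its reverse complement. The sketch below generates that layout from the four forward barcodes, which avoids hand-typing errors. The function name `interleaved_fasta` is hypothetical. The script only reproduces the file contents; it says nothing about which layout dorado expects.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# The four forward-orientation barcodes from Scenario 1.
forward_barcodes = [
    "GTGTTACCGTGGGAATGAATCCTT",
    "TTCAGGGAACAAACCAAGTTACGT",
    "AACTAGGCACAGCGAGTCTTGGTT",
    "AAGCGTTGAAACCTTTGTCCTCTC",
]


def interleaved_fasta(barcodes, pattern="AT_BC%02i", first_index=1):
    """Build FASTA text listing each barcode followed by its reverse complement.

    With first_index=1, odd indices hold the forward sequence and even
    indices its reverse complement, matching the Scenario 2 layout.
    """
    lines = []
    index = first_index
    for seq in barcodes:
        for variant in (seq, revcomp(seq)):
            lines.append(">" + pattern % index)
            lines.append(variant)
            index += 1
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(interleaved_fasta(forward_barcodes), end="")
```

The output matches the Scenario 2 barcodes.fasta entry for entry. The `pattern` argument mirrors barcode1_pattern in the TOML.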

This suggests that dorado is not considering both DNA strands. It does not automatically look for the reverse complement of custom barcodes, or for the reverse-complemented flanking sequences expected on the complementary strand.
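To make "considering both strands" concrete, here is a toy exact-match classifier. It searches for front + barcode + rear on a read and on the read's reverse complement. This is not dorado's algorithm: dorado scores error-tolerant alignments, while this sketch requires a perfect match. The function name `find_flanked_barcode` and the example reads are made up. The sketch shows that a single forward-orientation definition can classify reads from both strands if the search also covers the reverse complement.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_flanked_barcode(read, front, rear, barcodes):
    """Toy exact-match search for front+barcode+rear on both strands.

    Returns (barcode_name, strand) for the first hit, or None.
    Real demultiplexers tolerate sequencing errors; this does not.
    """
    for strand, seq in (("+", read), ("-", revcomp(read))):
        for name, bc in barcodes.items():
            if front + bc + rear in seq:
                return name, strand
    return None


barcodes = {
    "AT_BC01": "GTGTTACCGTGGGAATGAATCCTT",
    "AT_BC03": "TTCAGGGAACAAACCAAGTTACGT",
}
front, rear = "AAAAAAACCG", "CTCTGCGTTG"  # Scenario 1 flanks

# Hypothetical reads carrying AT_BC03, one per strand.
fwd_read = "ACGTACGTACGT" + "A" * 10 + front + barcodes["AT_BC03"] + rear
rev_read = revcomp(fwd_read)

print(find_flanked_barcode(fwd_read, front, rear, barcodes))  # ('AT_BC03', '+')
print(find_flanked_barcode(rev_read, front, rear, barcodes))  # ('AT_BC03', '-')
```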

Scenario 3:
If I use the forward sequences from Scenario 1 as mask1_front and mask1_rear and the reverse-complement sequences from Scenario 2 as mask2_front and mask2_rear, I have to add barcode2_pattern to the TOML file. I believe this configuration makes dorado expect double-ended barcodes. It also produces a BAM file with no reads.

Relevant part of arrangement.toml file:

mask1_front = "AAAAAAACCG"
mask1_rear = "CTCTGCGTTG"
mask2_front = "CAACGCAGAG"
mask2_rear = "CGGTTTTTTT"

# Barcode sequences
barcode1_pattern = "AT_BC%02i"
barcode2_pattern = "AT_BC%02i"
first_index = 1
last_index = 8
rear_only_barcodes = true

barcodes.fasta file:

>AT_BC01
GTGTTACCGTGGGAATGAATCCTT
>AT_BC02
AAGGATTCATTCCCACGGTAACAC
>AT_BC03
TTCAGGGAACAAACCAAGTTACGT
>AT_BC04
ACGTAACTTGGTTTGTTCCCTGAA
>AT_BC05
AACTAGGCACAGCGAGTCTTGGTT
>AT_BC06
AACCAAGACTCGCTGTGCCTAGTT
>AT_BC07
AAGCGTTGAAACCTTTGTCCTCTC
>AT_BC08
GAGAGGACAAAGGTTTCAACGCTT

Can you please help me understand how to design the arrangement.toml and barcodes.fasta files so that both forward and reverse strands are demultiplexed with rear-only barcodes?


malton-ont · 2025-02-12T09:47:14Z

Hi @eden528,

This isn't really the intended use case for rear_only_barcodes. That flag means we expect the barcode to be only at the 3' end (on either strand).

I think you can still make this work if you swap and reverse-complement (RC) the masks and barcodes in your Scenario 3 and remove the rear_only_barcodes line (i.e. treat the 3' end as mask2 and the 5' end as mask1). Dorado will look for barcodes at either end as long as you do not use the --barcode-both-ends flag.
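The sequence arithmetic behind "swap and RC" for a flank pair is small enough to script. This sketch reads it as new_front = rc(rear) and new_rear = rc(front), with each barcode simply reverse-complemented. That reading is an assumption. The helper name `swap_and_rc` is hypothetical. How the resulting pairs map onto mask1/mask2 and barcode1_pattern/barcode2_pattern should be checked against the dorado custom-barcode documentation; the helper only transforms sequences.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def swap_and_rc(front: str, rear: str) -> tuple:
    """Return (rc(rear), rc(front)): the flank pair as read from the other strand."""
    return revcomp(rear), revcomp(front)


# Scenario 1 mask1 pair; the result is the Scenario 2 mask1 pair.
print(swap_and_rc("AAAAAAACCG", "CTCTGCGTTG"))  # ('CAACGCAGAG', 'CGGTTTTTTT')

# Barcodes are reverse-complemented individually (AT_BC01 shown).
print(revcomp("GTGTTACCGTGGGAATGAATCCTT"))      # AAGGATTCATTCCCACGGTAACAC
```

Applying the transform twice returns the original pair, so it is easy to sanity-check which orientation a given TOML entry is in.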

malton-ont added the "barcode" label (Issues related to barcoding) on Feb 12, 2025
