How to Add sequence_id Information to mudata['airr'].obsm['airr'] When Reading 10X VDJ Results #581

erlun1 · 2024-12-13T02:44:21Z

Thank you for providing such an excellent module as scirpy. I recently upgraded from version 0.11.2 to 0.20.0 and noticed that scirpy now uses an awkward array to store IR information. This is a great change, and I fully understand the reasoning behind it.

However, when I use scirpy.io.read_10x_vdj to read the filtered_contig_annotations.csv file, I found that the contig_id column is not stored in obsm['airr']. I tried renaming the contig_id column in the file to sequence_id to comply with AIRR standards, but it still wasn't stored correctly.

Upon inspecting the _read_10x_vdj_csv function, I found that only a subset of columns from the filtered_contig_annotations.csv file is written into the awkward matrix. Key columns like contig_id and origin, which are important to me, were discarded.

As a result:

1. After running ir.pp.index_chains, I cannot map the VJ and VDJ chains back to their specific contigs in the filtered_contig_annotations.csv file.
2. If the sample names of my GEX and AIRR data differ, I cannot match them due to the absence of the origin column.

How should I address this issue? If I modify the chain_dict.update section in your _read_10x_vdj_csv function and add the sequence_id information, would this satisfy the AIRR format requirements and correctly add the information to adata.obsm['airr'].sequence_id?
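
As long as the reader does not keep the contig ID, one possible interim workaround is to build a lookup directly from the CSV and match chains by cell barcode, chain type, and CDR3 nucleotide sequence. The sketch below uses only the standard library. It assumes that the junction nucleotide sequence kept for each chain corresponds to the cdr3_nt column, and that this triple is usually (not always) unique, so it stores a list per key. `build_contig_lookup` and the example rows are hypothetical and not part of scirpy.

```python
import csv
import io

# Made-up rows using a subset of the filtered_contig_annotations.csv columns.
EXAMPLE_CSV = """barcode,contig_id,chain,cdr3,cdr3_nt,productive
AAACCTGAGAGTGAGA-1,AAACCTGAGAGTGAGA-1_contig_1,TRB,CASSF,TGTGCCAGCAGCTTC,true
AAACCTGAGAGTGAGA-1,AAACCTGAGAGTGAGA-1_contig_2,TRA,CAVRF,TGTGCTGTGAGATTC,true
"""


def build_contig_lookup(handle):
    """Map (barcode, chain, cdr3_nt) to the list of matching contig IDs.

    Hypothetical helper: a list is stored per key because the triple is
    only assumed, not guaranteed, to identify a single contig.
    """
    lookup = {}
    for row in csv.DictReader(handle):
        key = (row["barcode"], row["chain"], row["cdr3_nt"])
        lookup.setdefault(key, []).append(row["contig_id"])
    return lookup


lookup = build_contig_lookup(io.StringIO(EXAMPLE_CSV))

# Look up the contig(s) behind one chain of one cell.
contigs = lookup.get(("AAACCTGAGAGTGAGA-1", "TRB", "TGTGCCAGCAGCTTC"), [])
```

With a real file, `io.StringIO(EXAMPLE_CSV)` would be replaced by an open file handle. A key that maps to more than one contig ID signals that this heuristic is ambiguous for that cell.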

Thank you for your time and help!

grst · 2024-12-13T08:18:23Z

Hi,

thanks for opening the issue!
I think mapping contig_id to sequence_id in both read_10x_csv and read_10x_json is the correct thing to do. This would allow you to retrieve it from adata.obsm['airr'].sequence_id and also add it to adata.obs using scirpy.get.airr.
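
A minimal sketch of that mapping, written against plain dictionaries rather than scirpy's internal reader: each CSV row becomes an AIRR-style chain record whose sequence_id is taken from contig_id. `contig_row_to_chain`, `read_chains`, and the example rows are hypothetical illustrations. The actual change would go into the reader code, whose exact structure is not reproduced here.

```python
import csv
import io

# Made-up rows using a subset of the filtered_contig_annotations.csv columns.
EXAMPLE_CSV = """barcode,contig_id,chain,cdr3,cdr3_nt,productive
AAACCTGAGAGTGAGA-1,AAACCTGAGAGTGAGA-1_contig_1,TRB,CASSF,TGTGCCAGCAGCTTC,true
AAACCTGAGAGTGAGA-1,AAACCTGAGAGTGAGA-1_contig_2,TRA,CAVRF,TGTGCTGTGAGATTC,true
"""


def contig_row_to_chain(row):
    """Turn one CSV row into an AIRR-style chain dict (hypothetical helper)."""
    return {
        "sequence_id": row["contig_id"],  # the mapping proposed above
        "locus": row["chain"],
        "junction": row["cdr3_nt"],
        "junction_aa": row["cdr3"],
        # Compare case-insensitively so both "true" and "True" are accepted.
        "productive": row["productive"].lower() == "true",
    }


def read_chains(handle):
    """Group the chain dicts by cell barcode."""
    chains_by_cell = {}
    for row in csv.DictReader(handle):
        chains_by_cell.setdefault(row["barcode"], []).append(contig_row_to_chain(row))
    return chains_by_cell


chains_by_cell = read_chains(io.StringIO(EXAMPLE_CSV))
sequence_ids = [c["sequence_id"] for c in chains_by_cell["AAACCTGAGAGTGAGA-1"]]
```

Because sequence_id is a standard AIRR Rearrangement field, storing the contig ID under that name keeps the record compatible with tools that expect AIRR-formatted data.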

Do you want to make a PR?
