
How to Add sequence_id Information to mudata['airr'].obsm['airr'] When Reading 10X VDJ Results #581

Open
erlun1 opened this issue Dec 13, 2024 · 1 comment

Comments


erlun1 commented Dec 13, 2024

Thank you for providing such an excellent module as scirpy. I recently upgraded from version 0.11.2 to 0.20.0 and noticed that scirpy now stores IR information in an awkward array. This is a great change, and I fully understand the reasoning behind it.

However, when I use scirpy.io.read_10x_vdj to read the filtered_contig_annotations.csv file, the contig_id column is not stored in obsm['airr']. I tried renaming the contig_id column in the file to sequence_id to comply with the AIRR standard, but it still was not stored.

Upon inspecting the _read_10x_vdj_csv function, I found that only a subset of the columns in filtered_contig_annotations.csv is written into the awkward array. Columns that are important to me, such as contig_id and origin, are discarded.

As a result:

1. After running ir.pp.index_chains, I cannot map the VJ and VDJ chains back to their specific contigs in filtered_contig_annotations.csv.
2. If the sample names of my GEX and AIRR data differ, I cannot match them because the origin column is missing.
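A minimal sketch of the lookup I am after, using plain Python stand-ins rather than scirpy objects. It assumes that each chain record carries the 10x contig_id as its sequence_id, and that index_chains yields, per cell, the positions of the VJ and VDJ chains within that cell's chain list (my understanding of what scirpy stores in obsm['chain_indices']). The helper contigs_for_indexed_chains and the toy barcodes are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical stand-ins: `chains` mimics one cell's chain list in
# obsm["airr"]; `chain_indices` mimics the per-cell positions that
# ir.pp.index_chains selects for the VJ and VDJ arms (assumed layout).
Chain = Dict[str, Optional[str]]


def contigs_for_indexed_chains(
    chains: List[Chain], chain_indices: Dict[str, List[int]]
) -> Dict[str, Optional[str]]:
    """Return e.g. {"VJ_1": contig_id, "VDJ_1": contig_id} for one cell.

    Only works if every chain record kept the 10x contig_id as sequence_id;
    chains without it map to None.
    """
    result: Dict[str, Optional[str]] = {}
    for arm in ("VJ", "VDJ"):
        # Rank 1 is the primary chain of this arm, rank 2 the secondary one.
        for rank, idx in enumerate(chain_indices.get(arm, []), start=1):
            result[f"{arm}_{rank}"] = chains[idx].get("sequence_id")
    return result


# Toy cell with one TRB (VDJ) and one TRA (VJ) contig; values are made up.
cell_chains = [
    {"locus": "TRB", "junction_aa": "CASSLGQAYEQYF", "sequence_id": "AAAC-1_contig_1"},
    {"locus": "TRA", "junction_aa": "CAVRDSNYQLIW", "sequence_id": "AAAC-1_contig_2"},
]
print(contigs_for_indexed_chains(cell_chains, {"VJ": [1], "VDJ": [0]}))
```

Without a sequence_id on each chain, the same lookup can only return junction sequences or gene calls, which do not uniquely identify a row in the CSV.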
How should I address this issue? If I modify the chain_dict.update section in your _read_10x_vdj_csv function and add the sequence_id information, would this satisfy the AIRR format requirements and correctly add the information to adata.obsm['airr'].sequence_id?
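To make the question concrete, here is a standalone sketch (standard library only, not scirpy code) of reading filtered_contig_annotations.csv into per-cell chain records while keeping contig_id as the AIRR sequence_id field. The mapping TENX_TO_AIRR and the helper read_chains_by_barcode are hypothetical and only loosely modelled on the fields scirpy fills; inside _read_10x_vdj_csv the equivalent change would presumably be one extra sequence_id=... argument to the chain_dict.update call.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Optional, TextIO

# Illustrative mapping from 10x column names to AIRR Rearrangement fields.
# This is an assumption, not a copy of scirpy's internal mapping; the
# proposed addition is the contig_id -> sequence_id entry.
TENX_TO_AIRR = {
    "contig_id": "sequence_id",
    "chain": "locus",
    "cdr3": "junction_aa",
    "cdr3_nt": "junction",
    "v_gene": "v_call",
    "d_gene": "d_call",
    "j_gene": "j_call",
    "c_gene": "c_call",
}


def read_chains_by_barcode(handle: TextIO) -> Dict[str, List[Dict[str, Optional[str]]]]:
    """Group contig rows by cell barcode, renaming columns to AIRR fields."""
    cells: Dict[str, List[Dict[str, Optional[str]]]] = defaultdict(list)
    for row in csv.DictReader(handle):
        # Missing or empty values (e.g. no D gene) become None.
        chain = {airr: (row.get(tenx) or None) for tenx, airr in TENX_TO_AIRR.items()}
        cells[row["barcode"]].append(chain)
    return dict(cells)


# Toy input with made-up barcodes and truncated nucleotide sequences.
example_csv = (
    "barcode,contig_id,chain,cdr3,cdr3_nt,v_gene,d_gene,j_gene,c_gene\n"
    "AAAC-1,AAAC-1_contig_1,TRB,CASSLGQAYEQYF,TGTGCC,TRBV7-9,,TRBJ2-7,TRBC2\n"
    "AAAC-1,AAAC-1_contig_2,TRA,CAVRDSNYQLIW,TGTGCT,TRAV1-2,,TRAJ33,TRAC\n"
    "GGGT-1,GGGT-1_contig_1,TRB,CASSPTGYEQYF,TGTGCC,TRBV6-5,TRBD1,TRBJ2-7,TRBC2\n"
)
cells = read_chains_by_barcode(io.StringIO(example_csv))
print(cells["AAAC-1"][0]["sequence_id"])  # AAAC-1_contig_1
```

A real file has more columns (reads, umis, productive, ...), which this sketch simply ignores.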

Thank you for your time and help!

grst (Collaborator) commented Dec 13, 2024

Hi,

thanks for opening the issue!
I think mapping contig_id to sequence_id in both read_10x_csv and read_10x_json is the correct thing to do. This would allow you to retrieve it from adata.obsm['airr'].sequence_id and also add it to adata.obs using scirpy.get.airr.

Do you want to make a PR?
