AffMB

Affinity Maturation of B-cell receptors (AffMB)

Requirements

pandas, numpy, matplotlib, scipy, Bio, logomaker, newworkx, igraph>=0.10.4, pycairo or cairocffi

Installation

After installing the above requirements:

pip3 install -i https://test.pypi.org/simple/ affmb==1.2.1

Input

The input of AffMB is an BCR repertoire annotation file typically generated by running V(D)J annotation tools such as igblast. The minimum set of columns required for AffMB include: barcode, locus, v_call, j_call, c_call, cdr1, cdr2, cdr3, shm. Note the three cdr columns must be nucleotide (nt) sequences. All columns in the input file will be preserved in the output file.

***** The default column names are barcode, locus, v_call, j_call, c_call, cdr1, cdr2, cdr3, shm; if your input columns have other names (e.g., cell_id, chain, v_gene, v_identity, etc.), you can specify the column names in the parameters (see Parameters) *****

***** AffMB assumes all input sequences to be productive and at least the V and C genes are called. Errors will be thrown if content of the v_call, c_call or any cdr columns is missing. The igblast_parser.py contains commands to filter non-productive input. However, if your input is not generated by the script, please remember to do the filtering before running AffMB *****

Preparation of input (strongly recommended)

A typical starting material is the assembled contig sequence file (.fasta format). Here we provide two example contig fasta files under the example folder. The example.filtered_contig.fasta is from single-cell sequencing data, as part of the 10X cellranger pipeline output; the bulk_contig13m.fasta is from bulk sequencing data.

To ensure repeatable output, please follow the instructions below to call igblast (external software to be installed separately) and prepare the input for AffMB by parsing the igblast output using the python script igblast_parser.py we provided. The preparation steps can apply to both single-cell and bulk data.

Instructions on igblast install: https://ncbi.github.io/igblast/cook/How-to-set-up.html
After igblast installation, go to the working directory of igblast which should have the following file structure:

bin
database
internal_data
optional_file

Download the human vdj and c gene databases for igblast that are used in the prepare python script:

# run under igblast working directory
wget https://ftp.ncbi.nih.gov/blast/executables/igblast/release/database/airr/airr_c_human.tar
mkdir -p airr_c_human
tar -xvf airr_c_human.tar -C airr_c_human/

wget https://ftp.ncbi.nih.gov/blast/executables/igblast/release/database/ncbi_human_c_genes.tar
mkdir -p database/
tar -xvf ncbi_human_c_genes.tar -C database/

Export experimental variables and run igblastn:

export PATH=$PATH:/path/to/ncbi-igblast-1.x.x/bin
export IGDATA=/path/to/ncbi-igblast-1.x.x
igblastn -germline_db_V $IGDATA/airr_c_human/airr_c_human_ig.V -germline_db_J $IGDATA/airr_c_human/airr_c_human_ig.J -germline_db_D $IGDATA/airr_c_human/airr_c_human_igh.D -c_region_db $IGDATA/database/ncbi_human_c_genes -organism human -auxiliary_data $IGDATA/optional_file/human_gl.aux -query example.filtered_contig.fasta -ig_seqtype Ig -outfmt 19 > example.filtered_contig.igblast.airr.tsv

igblastn -germline_db_V $IGDATA/airr_c_human/airr_c_human_ig.V -germline_db_J $IGDATA/airr_c_human/airr_c_human_ig.J -germline_db_D $IGDATA/airr_c_human/airr_c_human_igh.D -c_region_db $IGDATA/database/ncbi_human_c_genes -organism human -auxiliary_data $IGDATA/optional_file/human_gl.aux -query bulk_contig13m.fasta -ig_seqtype Ig -outfmt 19 > bulk_contig13m.igblast.airr.tsv

Run the igblast_parser.py to parse igblast output and generate the input annotation file for AffMB:

# python igblast_parser.py <igblast_output> <outfile>
python igblast_parser.py example.filtered_contig.igblast.airr.tsv example.filtered_contig.annotation.csv
python igblast_parser.py bulk_contig13m.igblast.airr.tsv bulk_contig13m.annotation.csv

The igblast_parser.py calculates SHM rate as the fraction of mutated positions between the V region sequence and the inferred germline V region sequence. Alternatively, one may use 'v_identity' to approximate SHM rate (see Parameters).

Quick Start

For single-cell input:

from affmb import affmb
import subprocess

# single-cell (paired-chain) input
outdir = 'test_paired'
subprocess.run('mkdir -p '+outdir, shell=True)
infile = 'example/example.filtered_contig.annotation.csv'
affmb.paired_repertoire_analysis(infile,outdir,depth_filter=2)

#Alternatively, if you want to analyze and visualize only the IGH chain in a single-cell data
#affmb.IGH_repertoire_analysis(infile,outdir,depth_filter=2)

For bulk input:

# bulk (IGH) input
outdir = 'test_bulk'
subprocess.run('mkdir -p '+outdir, shell=True)
infile = 'example/bulk_contig13m.annotation.csv'
affmb.IGH_repertoire_analysis(infile,outdir,depth_filter=2)

Parameters

affmb.paired_repertoire_analysis(infile,outdir,cell_id='barcode',chain='locus',v_gene='v_call',j_gene='j_call',c_gene='c_call',cdr1_nt='cdr1',cdr2_nt='cdr2',cdr3_nt='cdr3',shm='shm',sep=',',depth_filter=2,method='inherent',logo=True,logo_dist_cutoff=0.1,logo_depth_cutoff=2,logo_thres=5,vertex_color='red')
affmb.IGH_repertoire_analysis(infile,outdir,cell_id='barcode',chain='locus',v_gene='v_call',j_gene='j_call',c_gene='c_call',cdr1_nt='cdr1',cdr2_nt='cdr2',cdr3_nt='cdr3',shm='shm',sep=',',depth_filter=2,method='inherent',logo=True,logo_dist_cutoff=0.1,logo_depth_cutoff=2,logo_thres=5,vertex_color='red')

infile: input annotation file name
outdir: output directory name, must be an existing directory
cell_id: the column name for the cell ID (or sequence ID in bulk data) [default: 'barcode']
chain: the column name for the chain type (IGH/IGL/IGK) information [default: 'locus']
v_gene: the column name for the V gene [default: 'v_call']
j_gene: the column name for the J gene [default: 'j_call']
c_gene: the column name for the C gene [default: 'c_call']
cdr1_nt: the column name for the cdr1 nucleotide (nt) sequence [default: 'cdr1']
cdr2_nt: the column name for the cdr2 nucleotide (nt) sequence [default: 'cdr2']
cdr3_nt: the column name for the cdr3 nucleotide (nt) sequence [default: 'cdr3']
shm: the column name for the SHM rate; set shm='v_identity' if you want to approximate SHM rate with 'v_identity' [default: 'shm']
sep: delimiter of the input file [default: ',']
depth_filter: lineage depth threshold under which lineage trees will not be plot, reduces the number of generated tree figures; set depth_filter=0 to stop the filter [default: 2]
method: algorithms for tree construction, choose within {'inherent','prims'} [default: 'inherent']
logo: whether to detect branches of interest and generate logo plots [default: True]
logo_dist_cutoff: determines the largest possible pairwise hamming distance between nodes in a branch of interest [default: 0.1]
logo_depth_cutoff: determines the minimal depth of a branch of interest [default: 2]
logo_thres: determines the minimal size of a branch of interest [default: 5]
vertex_color: the color of the nodes in the lineage tree figures [default: 'red']

Output

The figure below is a graphical summary of AffMB output. The output of running AffMB on the example.filtered_contig.annotation.csv is shown in the test_paired folder. A typical AffMB output directory contains multiple lineage tree figures (one tree per figure) with depth_filter applied (in this example depth_filter=2 is applied) and a csv table file which provides cell-to-node mapping to the lineage tree figures. If branches of interest are detected, logo plots at amino acid and nucleotide levels will also be generated. A summary plot of lineage statistics are also generated for the depth of the lineages, the dominated Ig type of the lineages, the extent of clonal expansion of the lineages, and the lineage size (i.e., number of unique genotypes in lineage).

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
example		example
test_paired/inherent		test_paired/inherent
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AffMB

Requirements

Installation

Input

Preparation of input (strongly recommended)

Quick Start

Parameters

Output

About

Releases

Packages

Languages

License

deepomicslab/AffMB

Folders and files

Latest commit

History

Repository files navigation

AffMB

Requirements

Installation

Input

Preparation of input (strongly recommended)

Quick Start

Parameters

Output

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages