Hierarchical Progressive Learning for Zero-Shot Peptide-HLA Binding Prediction and Automated Antigenic Peptide Design
Peptide binding to human leukocyte antigen (HLA) alleles is crucial for initiating immune responses, making its prediction essential for therapies such as vaccines. While current AI-based prediction methods have advanced, they are limited by training data that covers less than 1% of known alleles, which hinders generalization to unseen alleles.
To address this, we propose the Hierarchical Progressive Learning (HPL) framework. Leveraging pre-trained protein language models, HPL learns sequence patterns progressively, from universal proteins to specific peptide-HLA complexes, improving prediction for unseen alleles by 60.8% (and by 1414.0% for non-classical alleles) compared to the TransPHLA model.
Additionally, we develop the Automated Peptide Mutation Search (APMS) program. Guided by the HPL prediction model, APMS automatically modifies the amino acid residues of weak or non-binding peptides for any target HLA class I allele. It generates high-affinity binder candidates for the target allele in over 38.1% of test cases while adhering to mutation restrictions.
Here is our paper: to be updated. This repository provides detailed instructions for our study.
Check `env.yml`, or set up a Python environment with the packages listed below (a quick version check follows the list):
- python==3.8
- numpy pandas scikit-learn tqdm
- torch==1.7.0
- jupyter notebook seaborn matplotlib
- tape-proteins==0.5
  - the TAPE model, a pre-trained protein language model used in our study; see the source repo for details
- transformers==4.22.1
  - [Optional] install this if you want to use other pre-trained protein language models, such as ProtBERT
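To confirm the environment is set up, here is a minimal sketch (assuming the distribution names listed above) that prints the installed versions:

```python
# Minimal sanity check: print installed versions of the key dependencies.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("torch", "tape-proteins", "transformers", "numpy", "pandas"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```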
We've uploaded all the data to Google Drive. You can download it and store it wherever you prefer; just remember to update the `datapath` (or `data_path`) variable in the code with the correct path!
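For example (the path and assignment below are hypothetical; the exact variable name varies by script):

```python
# Hypothetical example: point the scripts/notebooks at your local copy of the data.
# Depending on the script, the variable is named `datapath` or `data_path`.
data_path = "/home/you/HPL_data"  # wherever you stored the Google Drive download
```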
The source and purpose of each downloaded file are listed below.
`raw_data` folder:
| document | property/purpose | source |
|---|---|---|
| `iedb_neg/` | exported IEDB HLA immunopeptidome datasets | IEDB |
| `hla_prot.fasta` | HLA alleles and the corresponding amino acid sequences | |
| `Pos_E0101.fasta`, `Pos_E0103.fasta`, `Pos_G0101.fasta`, `Pos_G0103.fasta`, `Pos_G0104.fasta` | experimentally validated binding peptides of five non-classical HLA alleles, i.e., HLA-E*01:01, HLA-E*01:03, HLA-G*01:01, HLA-G*01:03, HLA-G*01:04 | HLAncPred web server |
| `new_hla_ABC_list.xlsx` | list of HLA alleles mentioned in the paper [a large peptidome...] | |
| `mhc_ligand_table_export_1677247855.csv` | binding peptide-HLA pairs published by the paper [a large peptidome...], exported from IEDB | IEDB |
`main_task` folder:
| document | property/purpose | source |
|---|---|---|
| `train_data_fold4.csv`, `val_data_fold4.csv` | training dataset, common classical HLA alleles, consistent with TransPHLA | TransPHLA repo |
| `independent.csv` | testing dataset, common classical HLA alleles, consistent with TransPHLA | TransPHLA repo |
| `HLA_sequence_dict_ABCEG.csv` (old versions: `hla_seq_dict.csv`, `HLA_sequence_dict_new.csv`) | HLA name and the corresponding full/clip/short (pseudo) sequence: common classical, zero-shot classical, and zero-shot non-classical HLA alleles | |
| `IEDB_negative_segments.npy` | negative peptides extracted from all possible peptide segments of the exported IEDB HLA immunopeptidome dataset | `./Data_preprocess/build_candidate_pools.ipynb` |
| `allele2candidate_pools.npy` | possible candidate peptide segments for each common classical HLA allele | `./Data_preprocess/build_candidate_pools.ipynb` |
| `allele2positive_segs.npy` | all possible peptide segments of positive peptides for each common classical HLA allele | `./Data_preprocess/build_candidate_pools.ipynb` |
| `zeroshot_set.csv` | zero-shot non-classical dataset | `./Data_preprocess/prepare_EG_peptides.ipynb` |
| `zeroshot_allele2candidate_pools.npy` | possible candidate peptide segments for each zero-shot non-classical HLA allele | `./Data_preprocess/prepare_EG_peptides.ipynb` |
| `zeroshot_allele2positive_segs.npy` | all possible peptide segments of positive peptides for each zero-shot non-classical HLA allele | `./Data_preprocess/prepare_EG_peptides.ipynb` |
| `zeroshot_abc_set.csv` | zero-shot classical dataset | `./Data_preprocess/prepare_new_ABC_data.ipynb` |
| `zs_new_abc_allele2candidate_pools.npy` | possible candidate peptide segments for each zero-shot classical HLA allele | `./Data_preprocess/prepare_new_ABC_data.ipynb` |
| `zs_new_abc_allele2positive_segs.npy` | all possible peptide segments of positive peptides for each zero-shot classical HLA allele | `./Data_preprocess/prepare_new_ABC_data.ipynb` |
| `Supertype_HLA.xls` | supertype category of HLA alleles | paper [Classification of Human Leukocyte Antigen (HLA) Supertypes] |
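The `allele2*` and `zeroshot_*` `.npy` files above appear to store pickled Python dictionaries mapping allele names to peptide segments. A minimal loading sketch, assuming they were written with `np.save` on a dict (the path below is hypothetical):

```python
import numpy as np

# A minimal sketch, assuming the file holds a pickled Python dict
# (allele name -> peptide segments); the path is hypothetical.
pools = np.load("main_task/allele2candidate_pools.npy", allow_pickle=True).item()
print(len(pools), "alleles")
print(sorted(pools)[:3])  # peek at a few allele names
```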
Hints:
- We denote the HLA alleles observed during model training as common classical HLA alleles, since they are commonly used in previous studies and are classical (i.e., HLA-A/B/C).
- We call classical HLA alleles not seen in training zero-shot classical HLA alleles; they all come from a recent study [a large peptidome...].
- Because non-classical HLA alleles (i.e., HLA-E/F/G) are not included in the model training, we refer to them as zero-shot non-classical HLA alleles.
- Build negative peptide pool: see `Data_preprocess/build_candidate_pools.ipynb`
HPL-pan:
- training: enter the `HPL/jobs/` directory and run `finetune1.sh` in the command line
  - our official HPL-pan model can be downloaded from Google Drive
  - we train on four RTX 3090 GPUs with a total batch size of 256
- evaluating: enter the `HPL/jobs/` directory and run `finetune1_eval.sh` in the command line
- inference: see `HPL/inference_demo.ipynb`; a simplified sketch follows below
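As a rough illustration only (the real pipeline is in `HPL/inference_demo.ipynb`; HPL's fine-tuned checkpoint and binding head are omitted here, and the input sequences are hypothetical), the TAPE backbone can be driven like this:

```python
import torch
from tape import ProteinBertModel, TAPETokenizer

# Rough sketch of the embedding step behind HPL inference. The peptide and HLA
# pseudo-sequence are hypothetical inputs; HPL's fine-tuned weights and binding
# head are omitted -- see HPL/inference_demo.ipynb for the authors' actual code.
tokenizer = TAPETokenizer(vocab="iupac")
model = ProteinBertModel.from_pretrained("bert-base")  # TAPE pre-trained backbone
model.eval()

peptide = "NLVPMVATV"                               # example 9-mer peptide
hla_pseudo = "YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY"   # placeholder pseudo-sequence
token_ids = torch.tensor([tokenizer.encode(peptide + hla_pseudo)])

with torch.no_grad():
    sequence_output, pooled_output = model(token_ids)

# HPL would score these embeddings with its fine-tuned classification head;
# here we only inspect the shape of the pooled representation.
print(pooled_output.shape)  # torch.Size([1, 768])
```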
HPL-Cluster is designed for a specific target HLA allele. First, check the `./Data_preprocess/allele_cluster.ipynb` notebook to obtain a certain number of clustered HLA alleles.
- training: enter the `HPL/jobs/` directory and run `finetune2.sh` in the command line
  - HPL-Cluster is based on HPL-pan, so you need to train an HPL-pan model first or directly use the trained HPL-pan we provide. Just remember to update the `load_path` and `model_name` variables in the `fine_tune_tape2.py` script with the correct values!
- evaluating: enter the `HPL/jobs/` directory and run `finetune2_eval.sh` in the command line
HPL-Allele consists of a group of HPL-Cluster models for a specific target HLA allele. No training is required.
- evaluating: enter the `HPL/jobs/` directory and run `ensemble_eval.sh` in the command line
  - remember to provide the model names of the HPL-Cluster models in the `Evaluation_HPL/evaluation_ft_ensemble.py` script
Refer to the `APMS/mutation_release.py` script for the complete algorithm implementation. Additionally, examples are available in the `APMS/run_mutation_release.ipynb` notebook.
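For intuition, here is a heavily simplified greedy sketch of the search idea (the real implementation, including its mutation restrictions, is in `APMS/mutation_release.py`; the function name, `score_fn`, and all parameters below are hypothetical):

```python
import itertools

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def apms_greedy_sketch(peptide, score_fn, max_mutations=3, threshold=0.5):
    """Greedily mutate one residue at a time, keeping the substitution that
    most improves the predicted binding score, until the peptide is predicted
    to bind or the mutation budget is exhausted. `score_fn` stands in for the
    HPL prediction model (higher = stronger predicted binding)."""
    current = list(peptide)
    for _ in range(max_mutations):
        best_score, best_edit = score_fn("".join(current)), None
        for pos, aa in itertools.product(range(len(current)), AMINO_ACIDS):
            if aa == current[pos]:
                continue
            candidate = current.copy()
            candidate[pos] = aa
            score = score_fn("".join(candidate))
            if score > best_score:
                best_score, best_edit = score, (pos, aa)
        if best_edit is None:          # no single substitution improves the score
            break
        current[best_edit[0]] = best_edit[1]
        if best_score >= threshold:    # predicted binder found
            break
    return "".join(current)

# Toy usage with a dummy scorer that rewards leucines:
print(apms_greedy_sketch("AAAAAAAAA", lambda p: p.count("L") / len(p)))
```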
If you find our code/work useful in your own research, please consider citing the following:

to be updated
Feel free to create an issue in this repo or contact us (email: to be updated) if you have any questions!
- tape-proteins for the pre-trained model
- TransPHLA for the baseline model and the common classical dataset
- IEDB for data collection
- the paper [A large peptidome dataset improves HLA class I epitope prediction across most of the human population] for the zero-shot classical dataset
- the HLAncPred web server / paper [HLAncPred: a method for predicting promiscuous non-classical HLA binding sites] for the zero-shot non-classical dataset
- the paper [Classification of Human Leukocyte Antigen (HLA) Supertypes] for the idea of supertype categorization