An ensemble-based approach for prediction of protein S-nitrosylation sites integrating supervised word embedding and embedding from protein language model
You can access the pLMSNOSite webserver at kcdukkalab.org/pLMSNOSite/. This web-based tool lets you submit a FASTA file of protein sequences; the pLMSNOSite model then processes them and returns predictions.
Pratyush, P., Pokharel, S., Saigo, H. et al. pLMSNOSite: an ensemble-based approach for predicting protein S-nitrosylation sites by integrating supervised word embedding and embedding from pre-trained protein language model. BMC Bioinformatics 24, 41 (2023). https://doi.org/10.1186/s12859-023-05164-9
The corresponding BibTeX:
@article{ WOS:000934967300003,
Author = {Pratyush, Pawel and Pokharel, Suresh and Saigo, Hiroto and Kc, Dukka B.},
Title = {pLMSNOSite: an ensemble-based approach for predicting protein
S-nitrosylation sites by integrating supervised word embedding and
embedding from pre-trained protein language model},
Journal = {BMC BIOINFORMATICS},
Year = {2023},
Volume = {24},
Number = {1},
Month = {FEB 8},
DOI = {10.1186/s12859-023-05164-9},
Article-Number = {41},
ISSN = {1471-2105},
ORCID-Numbers = {Pratyush, Pawel/0000-0002-4210-1200},
Unique-ID = {WOS:000934967300003},
}
Pawel Pratyush1, Suresh Pokharel1, Hiroto Saigo2, Dukka B KC1*
1Department of Computer Science, Michigan Technological University, Houghton, MI, USA.
2Department of Electrical Engineering and Computer Science, Kyushu University, 744, Motooka, Nishi-ku, 819-0395, Japan
* Corresponding Author: [email protected]
To get a local copy of the repository, you can either clone it or download it directly from GitHub.
If you have Git installed on your system, you can clone the repository by running the following command in your terminal:
git clone git@github.com:KCLabMTU/pLMSNOSite.git
Alternatively, if you don't have Git or prefer not to use it, you can download the repository from GitHub as a zip file.
Note: The download link points to a zip archive of the repository's main branch. The link will differ if the default branch has a different name.
Python version: 3.9.7
To install the required libraries, run the following command:
pip install -r requirements.txt
Required libraries and versions:
Bio==1.5.2
keras==2.9.0
matplotlib==3.5.1
numpy==1.23.5
pandas==1.5.0
requests==2.27.1
scikit_learn==1.2.0
seaborn==0.11.2
tensorflow==2.9.1
torch==1.11.0
tqdm==4.63.0
transformers==4.18.0
xgboost==1.5.0
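Since the versions above are pinned exactly, it can help to confirm that the active environment matches them before running anything. The following is a minimal sketch, not part of the repository: the function name check_pins and its get_version parameter are hypothetical, and it assumes the distribution names resolve exactly as they are written in the list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the list above, using the names exactly as written there.
PINNED = {
    "Bio": "1.5.2",
    "keras": "2.9.0",
    "matplotlib": "3.5.1",
    "numpy": "1.23.5",
    "pandas": "1.5.0",
    "requests": "2.27.1",
    "scikit_learn": "1.2.0",
    "seaborn": "0.11.2",
    "tensorflow": "2.9.1",
    "torch": "1.11.0",
    "tqdm": "4.63.0",
    "transformers": "4.18.0",
    "xgboost": "1.5.0",
}


def check_pins(pins, get_version=version):
    """Return {package: (expected, found)} for every pin that does not match.

    `found` is None when the package is not installed. `get_version` is
    injectable (hypothetical parameter) so the check can be exercised
    without touching the real environment.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

If the script prints nothing, every pinned package was found at the expected version.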
To evaluate our model on the independent test set, we have already placed the test sequences and the corresponding ProtT5 features in the data/test/ folder. After installing all the requirements, run the following command:
python evaluate_model.py
- Save your FASTA file as input/sequence.fasta.
- Run the following command:
python predict.py
- Find the results in the output/ folder.
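Before running the prediction step, it may be worth checking that the input file parses as FASTA and contains cysteine residues, since S-nitrosylation occurs on cysteines. The sketch below is illustrative only and is not the repository's code: the helper names (parse_fasta, cysteine_positions, nonstandard_residues) are hypothetical, and it assumes predict.py accepts standard multi-record FASTA with one-letter amino acid codes.

```python
from pathlib import Path

# The 20 standard amino acids in one-letter code.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Parse FASTA text into a {header: sequence} dict, in file order."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            if header in records:
                raise ValueError(f"duplicate header: {header}")
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequences may be wrapped over several lines; join them.
            records[header] += line.upper()
    return records


def cysteine_positions(sequence):
    """Return the 1-based positions of every cysteine (C) in the sequence."""
    return [i for i, residue in enumerate(sequence, start=1) if residue == "C"]


def nonstandard_residues(sequence):
    """Return the sorted set of characters outside the 20 standard residues."""
    return sorted(set(sequence) - VALID_RESIDUES)


if __name__ == "__main__":
    fasta_path = Path("input/sequence.fasta")
    if fasta_path.exists():
        for header, seq in parse_fasta(fasta_path.read_text()).items():
            print(header, len(seq), cysteine_positions(seq), nonstandard_residues(seq))
```

A sequence with no cysteines has no candidate S-nitrosylation sites; how predict.py handles such sequences or nonstandard residues is not stated here, so treat the output of this check as a sanity test only.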
- Find the training data in the data/train/ folder.
- Find all the code and models related to training in the training_experiments folder (to be updated).
Should you have any inquiries related to this project, please feel free to reach out via email. Kindly CC all of the following recipients in your communication for a swift response:
- Main Contact: [email protected]
- CC: [email protected]
- CC: [email protected]
We look forward to addressing your queries and concerns.