Karobben-Work-Station

Script	Function
NCBI_GSM.py	Crawl GSM sample information (ID, title, characteristics, etc.) from NCBI GEO
Uniport.py	Annotate UniProt IDs
Kegg2Uniprot.py	Convert a KO ID to a list of UniProt IDs
Seq2tree.py	Quick pipeline to build and plot a tree from a FASTA file with Python and R
vcf2fasta.py	Extract FASTA sequences from a VCF file using a reference genome
Dme2Homo.py	Convert fly genes (FlyBase IDs) to human ortholog genes
PDBreNumbering.py	Renumber the residues of all chains in a PDB file

NCBI_GSM.py

This script crawls the information for GSM IDs (GSM2268339, for instance) from the NCBI database.
An example target page: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM2268339
Given a list of GSM IDs, you can get the GSM ID, title, characteristics, etc.
Because each GSM group has its own information pattern, adapt the script to your targets and test it before running it.

NCBI_GSM.py -i list -o result.csv
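As a rough illustration of the first step of such a crawl, the sketch below validates a GSM list and builds the GEO page URL for each ID, following the example URL above. The names `read_gsm_ids` and `gsm_url` are hypothetical and not taken from NCBI_GSM.py; downloading and parsing the page are left out because the page layout differs between GSM groups.

```python
import re

# URL pattern copied from the example target page above.
GEO_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={}"
GSM_PATTERN = re.compile(r"GSM\d+")


def read_gsm_ids(text):
    """Return the valid GSM IDs found one per line, skipping blank or malformed lines."""
    ids = []
    for line in text.splitlines():
        candidate = line.strip()
        if GSM_PATTERN.fullmatch(candidate):
            ids.append(candidate)
    return ids


def gsm_url(gsm_id):
    """Build the GEO query URL for one GSM ID."""
    return GEO_URL.format(gsm_id)


if __name__ == "__main__":
    for gsm in read_gsm_ids("GSM2268339\n\nnot-an-id\n"):
        print(gsm, gsm_url(gsm))
```

Checking the list before any request is sent makes it easier to test the crawler on a few IDs first, as recommended above.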

feel free

Uniprot.py

This script annotates UniProt IDs using a Python crawler.

Quick Start:

Uniprot.py -i list -o out.table  # out.table is the default output

Example

# Create a UniProt ID list
echo -e "Q9VVT4\nQ8T4H6\nQ9VN56\nQ9VLB2\nQ9VFE6\nQ6IG51\nQ9VQS1\nR9PY49\nQ8T4D4\nA0A0B4LGT9\nQ9VHV6\nB7Z003\nA0A0S0WGV8\nP54366\nA0A0B4K6X9\nQ7K0E3\nQ9VAU9\nN0D8I3\nQ9W420\nP52654\nF0JAF9\nQ7KNM2" \
> list
# Run Script
Uniprot.py -i list
# View the result
cat out.table

list:


Q8T4D4
Q8T4H6
P54366
Q9VFE6
...

Result:

Uniprot ID	Symbol	Protein	Species	Details
Q9VFE6	CG3817	RRP15-like protein	Drosophila melanogaster (Fruit fly)	Belongs to the RRP15 family.Curated
Q9VAU9	Noa36	Zinc finger protein 330 homolog	Drosophila melanogaster (Fruit fly)	NucleusNucleus By similaritynucleolus By similarityNote: Predominantly expressed in the nucleolus.By similarity
Q8T4D4	Dmel\CG9222	AT03158p	Drosophila melanogaster (Fruit fly)	Belongs to the protein kinase superfamily.UniRule annotationAutomatic assertion according to rulesiRuleBase:RU000304
P54366	Gsc	Homeobox protein goosecoid	Drosophila melanogaster (Fruit fly)	Appears to regulate regional development of specific tissues. Can rescue axis polarity in UV-radiated Xenopus embryos.
...	...	...	...	...

PS: You might lose some entries due to network problems. I suggest collecting the missing entries and running the script again on them.
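One way to collect the lost entries is to compare the input list with the first column of the result table. The sketch below is only an illustration: the function name `missing_ids` is hypothetical, and it assumes the result is tab-separated with a header line and the UniProt ID in the first column, as in the example above.

```python
def missing_ids(requested, table_text):
    """Return the requested IDs that do not appear in the first column of the result table."""
    lines = table_text.splitlines()
    # Skip the header line; the first tab-separated field is assumed to be the UniProt ID.
    found = {line.split("\t")[0].strip() for line in lines[1:] if line.strip()}
    missing = []
    seen = set()
    for uid in requested:
        uid = uid.strip()
        # Keep the input order and report each missing ID only once.
        if uid and uid not in found and uid not in seen:
            seen.add(uid)
            missing.append(uid)
    return missing


if __name__ == "__main__":
    table = (
        "Uniprot ID\tSymbol\tProtein\tSpecies\tDetails\n"
        "Q9VFE6\tCG3817\tRRP15-like protein\tDrosophila melanogaster (Fruit fly)\t...\n"
    )
    print(missing_ids(["Q9VFE6", "P54366"], table))
```

The returned IDs can be written to a new list file and passed to the script again with `-i`.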

Kegg2Uniprot.py

This script converts a KO ID to a list of UniProt IDs. Example:

/media/ken/Data/Github/Bio_tools/Kegg2Uniprot.py -i 10305

Returns:

K10305	FBXO25_32; F-box protein 25/32	Name
A0A024R9F3	A0A024R9F3_HUMAN	F-box protein 32, isoform CRA_c
A0A074ZPR8	A0A074ZPR8_9TREM	F-box domain protein
A0A088AP91	A0A088AP91_APIME	Uncharacterized protein
A0A094ZG88	A0A094ZG88_SCHHA	F-box domain-containing protein
...

Seq2tree.py

More details: Blog

Dependencies:

Python: Biopython
R: ggtree

Pipeline:

Align the FASTA file with clustalw2 through Biopython
Trim the gaps from the head and tail of the alignment
Build the tree file from the trimmed sequences
Visualize the tree with ggtree in R
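The trimming step in the pipeline above can be sketched in plain Python. This is a minimal illustration, not the code used in Seq2tree.py: the function name `trim_alignment_ends` is hypothetical, and it assumes that "trimming" means cutting leading and trailing alignment columns until no sequence begins or ends with a gap character (`-`).

```python
def trim_alignment_ends(aligned):
    """Cut head and tail columns so that no aligned sequence starts or ends with a gap."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Start after the longest run of leading gaps, stop before the longest run of trailing gaps.
    start = max(len(s) - len(s.lstrip("-")) for s in seqs)
    end = min(len(s.rstrip("-")) for s in seqs)
    if start >= end:
        # Nothing is left once the gapped ends are removed.
        return {name: "" for name in aligned}
    return {name: s[start:end] for name, s in aligned.items()}


if __name__ == "__main__":
    alignment = {"a": "--ACGT-A", "b": "TTACGTTA", "c": "-TACGT--"}
    print(trim_alignment_ends(alignment))
```

Internal gaps are kept, so the trimmed sequences stay aligned and can be passed on to tree building.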

vcf2fasta.py

This script extracts FASTA sequences from a VCF file. Using the reference genome and the position information in the VCF file, each reference base is replaced with the alternative base. More details can be found in Karobben: Extract Fasta from VCF; 2022

python vcf2fasta.py -g Genome.fa -v File.vcf.gz -t Target.csv

To reverse the sequences on the "-" strand, add -d. To connect the remaining intervals, add -c.
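The base replacement described above can be illustrated with a small sketch. It is not taken from vcf2fasta.py: the function name `apply_snps` is hypothetical, the variants are passed as plain `(position, ref, alt)` tuples with 1-based positions as in VCF, and only single-base substitutions are handled.

```python
def apply_snps(reference, variants):
    """Return the reference sequence with each single-base variant replaced by its alternative base."""
    seq = list(reference)
    for pos, ref, alt in variants:
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError(f"only single-base substitutions are supported (position {pos})")
        index = pos - 1  # VCF positions are 1-based
        if reference[index].upper() != ref.upper():
            raise ValueError(f"reference base mismatch at position {pos}")
        seq[index] = alt
    return "".join(seq)


if __name__ == "__main__":
    print(apply_snps("ACGTACGT", [(2, "C", "T"), (8, "T", "A")]))
```

Comparing the REF column with the genome before substituting catches a VCF that was called against a different reference.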

Dme2Homo.py

Dme2Homo.py -i FBgn0039044 FBgn0004647

PDBreNumbering.py

This script renumbers the residues in a PDB file. For some PDB structures, parts of the sequences may be missing, or they may use alternative numbering schemes such as Kabat numbering. This can make sequence retrieval or structure truncation more difficult and may confuse some software, such as FoldX. Running this script automatically renumbers each chain from 1 to the total number of amino acids it contains.

python PDBreNumbering.py -i input.pdb -o output.pdb
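The renumbering idea can be sketched with the fixed-column layout of PDB `ATOM` records (chain ID in column 22, residue number in columns 23-26, insertion code in column 27). This is an illustration under those assumptions, not the code of PDBreNumbering.py: the function name `renumber_pdb_lines` is hypothetical, only `ATOM` records are touched, and insertion codes are blanked.

```python
def renumber_pdb_lines(lines):
    """Renumber residues of ATOM records so that every chain counts up from 1."""
    counters = {}   # chain ID -> last residue number assigned
    last_seen = {}  # chain ID -> original residue number + insertion code of the current residue
    out = []
    for line in lines:
        if line.startswith("ATOM") and len(line) >= 27:
            chain = line[21]
            original = line[22:27]  # 4-character residue number plus 1-character insertion code
            if last_seen.get(chain) != original:
                # A new residue starts whenever the original number or insertion code changes.
                counters[chain] = counters.get(chain, 0) + 1
                last_seen[chain] = original
            # Write the new number right-aligned and clear the insertion code.
            line = f"{line[:22]}{counters[chain]:>4} {line[27:]}"
        out.append(line)
    return out


if __name__ == "__main__":
    example = [
        "ATOM      1 N    ALA A  52       1.000   2.000   3.000",
        "ATOM      2 N    GLY A  52A      1.000   2.000   3.000",
        "ATOM      3 N    MET B   7       1.000   2.000   3.000",
    ]
    for new_line in renumber_pdb_lines(example):
        print(new_line)
```

Treating a change in insertion code as a new residue matters for Kabat-style numbering, where residues such as 52 and 52A are distinct.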
