Merge pull request #100 from chemosim-lab/black-formatting
* Reformats the entire codebase with `black` and `isort`
* Refactors some of the reused test objects as fixtures in the conftest file
* Adds the `VdWContact` interaction to the defaults, and changes the `tolerance` parameter to 0
* Fixes issue #89 caused by an empty `fp.ifp` object
cbouy authored Nov 17, 2022
2 parents 1523f99 + 65acfa0 commit 76249b3
Showing 30 changed files with 1,498 additions and 1,087 deletions.
41 changes: 36 additions & 5 deletions CHANGELOG.md
@@ -7,9 +7,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

## [1.1.0] - 2022-11-XX

### Added
- `Fingerprint.run` now has a `converter_kwargs` parameter that can pass kwargs to the
underlying RDKitConverter from MDAnalysis (Issue #57).
- Formatting with `black`.

### Changed
- The SMARTS for the following groups have been updated to a more accurate definition
@@ -23,61 +25,78 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Cation: include amidine and guanidine,
- Metal ligand: exclude amides and some amines.
- The Pi stacking interactions have been changed for a more accurate implementation
(PR #97).
(PR #97, PR #98).
- The Van der Waals contact has been added to the default interactions, and the `tolerance`
parameter has been set to 0.
- The `pdbqt_supplier` will not add explicit hydrogen atoms anymore, to avoid detecting
hydrogen bonds with "random" hydrogens that weren't in the PDBQT file.
- When using the `pdbqt_supplier`, irrelevant warnings and logs have been disabled.
- Updated the minimal RDKit version to `2021.03.1`
hydrogen bonds with "random" hydrogens that weren't in the PDBQT file (PR #99).
- When using the `pdbqt_supplier`, irrelevant warnings and logs have been disabled (PR #99).
- Updated the minimal RDKit version to `2021.03.1`

### Fixed
- Dead link in the quickstart notebook for the MDAnalysis quickstart (PR #75, @radifar).
- The `pdbqt_supplier` now correctly preserves hydrogens from the input PDBQT file.
- The `pdbqt_supplier` now correctly preserves hydrogens from the input PDBQT file (PR #99).
- If no interaction was detected, `to_dataframe` would error without giving a helpful message. It
now returns a dataframe with the correct number of frames in the index and no column.


## [1.0.0] - 2022-06-07

### Added
- Support for multiprocessing, enabled by default (Issue #46). The number of processes can
be controlled through `n_jobs` in `fp.run` and `fp.run_from_iterable`.
- New interaction: van der Waals contact, based on the sum of vdW radii of two atoms.
- Saving/loading the fingerprint object as a pickle with `fp.to_pickle` and
`Fingerprint.from_pickle` (Issue #40).
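
As an illustration of the van der Waals contact entry above, here is a minimal sketch of a contact test based on the sum of the vdW radii of two atoms. The function name `is_vdw_contact`, the `VDW_RADII` table (Bondi-style values, included only for illustration) and the `tolerance` default are assumptions made for this example and are not taken from ProLIF's implementation.

```python
import math

# Illustrative van der Waals radii in angstroms (Bondi-style values).
# Assumption: ProLIF may use a different table.
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def is_vdw_contact(elem_a, xyz_a, elem_b, xyz_b, tolerance=0.0):
    """Return True if two atoms are within the sum of their vdW radii.

    `tolerance` (hypothetical parameter name) widens the cutoff by a
    fixed distance in angstroms.
    """
    cutoff = VDW_RADII[elem_a] + VDW_RADII[elem_b] + tolerance
    return math.dist(xyz_a, xyz_b) <= cutoff


# A carbon and an oxygen 3.0 A apart are in contact (cutoff is about 3.22 A).
close = is_vdw_contact("C", (0.0, 0.0, 0.0), "O", (3.0, 0.0, 0.0))
# At 3.5 A they are not, unless the tolerance is raised.
far = is_vdw_contact("C", (0.0, 0.0, 0.0), "O", (3.5, 0.0, 0.0))
far_with_tolerance = is_vdw_contact(
    "C", (0.0, 0.0, 0.0), "O", (3.5, 0.0, 0.0), tolerance=0.5
)
```

With a tolerance of 0 the cutoff is exactly the sum of the two radii, so any positive tolerance makes the test more permissive.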

### Changed
- Molecule suppliers can now be indexed, reused and can return their length, instead of
being single-use generators.

### Fixed
- ProLIF can now be installed through pip and conda (Issue #6).
- If no interaction is detected in the first frame, `to_dataframe` will not complain about
a `KeyError` anymore (Issue #44).
- When creating a `plf.Fingerprint`, unknown interactions will no longer fail silently.


## [0.3.4] - 2021-09-28

### Added
- Added our J. Cheminformatics article to the citation page of the documentation and the
`CITATION.cff` file.

### Changed
- Improved the documentation on how to properly restrict interactions to ignore the
protein backbone (Issue #22), how to fix the empty dataframe issue when no bond
information is present in the PDB file (Issue #15), how to save the LigNetwork diagram
(Issue #21), and some clarifications on using `fp.generate`

### Fixed
- Mixing residue type with interaction type in the interactive legend of the LigNetwork
would incorrectly display/hide some residues on the canvas (PR #23)
- MOL2 files starting with a comment (`#`) would lead to an error


## [0.3.3] - 2021-06-11

### Changed
- Custom interactions must return three values: a boolean for the interaction,
and the indices of residue atoms responsible for the interaction

### Fixed
- Custom interactions that only returned a single value instead of three would
raise an uninformative error message
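
The three-value convention described above can be sketched as a small validation helper. This is only an illustration of the contract (a boolean plus one atom index per residue); `unpack_interaction_result` and `always_close` are hypothetical names and do not correspond to ProLIF functions.

```python
def unpack_interaction_result(result):
    """Validate the (detected, res1_atom_index, res2_atom_index) triple."""
    if not isinstance(result, tuple) or len(result) != 3:
        # Fail early with a message that says what was expected.
        raise TypeError(
            "Custom interactions must return three values "
            "(bool, atom index in residue 1, atom index in residue 2), "
            f"got {result!r}"
        )
    detected, res1_idx, res2_idx = result
    return bool(detected), res1_idx, res2_idx


def always_close(res1, res2):
    # Hypothetical detector that follows the three-value convention.
    return True, 0, 0


checked = unpack_interaction_result(always_close(None, None))
```

A detector that returns a bare `True` would be rejected by this helper with an explicit message instead of failing later during tuple unpacking.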


## [0.3.2] - 2021-06-11

### Added
- LigNetwork: an interaction diagram with atomistic details for the ligand and
residue-level details for the protein, fully interactive in a browser/notebook, inspired
by LigPlot (PR #19)
- `fp.generate`: a method to get the IFP between two `prolif.Molecule` objects (PR #19)

### Changed
- Default residue name and number: `UNK` and `0` are now the default values if `None` or
`''` is given
@@ -87,18 +106,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
recalculating the IFP if one wants to display it with atomic details (PR #19)
- Changed the values returned by `fp.bitvector_atoms`: the atom indices have been
separated in two lists, one for the ligand and one for the protein (PR #19)

### Fixed
- Residues with a resnumber of `0` are not converted to `None` anymore (Issue #13)
- Fingerprint instantiated with an unknown interaction name will now raise a `NameError`


## [0.3.1] - 2021-02-02

### Added
- Integration with Zenodo to automatically generate a DOI for new releases
- Citation page
- Docking section in the Quickstart notebook (Issue #11)
- PDBQT, MOL2 and SDF molecule suppliers to make it easier for users to use docking
results as input (Issue #11)
- `Molecule.from_rdkit` classmethod to easily prepare RDKit molecules for ProLIF

### Changed
- The visualisation notebook now displays the protein with py3Dmol. Some examples for
creating and displaying a graph from the interaction dataframe have been added
@@ -109,25 +132,33 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Added the `Fingerprint.run_from_iterable` method, which uses the new supplier functions
to quickly generate a fingerprint.
- Sorted the output of `Fingerprint.list_available`

### Fixed
- `Fingerprint.to_dataframe` is now much faster (Issue #7)
- `ResidueId.from_string` method now supports 1-letter and 2-letter codes for RNA/DNA
(Issue #8)


## [0.3.0] - 2020-12-23

### Added
- Reading input directly from RDKit Mol as well as MDAnalysis AtomGroup objects
- Proper documentation and tests
- CI through GitHub Actions
- Publishing to PyPI triggered by GitHub releases

### Changed
- All the API and the underlying code have been modified
- Repository has been moved from GitHub user @cbouy to organisation @chemosim-lab

### Removed
- Custom MOL2 file reader
- Command-line interface

### Fixed
- Interactions not detected properly


## [0.2.1] - 2019-10-02

Base version for this changelog
24 changes: 16 additions & 8 deletions docs/notebooks/how-to.ipynb
@@ -139,6 +139,7 @@
"class Hydrophobic(plf.interactions.Hydrophobic):\n",
" pass\n",
"\n",
"\n",
"fp = plf.Fingerprint([\"Hydrophobic\"])\n",
"fp.hydrophobic(lmol, pmol[\"TYR109.A\"])"
]
@@ -173,7 +174,8 @@
"class CustomHydrophobic(plf.interactions.Hydrophobic):\n",
" def __init__(self):\n",
" super().__init__(distance=4.0)\n",
" \n",
"\n",
"\n",
"fp = plf.Fingerprint([\"Hydrophobic\", \"CustomHydrophobic\"])\n",
"fp.hydrophobic(lmol, pmol[\"TYR109.A\"])"
]
@@ -247,6 +249,7 @@
"source": [
"from scipy.spatial import distance_matrix\n",
"\n",
"\n",
"class CloseContact(plf.interactions.Interaction):\n",
" def __init__(self, threshold=2.0):\n",
" self.threshold = threshold\n",
@@ -260,6 +263,7 @@
" return True, res1_i[0], res2_i[0]\n",
" return False, None, None\n",
"\n",
"\n",
"fp = plf.Fingerprint([\"CloseContact\"])\n",
"fp.closecontact(lmol, pmol[\"ASP129.A\"])"
]
@@ -293,8 +297,7 @@
"source": [
"ifp = fp.generate(lmol, pmol, return_atoms=True)\n",
"# check the interactino between the ligand and ASP129\n",
"ifp[(plf.ResidueId(\"LIG\", 1, \"G\"),\n",
" plf.ResidueId(\"ASP\", 129, \"A\"))]"
"ifp[(plf.ResidueId(\"LIG\", 1, \"G\"), plf.ResidueId(\"ASP\", 129, \"A\"))]"
]
},
{
@@ -424,10 +427,13 @@
"df0.columns = df0.columns.droplevel(0)\n",
"df.columns = df.columns.droplevel(0)\n",
"# concatenate and sort columns\n",
"df = (pd.concat([df0, df])\n",
" .fillna(False)\n",
" .sort_index(axis=1, level=0,\n",
" key=lambda index: [plf.ResidueId.from_string(x) for x in index]))\n",
"df = (\n",
" pd.concat([df0, df])\n",
" .fillna(False)\n",
" .sort_index(\n",
" axis=1, level=0, key=lambda index: [plf.ResidueId.from_string(x) for x in index]\n",
" )\n",
")\n",
"df"
]
},
@@ -593,7 +599,9 @@
"source": [
"from rdkit import Chem\n",
"\n",
"template = Chem.MolFromSmiles(\"C[NH+]1CC(C(=O)NC2(C)OC3(O)C4CCCN4C(=O)C(Cc4ccccc4)N3C2=O)C=C2c3cccc4[nH]cc(c34)CC21\")\n",
"template = Chem.MolFromSmiles(\n",
" \"C[NH+]1CC(C(=O)NC2(C)OC3(O)C4CCCN4C(=O)C(Cc4ccccc4)N3C2=O)C=C2c3cccc4[nH]cc(c34)CC21\"\n",
")\n",
"template"
]
},
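
The `CloseContact` cell in this notebook diff is only partly visible here (the middle of `detect` is elided between hunks). As a plain-Python sketch of the same idea, and not a reproduction of the elided lines, the function below reports the first atom pair found within a distance threshold. It assumes coordinates are given as sequences of `(x, y, z)` tuples; the name `close_contact` is hypothetical.

```python
import math
from itertools import product


def close_contact(coords_1, coords_2, threshold=2.0):
    """Return (True, i, j) for the first atom pair within `threshold`.

    Pairs are scanned in row-major order (all atoms of the second residue
    for each atom of the first). Returns (False, None, None) if no pair
    is close enough.
    """
    pairs = product(enumerate(coords_1), enumerate(coords_2))
    for (i, xyz_1), (j, xyz_2) in pairs:
        if math.dist(xyz_1, xyz_2) <= threshold:
            return True, i, j
    return False, None, None


ligand = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
residue = [(9.0, 0.0, 0.0), (6.5, 0.0, 0.0)]
hit = close_contact(ligand, residue)
miss = close_contact(ligand, residue, threshold=1.0)
```

The return shape matches the three-value convention used by the notebook's custom interaction: a boolean followed by one atom index per residue.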
19 changes: 14 additions & 5 deletions docs/notebooks/protein-protein_interactions.ipynb
@@ -54,7 +54,17 @@
"outputs": [],
"source": [
"# prot-prot interactions\n",
"fp = plf.Fingerprint([\"HBDonor\", \"HBAcceptor\", \"PiStacking\", \"PiCation\", \"CationPi\", \"Anionic\", \"Cationic\"])\n",
"fp = plf.Fingerprint(\n",
" [\n",
" \"HBDonor\",\n",
" \"HBAcceptor\",\n",
" \"PiStacking\",\n",
" \"PiCation\",\n",
" \"CationPi\",\n",
" \"Anionic\",\n",
" \"Cationic\",\n",
" ]\n",
")\n",
"fp.run(u.trajectory[::10], tm3, prot)"
]
},
@@ -117,10 +127,7 @@
"outputs": [],
"source": [
"# regroup all interactions together and do the same\n",
"g = (df.groupby(level=[\"ligand\", \"protein\"], axis=1)\n",
" .sum()\n",
" .astype(bool)\n",
" .mean())\n",
"g = df.groupby(level=[\"ligand\", \"protein\"], axis=1).sum().astype(bool).mean()\n",
"g.loc[g > 0.3]"
]
},
@@ -148,12 +155,14 @@
"backbone = Chem.MolFromSmarts(\"[C^2](=O)-[C;X4](-[H])-[N;+0]\")\n",
"fix_h = Chem.MolFromSmarts(\"[H&D0]\")\n",
"\n",
"\n",
"def remove_backbone(atomgroup):\n",
" mol = plf.Molecule.from_mda(atomgroup)\n",
" mol = AllChem.DeleteSubstructs(mol, backbone)\n",
" mol = AllChem.DeleteSubstructs(mol, fix_h)\n",
" return plf.Molecule(mol)\n",
"\n",
"\n",
"# generate IFP\n",
"ifp = []\n",
"for ts in tqdm(u.trajectory[::10]):\n",
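
The `groupby(...).sum().astype(bool).mean()` chain in this notebook collapses all interaction types per residue pair and reports the fraction of frames in which at least one of them is present. A plain-Python sketch of that computation follows. It assumes each frame is a dict mapping `(residue, interaction)` to a boolean, which is a simplification of the real dataframe layout; `residue_occupancy` is a hypothetical helper.

```python
from collections import defaultdict


def residue_occupancy(frames):
    """Fraction of frames in which each residue shows any interaction."""
    # Collect every residue mentioned anywhere, so absent ones report 0.0.
    residues = {res for frame in frames for (res, _) in frame}
    counts = defaultdict(int)
    for frame in frames:
        # A residue counts once per frame, however many interactions it has.
        active = {res for (res, _), present in frame.items() if present}
        for res in active:
            counts[res] += 1
    return {res: counts[res] / len(frames) for res in sorted(residues)}


frames = [
    {("ASP129", "Cationic"): True, ("TYR109", "Hydrophobic"): False},
    {("ASP129", "Cationic"): False, ("ASP129", "HBDonor"): True},
    {("TYR109", "Hydrophobic"): True},
    {},
]
occupancy = residue_occupancy(frames)
# Keep residues present in more than 30% of frames, as in `g.loc[g > 0.3]`.
frequent = {res: occ for res, occ in occupancy.items() if occ > 0.3}
```

Missing keys are treated as absent interactions, which plays the role that `False` entries play in the dataframe.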
51 changes: 35 additions & 16 deletions docs/notebooks/quickstart.ipynb
@@ -19,6 +19,7 @@
"source": [
"import MDAnalysis as mda\n",
"import prolif as plf\n",
"\n",
"# load trajectory\n",
"u = mda.Universe(plf.datafiles.TOP, plf.datafiles.TRAJ)\n",
"# create selections for the ligand and protein\n",
@@ -48,12 +49,13 @@
"source": [
"from rdkit import Chem\n",
"from rdkit.Chem import Draw\n",
"\n",
"# create a molecule from the MDAnalysis selection\n",
"lmol = plf.Molecule.from_mda(lig)\n",
"# cleanup before drawing\n",
"mol = Chem.RemoveHs(lmol)\n",
"mol.RemoveAllConformers()\n",
"Draw.MolToImage(mol, size=(400,200))"
"Draw.MolToImage(mol, size=(400, 200))"
]
},
{
@@ -77,11 +79,13 @@
" mol = Chem.RemoveHs(res)\n",
" mol.RemoveAllConformers()\n",
" frags.append(mol)\n",
"Draw.MolsToGridImage(frags,\n",
" legends=[str(res.resid) for res in pmol], \n",
" subImgSize=(200, 140),\n",
" molsPerRow=4,\n",
" maxMols=prot.n_residues)"
"Draw.MolsToGridImage(\n",
" frags,\n",
" legends=[str(res.resid) for res in pmol],\n",
" subImgSize=(200, 140),\n",
" molsPerRow=4,\n",
" maxMols=prot.n_residues,\n",
")"
]
},
{
@@ -209,19 +213,36 @@
"\n",
"# reorganize data\n",
"data = df.reset_index()\n",
"data = pd.melt(data, id_vars=[\"Frame\"], var_name=[\"residue\",\"interaction\"])\n",
"data = pd.melt(data, id_vars=[\"Frame\"], var_name=[\"residue\", \"interaction\"])\n",
"data = data[data[\"value\"] != False]\n",
"data.reset_index(inplace=True, drop=True)\n",
"\n",
"# plot\n",
"sns.set_theme(font_scale=.8, style=\"white\", context=\"talk\")\n",
"sns.set_theme(font_scale=0.8, style=\"white\", context=\"talk\")\n",
"g = sns.catplot(\n",
" data=data, x=\"interaction\", y=\"Frame\", hue=\"interaction\", col=\"residue\",\n",
" hue_order=[\"Hydrophobic\", \"HBDonor\", \"HBAcceptor\", \"PiStacking\", \"CationPi\", \"Cationic\"],\n",
" height=3, aspect=0.2, jitter=0, sharex=False, marker=\"_\", s=8, linewidth=3.5,\n",
" data=data,\n",
" x=\"interaction\",\n",
" y=\"Frame\",\n",
" hue=\"interaction\",\n",
" col=\"residue\",\n",
" hue_order=[\n",
" \"Hydrophobic\",\n",
" \"HBDonor\",\n",
" \"HBAcceptor\",\n",
" \"PiStacking\",\n",
" \"CationPi\",\n",
" \"Cationic\",\n",
" ],\n",
" height=3,\n",
" aspect=0.2,\n",
" jitter=0,\n",
" sharex=False,\n",
" marker=\"_\",\n",
" s=8,\n",
" linewidth=3.5,\n",
")\n",
"g.set_titles(\"{col_name}\")\n",
"g.set(xticks=[], ylim=(-.5, data.Frame.max()+1))\n",
"g.set(xticks=[], ylim=(-0.5, data.Frame.max() + 1))\n",
"g.set_xticklabels([])\n",
"g.set_xlabels(\"\")\n",
"g.fig.subplots_adjust(wspace=0)\n",
@@ -251,10 +272,7 @@
"outputs": [],
"source": [
"# regroup all interactions together and do the same\n",
"g = (df.groupby(level=[\"protein\"], axis=1)\n",
" .sum()\n",
" .astype(bool)\n",
" .mean())\n",
"g = df.groupby(level=[\"protein\"], axis=1).sum().astype(bool).mean()\n",
"g.loc[g > 0.3]"
]
},
@@ -272,6 +290,7 @@
"outputs": [],
"source": [
"from rdkit import DataStructs\n",
"\n",
"bvs = fp.to_bitvectors()\n",
"tanimoto_sims = DataStructs.BulkTanimotoSimilarity(bvs[0], bvs)\n",
"tanimoto_sims"
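
The last cell compares the first frame's bitvector to every frame with RDKit's `DataStructs.BulkTanimotoSimilarity`. As a rough sketch of what the resulting numbers mean, the Tanimoto similarity of two bitvectors is the number of bits set in both divided by the number of bits set in either. The helpers below work on plain lists of 0/1 values and are hypothetical; returning 0.0 when neither vector has any bit set is a choice made for this example and may differ from RDKit's convention.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two equal-length sequences of 0/1 values."""
    both = sum(1 for a, b in zip(bits_a, bits_b) if a and b)
    either = sum(1 for a, b in zip(bits_a, bits_b) if a or b)
    # Assumption for this sketch: two empty fingerprints score 0.0.
    return both / either if either else 0.0


def bulk_tanimoto(reference, others):
    """Similarity of `reference` to each fingerprint in `others`."""
    return [tanimoto(reference, other) for other in others]


fingerprints = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
sims = bulk_tanimoto(fingerprints[0], fingerprints)
```

The first value is always 1.0 when the reference has at least one bit set, since it is compared with itself.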