Merge pull request #100 from chemosim-lab/black-formatting

* Reformats the entire codebase with `black` and `isort`
* Refactors some of the reused test objects as fixtures in the conftest file
* Adds the `VdWContact` interaction to the defaults, and changes the `tolerance` parameter to 0
* Fixes issue #89 caused by an empty `fp.ifp` object
chemosim-lab · Nov 17, 2022 · 76249b3
2 parents 1523f99 + 65acfa0

Showing 30 changed files with 1,498 additions and 1,087 deletions.
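Issue #89, mentioned in the commit message, came from calling `to_dataframe` when `fp.ifp` held no detected interaction. The idea behind the fix (keep one row per frame even when there are no columns) can be sketched in plain Python. This is a hypothetical illustration: the `ifp_to_table` helper and the `{frame: {(ligand, protein, interaction): bool}}` layout are assumptions, not ProLIF's actual `to_dataframe` code.

```python
from typing import Dict, List, Tuple

# Hypothetical key: (ligand residue, protein residue, interaction name)
Key = Tuple[str, str, str]
# Hypothetical layout: frame number -> {key: interaction detected?}
IFP = Dict[int, Dict[Key, bool]]


def ifp_to_table(ifp: IFP) -> Tuple[List[int], List[Key], List[List[bool]]]:
    """Return (index, columns, rows) for a boolean interaction table.

    Every frame gets a row even if no interaction was detected anywhere,
    so an empty fingerprint yields N rows and zero columns instead of an error.
    """
    index = sorted(ifp)
    # Union of all interaction keys seen in any frame, in first-seen order
    columns: List[Key] = []
    seen = set()
    for frame in index:
        for key in ifp[frame]:
            if key not in seen:
                seen.add(key)
                columns.append(key)
    # A key missing from a frame is treated as "no interaction" (False)
    rows = [[bool(ifp[frame].get(key, False)) for key in columns] for frame in index]
    return index, columns, rows


if __name__ == "__main__":
    index, columns, rows = ifp_to_table({0: {}, 1: {}, 2: {}})
    print(index, columns, rows)  # [0, 1, 2] [] [[], [], []]
```

The empty case falls out naturally: the frame index is built independently of the columns, so there is nothing to index into when no interaction exists.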
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,9 +7,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ## [1.1.0] - 2022-11-XX
+
 ### Added
 - `Fingerprint.run` now has a `converter_kwargs` parameter that can pass kwargs to the
   underlying RDKitConverter from MDAnalysis (Issue #57).
+- Formatting with `black`.
 
 ### Changed
 - The SMARTS for the following groups have been updated to a more accurate definition
@@ -23,61 +25,78 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   - Cation: include amidine and guanidine,
   - Metal ligand: exclude amides and some amines.
 - The Pi stacking interactions have been changed for a more accurate implementation
-  (PR #97).
+  (PR #97, PR #98).
+- The Van der Waals contact has been added to the default interactions, and the `tolerance`
+  parameter has been set to 0. 
 - The `pdbqt_supplier` will not add explicit hydrogen atoms anymore, to avoid detecting
-  hydrogen bonds with "random" hydrogens that weren't in the PDBQT file.
-- When using the `pdbqt_supplier`, irrelevant warnings and logs have been disabled.
-- Updated the minimal RDKit version to `2021.03.1` 
+  hydrogen bonds with "random" hydrogens that weren't in the PDBQT file (PR #99).
+- When using the `pdbqt_supplier`, irrelevant warnings and logs have been disabled (PR #99).
+- Updated the minimal RDKit version to `2021.03.1`
 
 ### Fixed
 - Dead link in the quickstart notebook for the MDAnalysis quickstart (PR #75, @radifar).
-- The `pdbqt_supplier` now correctly preserves hydrogens from the input PDBQT file.
+- The `pdbqt_supplier` now correctly preserves hydrogens from the input PDBQT file (PR #99).
+- If no interaction was detected, `to_dataframe` would raise an error without a helpful message. It
+  now returns a dataframe with the correct number of frames in the index and no columns.
+
 
 ## [1.0.0] - 2022-06-07
+
 ### Added
 - Support for multiprocessing, enabled by default (Issue #46). The number of processes can
   be controlled through `n_jobs` in `fp.run` and `fp.run_from_iterable`.
 - New interaction: van der Waals contact, based on the sum of vdW radii of two atoms.
 - Saving/loading the fingerprint object as a pickle with `fp.to_pickle` and
   `Fingerprint.from_pickle` (Issue #40).
+
 ### Changed
 - Molecule suppliers can now be indexed, reused and can return their length, instead of
   being single-use generators.
+
 ### Fixed
 - ProLIF can now be installed through pip and conda (Issue #6).
 - If no interaction is detected in the first frame, `to_dataframe` will not complain about
   a `KeyError` anymore (Issue #44).
 - When creating a `plf.Fingerprint`, unknown interactions will no longer fail silently.
 
+
 ## [0.3.4] - 2021-09-28
+
 ### Added
 - Added our J. Cheminformatics article to the citation page of the documentation and the
   `CITATION.cff` file. 
+
 ### Changed
 - Improved the documentation on how to properly restrict interactions to ignore the
   protein backbone (Issue #22), how to fix the empty dataframe issue when no bond
   information is present in the PDB file (Issue #15), how to save the LigNetwork diagram
   (Issue #21), and some clarifications on using `fp.generate`
+
 ### Fixed
 - Mixing residue type with interaction type in the interactive legend of the LigNetwork
   would incorrectly display/hide some residues on the canvas (PR #23)
 - MOL2 files starting with a comment (`#`) would lead to an error
 
+
 ## [0.3.3] - 2021-06-11
+
 ### Changed
 - Custom interactions must return three values: a boolean for the interaction,
   and the indices of the atoms in each of the two residues responsible for the interaction
+
 ### Fixed
 - Custom interactions that only returned a single value instead of three would
   raise an uninformative error message
 
 
 ## [0.3.2] - 2021-06-11
+
 ### Added
 - LigNetwork: an interaction diagram with atomistic details for the ligand and
   residue-level details for the protein, fully interactive in a browser/notebook, inspired
   by LigPlot (PR #19)
 - `fp.generate`: a method to get the IFP between two `prolif.Molecule` objects (PR #19)
+
 ### Changed
 - Default residue name and number: `UNK` and `0` are now the default values if `None` or
   `''` is given
@@ -87,18 +106,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   recalculating the IFP if one wants to display it with atomic details (PR #19)
 - Changed the values returned by `fp.bitvector_atoms`: the atom indices have been
   separated in two lists, one for the ligand and one for the protein (PR #19)
+
 ### Fixed
 - Residues with a resnumber of `0` are not converted to `None` anymore (Issue #13)
 - Fingerprint instantiated with an unknown interaction name will now raise a `NameError`
 
+
 ## [0.3.1] - 2021-02-02
+
 ### Added
 - Integration with Zenodo to automatically generate a DOI for new releases
 - Citation page
 - Docking section in the Quickstart notebook (Issue #11)
 - PDBQT, MOL2 and SDF molecule suppliers to make it easier for users to use docking
   results as input (Issue #11)
 - `Molecule.from_rdkit` classmethod to easily prepare RDKit molecules for ProLIF
+
 ### Changed
 - The visualisation notebook now displays the protein with py3Dmol. Some examples for
   creating and displaying a graph from the interaction dataframe have been added
@@ -109,25 +132,33 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Added the `Fingerprint.run_from_iterable` method, which uses the new supplier functions
   to quickly generate a fingerprint.
 - Sorted the output of `Fingerprint.list_available`
+
 ### Fixed
 - `Fingerprint.to_dataframe` is now much faster (Issue #7)
 - `ResidueId.from_string` method now supports 1-letter and 2-letter codes for RNA/DNA
   (Issue #8)
 
+
 ## [0.3.0] - 2020-12-23
+
 ### Added
 - Reading input directly from RDKit Mol as well as MDAnalysis AtomGroup objects
 - Proper documentation and tests
 - CI through GitHub Actions
 - Publishing to PyPI triggered by GitHub releases
+
 ### Changed
 - All the API and the underlying code have been modified
 - Repository has been moved from GitHub user @cbouy to organisation @chemosim-lab
+
 ### Removed
 - Custom MOL2 file reader
 - Command-line interface
+
 ### Fixed
 - Interactions not detected properly
 
+
 ## [0.2.1] - 2019-10-02
+
 Base version for this changelog
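The changelog describes the van der Waals contact as based on the sum of the vdW radii of two atoms, with `tolerance` now set to 0. A minimal sketch of that criterion, assuming a contact is counted when the interatomic distance is at most r1 + r2 + tolerance. The `is_vdw_contact` helper and the radii table (commonly quoted Bondi values) are illustrative assumptions; ProLIF's own implementation and radii may differ.

```python
import math
from typing import Dict, Sequence

# Illustrative van der Waals radii in Angstrom (Bondi values); this is an
# assumption for the sketch, not necessarily the table ProLIF uses.
VDW_RADII: Dict[str, float] = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def is_vdw_contact(
    elem1: str,
    xyz1: Sequence[float],
    elem2: str,
    xyz2: Sequence[float],
    tolerance: float = 0.0,
) -> bool:
    """Return True if two atoms lie within the sum of their vdW radii.

    `tolerance` (Angstrom) is added to the radii sum; with the default of 0,
    only pairs at or within the plain radii sum count as contacts.
    """
    cutoff = VDW_RADII[elem1] + VDW_RADII[elem2] + tolerance
    return math.dist(xyz1, xyz2) <= cutoff


if __name__ == "__main__":
    origin = (0.0, 0.0, 0.0)
    # C + O radii sum to 3.22 A
    print(is_vdw_contact("C", origin, "O", (3.0, 0.0, 0.0)))  # True
    print(is_vdw_contact("C", origin, "O", (3.5, 0.0, 0.0)))  # False
```

A tolerance of 0 makes the criterion strict; a positive value widens every cutoff by the same amount, which tends to flag many more pairs as contacts.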
diff --git a/docs/notebooks/how-to.ipynb b/docs/notebooks/how-to.ipynb
@@ -139,6 +139,7 @@
     "class Hydrophobic(plf.interactions.Hydrophobic):\n",
     "    pass\n",
     "\n",
+    "\n",
     "fp = plf.Fingerprint([\"Hydrophobic\"])\n",
     "fp.hydrophobic(lmol, pmol[\"TYR109.A\"])"
    ]
@@ -173,7 +174,8 @@
     "class CustomHydrophobic(plf.interactions.Hydrophobic):\n",
     "    def __init__(self):\n",
     "        super().__init__(distance=4.0)\n",
-    "        \n",
+    "\n",
+    "\n",
     "fp = plf.Fingerprint([\"Hydrophobic\", \"CustomHydrophobic\"])\n",
     "fp.hydrophobic(lmol, pmol[\"TYR109.A\"])"
    ]
@@ -247,6 +249,7 @@
    "source": [
     "from scipy.spatial import distance_matrix\n",
     "\n",
+    "\n",
     "class CloseContact(plf.interactions.Interaction):\n",
     "    def __init__(self, threshold=2.0):\n",
     "        self.threshold = threshold\n",
@@ -260,6 +263,7 @@
     "            return True, res1_i[0], res2_i[0]\n",
     "        return False, None, None\n",
     "\n",
+    "\n",
     "fp = plf.Fingerprint([\"CloseContact\"])\n",
     "fp.closecontact(lmol, pmol[\"ASP129.A\"])"
    ]
@@ -293,8 +297,7 @@
    "source": [
     "ifp = fp.generate(lmol, pmol, return_atoms=True)\n",
     "# check the interactino between the ligand and ASP129\n",
-    "ifp[(plf.ResidueId(\"LIG\", 1, \"G\"),\n",
-    "     plf.ResidueId(\"ASP\", 129, \"A\"))]"
+    "ifp[(plf.ResidueId(\"LIG\", 1, \"G\"), plf.ResidueId(\"ASP\", 129, \"A\"))]"
    ]
   },
   {
@@ -424,10 +427,13 @@
     "df0.columns = df0.columns.droplevel(0)\n",
     "df.columns = df.columns.droplevel(0)\n",
     "# concatenate and sort columns\n",
-    "df = (pd.concat([df0, df])\n",
-    "        .fillna(False)\n",
-    "        .sort_index(axis=1, level=0,\n",
-    "                    key=lambda index: [plf.ResidueId.from_string(x) for x in index]))\n",
+    "df = (\n",
+    "    pd.concat([df0, df])\n",
+    "    .fillna(False)\n",
+    "    .sort_index(\n",
+    "        axis=1, level=0, key=lambda index: [plf.ResidueId.from_string(x) for x in index]\n",
+    "    )\n",
+    ")\n",
     "df"
    ]
   },
@@ -593,7 +599,9 @@
    "source": [
     "from rdkit import Chem\n",
     "\n",
-    "template = Chem.MolFromSmiles(\"C[NH+]1CC(C(=O)NC2(C)OC3(O)C4CCCN4C(=O)C(Cc4ccccc4)N3C2=O)C=C2c3cccc4[nH]cc(c34)CC21\")\n",
+    "template = Chem.MolFromSmiles(\n",
+    "    \"C[NH+]1CC(C(=O)NC2(C)OC3(O)C4CCCN4C(=O)C(Cc4ccccc4)N3C2=O)C=C2c3cccc4[nH]cc(c34)CC21\"\n",
+    ")\n",
     "template"
    ]
   },
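The `CloseContact` example in the how-to notebook follows the convention from the 0.3.3 changelog entry: a custom interaction returns three values, a boolean plus the index of the responsible atom in each residue. The same logic can be sketched without SciPy. Here `close_contact` is a hypothetical stand-alone function taking plain coordinate lists rather than `plf.Molecule` residues, and it assumes a pair counts when its distance is within `threshold`.

```python
import math
from typing import Optional, Sequence, Tuple

Coord = Sequence[float]


def close_contact(
    res1: Sequence[Coord], res2: Sequence[Coord], threshold: float = 2.0
) -> Tuple[bool, Optional[int], Optional[int]]:
    """Report the first atom pair within `threshold` (Angstrom).

    Mirrors the three-value return convention for custom interactions:
    (detected, index of atom in res1, index of atom in res2).
    Pairs are scanned row by row, like the rows of a distance matrix.
    """
    for i, a in enumerate(res1):
        for j, b in enumerate(res2):
            if math.dist(a, b) <= threshold:
                return True, i, j
    return False, None, None


if __name__ == "__main__":
    lig = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    res = [(10.0, 0.0, 0.0), (6.5, 0.0, 0.0)]
    print(close_contact(lig, res))  # (True, 1, 1)
```

Returning `(False, None, None)` when nothing is found keeps the output shape constant, which is what the fingerprint code expects from custom interactions.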

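The reformatted `sort_index` call in the how-to notebook passes a `key` so that residue labels are ordered through `plf.ResidueId.from_string` rather than as plain strings, where `"ALA10.A"` would sort before `"ALA9.A"`. A rough stand-alone sketch of such a key, assuming labels of the form `NAME` + number + `.CHAIN` and ordering by chain, then residue number. Both the simplified parser and that ordering are assumptions for illustration; `ResidueId.from_string` handles more formats.

```python
import re
from typing import List, Tuple

# Simplified pattern for labels such as "ASP129.A": residue name, number, chain.
# Hypothetical parser; plf.ResidueId.from_string supports more cases.
_RESID = re.compile(r"^([A-Za-z]+)(-?\d+)(?:\.(\w+))?$")


def residue_sort_key(label: str) -> Tuple[str, int, str]:
    """Turn "ASP129.A" into ("A", 129, "ASP") so residues sort by chain, then number."""
    match = _RESID.match(label)
    if match is None:
        raise ValueError(f"Unrecognized residue label: {label!r}")
    name, number, chain = match.groups()
    return chain or "", int(number), name


def sort_residues(labels: List[str]) -> List[str]:
    return sorted(labels, key=residue_sort_key)


if __name__ == "__main__":
    print(sorted(["ALA10.A", "ALA9.A"]))         # ['ALA10.A', 'ALA9.A']
    print(sort_residues(["ALA10.A", "ALA9.A"]))  # ['ALA9.A', 'ALA10.A']
```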
diff --git a/docs/notebooks/protein-protein_interactions.ipynb b/docs/notebooks/protein-protein_interactions.ipynb
@@ -54,7 +54,17 @@
    "outputs": [],
    "source": [
     "# prot-prot interactions\n",
-    "fp = plf.Fingerprint([\"HBDonor\", \"HBAcceptor\", \"PiStacking\", \"PiCation\", \"CationPi\", \"Anionic\", \"Cationic\"])\n",
+    "fp = plf.Fingerprint(\n",
+    "    [\n",
+    "        \"HBDonor\",\n",
+    "        \"HBAcceptor\",\n",
+    "        \"PiStacking\",\n",
+    "        \"PiCation\",\n",
+    "        \"CationPi\",\n",
+    "        \"Anionic\",\n",
+    "        \"Cationic\",\n",
+    "    ]\n",
+    ")\n",
     "fp.run(u.trajectory[::10], tm3, prot)"
    ]
   },
@@ -117,10 +127,7 @@
    "outputs": [],
    "source": [
     "# regroup all interactions together and do the same\n",
-    "g = (df.groupby(level=[\"ligand\", \"protein\"], axis=1)\n",
-    "       .sum()\n",
-    "       .astype(bool)\n",
-    "       .mean())\n",
+    "g = df.groupby(level=[\"ligand\", \"protein\"], axis=1).sum().astype(bool).mean()\n",
     "g.loc[g > 0.3]"
    ]
   },
@@ -148,12 +155,14 @@
     "backbone = Chem.MolFromSmarts(\"[C^2](=O)-[C;X4](-[H])-[N;+0]\")\n",
     "fix_h = Chem.MolFromSmarts(\"[H&D0]\")\n",
     "\n",
+    "\n",
     "def remove_backbone(atomgroup):\n",
     "    mol = plf.Molecule.from_mda(atomgroup)\n",
     "    mol = AllChem.DeleteSubstructs(mol, backbone)\n",
     "    mol = AllChem.DeleteSubstructs(mol, fix_h)\n",
     "    return plf.Molecule(mol)\n",
     "\n",
+    "\n",
     "# generate IFP\n",
     "ifp = []\n",
     "for ts in tqdm(u.trajectory[::10]):\n",

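The one-line `groupby(...).sum().astype(bool).mean()` chain used in the notebooks collapses the interaction level, marks a residue pair as interacting in a frame if any interaction type is present, then averages over frames; `g.loc[g > 0.3]` keeps pairs present in more than 30% of frames. A plain-Python sketch of the same computation, assuming the table is given as a list of per-frame dicts keyed by (ligand, protein, interaction); `pair_occupancy` is a hypothetical name. The quickstart variant groups on `"protein"` only, which would correspond to dropping the ligand from the pair.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (ligand residue, protein residue, interaction)
Pair = Tuple[str, str]      # (ligand residue, protein residue)


def pair_occupancy(frames: List[Dict[Key, bool]]) -> Dict[Pair, float]:
    """Fraction of frames in which each (ligand, protein) pair has any interaction."""
    counts: Dict[Pair, int] = defaultdict(int)
    all_pairs = set()
    for frame in frames:
        # Pairs with at least one True interaction ("sum" then "astype(bool)")
        present = set()
        for (lig, prot, _interaction), value in frame.items():
            all_pairs.add((lig, prot))
            if value:
                present.add((lig, prot))
        for pair in present:
            counts[pair] += 1
    n = len(frames)
    # "mean" over frames; pairs never present get 0.0
    return {pair: counts[pair] / n for pair in sorted(all_pairs)} if n else {}


if __name__ == "__main__":
    frames = [
        {("LIG1", "ASP129", "HBDonor"): True, ("LIG1", "TYR109", "Hydrophobic"): False},
        {("LIG1", "ASP129", "HBDonor"): True, ("LIG1", "TYR109", "Hydrophobic"): True},
        {("LIG1", "ASP129", "HBDonor"): False, ("LIG1", "TYR109", "Hydrophobic"): False},
    ]
    occ = pair_occupancy(frames)
    # Equivalent of g.loc[g > 0.3]
    print({pair: f for pair, f in occ.items() if f > 0.3})
```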
diff --git a/docs/notebooks/quickstart.ipynb b/docs/notebooks/quickstart.ipynb
@@ -19,6 +19,7 @@
    "source": [
     "import MDAnalysis as mda\n",
     "import prolif as plf\n",
+    "\n",
     "# load trajectory\n",
     "u = mda.Universe(plf.datafiles.TOP, plf.datafiles.TRAJ)\n",
     "# create selections for the ligand and protein\n",
@@ -48,12 +49,13 @@
    "source": [
     "from rdkit import Chem\n",
     "from rdkit.Chem import Draw\n",
+    "\n",
     "# create a molecule from the MDAnalysis selection\n",
     "lmol = plf.Molecule.from_mda(lig)\n",
     "# cleanup before drawing\n",
     "mol = Chem.RemoveHs(lmol)\n",
     "mol.RemoveAllConformers()\n",
-    "Draw.MolToImage(mol, size=(400,200))"
+    "Draw.MolToImage(mol, size=(400, 200))"
    ]
   },
   {
@@ -77,11 +79,13 @@
     "    mol = Chem.RemoveHs(res)\n",
     "    mol.RemoveAllConformers()\n",
     "    frags.append(mol)\n",
-    "Draw.MolsToGridImage(frags,\n",
-    "                     legends=[str(res.resid) for res in pmol], \n",
-    "                     subImgSize=(200, 140),\n",
-    "                     molsPerRow=4,\n",
-    "                     maxMols=prot.n_residues)"
+    "Draw.MolsToGridImage(\n",
+    "    frags,\n",
+    "    legends=[str(res.resid) for res in pmol],\n",
+    "    subImgSize=(200, 140),\n",
+    "    molsPerRow=4,\n",
+    "    maxMols=prot.n_residues,\n",
+    ")"
    ]
   },
   {
@@ -209,19 +213,36 @@
     "\n",
     "# reorganize data\n",
     "data = df.reset_index()\n",
-    "data = pd.melt(data, id_vars=[\"Frame\"], var_name=[\"residue\",\"interaction\"])\n",
+    "data = pd.melt(data, id_vars=[\"Frame\"], var_name=[\"residue\", \"interaction\"])\n",
     "data = data[data[\"value\"] != False]\n",
     "data.reset_index(inplace=True, drop=True)\n",
     "\n",
     "# plot\n",
-    "sns.set_theme(font_scale=.8, style=\"white\", context=\"talk\")\n",
+    "sns.set_theme(font_scale=0.8, style=\"white\", context=\"talk\")\n",
     "g = sns.catplot(\n",
-    "    data=data, x=\"interaction\", y=\"Frame\", hue=\"interaction\", col=\"residue\",\n",
-    "    hue_order=[\"Hydrophobic\", \"HBDonor\", \"HBAcceptor\", \"PiStacking\", \"CationPi\", \"Cationic\"],\n",
-    "    height=3, aspect=0.2, jitter=0, sharex=False, marker=\"_\", s=8, linewidth=3.5,\n",
+    "    data=data,\n",
+    "    x=\"interaction\",\n",
+    "    y=\"Frame\",\n",
+    "    hue=\"interaction\",\n",
+    "    col=\"residue\",\n",
+    "    hue_order=[\n",
+    "        \"Hydrophobic\",\n",
+    "        \"HBDonor\",\n",
+    "        \"HBAcceptor\",\n",
+    "        \"PiStacking\",\n",
+    "        \"CationPi\",\n",
+    "        \"Cationic\",\n",
+    "    ],\n",
+    "    height=3,\n",
+    "    aspect=0.2,\n",
+    "    jitter=0,\n",
+    "    sharex=False,\n",
+    "    marker=\"_\",\n",
+    "    s=8,\n",
+    "    linewidth=3.5,\n",
     ")\n",
     "g.set_titles(\"{col_name}\")\n",
-    "g.set(xticks=[], ylim=(-.5, data.Frame.max()+1))\n",
+    "g.set(xticks=[], ylim=(-0.5, data.Frame.max() + 1))\n",
     "g.set_xticklabels([])\n",
     "g.set_xlabels(\"\")\n",
     "g.fig.subplots_adjust(wspace=0)\n",
@@ -251,10 +272,7 @@
    "outputs": [],
    "source": [
     "# regroup all interactions together and do the same\n",
-    "g = (df.groupby(level=[\"protein\"], axis=1)\n",
-    "       .sum()\n",
-    "       .astype(bool)\n",
-    "       .mean())\n",
+    "g = df.groupby(level=[\"protein\"], axis=1).sum().astype(bool).mean()\n",
     "g.loc[g > 0.3]"
    ]
   },
@@ -272,6 +290,7 @@
    "outputs": [],
    "source": [
     "from rdkit import DataStructs\n",
+    "\n",
     "bvs = fp.to_bitvectors()\n",
     "tanimoto_sims = DataStructs.BulkTanimotoSimilarity(bvs[0], bvs)\n",
     "tanimoto_sims"
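In the quickstart, `DataStructs.BulkTanimotoSimilarity(bvs[0], bvs)` compares the first frame's bit vector against every frame. The Tanimoto similarity of two bit vectors is the number of bits set in both divided by the number set in either. A minimal stdlib sketch, treating each fingerprint as a set of on-bit indices (an assumed representation, not RDKit's `ExplicitBitVect`); `tanimoto` and `bulk_tanimoto` are hypothetical names.

```python
from typing import Iterable, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """|A & B| / |A | B| for two sets of on-bit indices."""
    union = len(a | b)
    # Assumption for this sketch: two all-zero fingerprints score 0.0.
    return len(a & b) / union if union else 0.0


def bulk_tanimoto(query: Set[int], targets: Iterable[Set[int]]) -> List[float]:
    """Similarity of `query` against each target, like comparing frame 0 to all frames."""
    return [tanimoto(query, t) for t in targets]


if __name__ == "__main__":
    frames = [{0, 1, 2}, {0, 1, 2}, {1, 2, 3}, {4}]
    print(bulk_tanimoto(frames[0], frames))  # [1.0, 1.0, 0.5, 0.0]
```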