Handling Assemblies in PDB vs PDBX #736

Closed
zachcp opened this issue Feb 12, 2025 · 1 comment
Labels
bug Something isn't working

Comments

zachcp commented Feb 12, 2025

I've found myself in a bit of a rabbit hole. When looking into bumping biotite (#732, #733), I noticed that the only blocker for upgrading was the use of a few functions in 'OldCIF.py'. In fact, there doesn't seem to be ANY current use in the codebase, so I think it should probably be removed (WIP: #733). However, I am running into issues in the tests, which DO use OldCIF in a few places within a single file: test_assemblies.py

To address that issue, I optimistically updated the code using OldCIF to use the modern pdbx code. However, the new CIFAssemblyParser has a different API from the previous version, as I show below:

import molecularnodes.entities.molecule.pdb as pdb
import molecularnodes.entities.molecule.pdbx as pdbx
import biotite.structure.io.pdb as biotite_pdb
import biotite.structure.io.pdbx as biotite_cif

file_pdb = "tests/data/1f2n.pdb"
file_cif = "tests/data/1f2n.cif"

## PDB
pdb_file = biotite_pdb.PDBFile.read(file_pdb)
atoms = biotite_pdb.get_structure(pdb_file, model=1)
ref_assembly = biotite_pdb.get_assembly(pdb_file, model=1)
test_parser = pdb.PDBAssemblyParser(pdb_file)
assembly_id = test_parser.list_assemblies()[0]
test_transformations_pdb = test_parser.get_transformations(assembly_id)


## PDBX
cif_file = biotite_cif.CIFFile.read(file_cif)
atoms = biotite_cif.get_structure(
    # Make sure `label_asym_id` is used instead of `auth_asym_id`
    cif_file,
    model=1,
    use_author_fields=False,
)
ref_assembly = biotite_cif.get_assembly(cif_file, model=1)
test_parser = pdbx.CIFAssemblyParser(cif_file)
assembly_id = test_parser.list_assemblies()[0]
test_transformations_pdbx = test_parser.get_transformations(assembly_id)
# test_transformations_pdb
# len(test_transformations_pdb) ==  60
...
...
 (['A', 'B', 'C'],
  [[-0.791048, 0.48869, -0.368003, 156.05355],
   [-0.198289, 0.364253, 0.909946, -51.75154],
   [0.578727, 0.792781, -0.191239, 44.70419],
   [0.0, 0.0, 0.0, 1.0]]),
 (['A', 'B', 'C'],
  [[-0.234445, 0.605831, -0.760266, 144.34027],
   [0.538206, 0.732159, 0.417466, -69.17376],
   [0.809549, -0.311307, -0.497713, 50.25899],
   [0.0, 0.0, 0.0, 1.0]])]
# test_transformations_pdbx
# len(test_transformations_pdbx) ==  60
...
....
 {'chain_ids': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
  'matrix': [[-0.7910475, 0.48868961, -0.36800314, 156.05355],
   [-0.19828949, 0.36425266, 0.90994575, -51.75154],
   [0.57872716, 0.79278147, -0.19123915, 44.70419],
   [2.1643740543e-314,
    2.1643740543e-314,
    2.1643740543e-314,
    2.1643740543e-314]],
  'pdb_model_num': 0},
 {'chain_ids': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
  'matrix': [[-0.23444536, 0.60583125, -0.76026565, 144.34027],
   [0.5382063, 0.7321588, 0.41746554, -69.17376],
   [0.80954886, -0.31130691, -0.49771343, 50.25899],
   [2.1643740543e-314,
    2.1643740543e-314,
    2.1643740543e-314,
    2.1643740543e-314]],
  'pdb_model_num': 0}]

So: I am not an expert in these assemblies but I notice that:

PDB

3 chains.

CIF

3 chains plus chains that are all HETATM / WATER
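
To make the discrepancy concrete, here is a rough sketch that compares one entry from each output shown above. The helper names (`top_rows_match`, `extra_chains`) and the tolerance are my own assumptions, not part of molecularnodes or biotite. It only compares the first three matrix rows, because the fourth row of the pdbx output does not look like `[0, 0, 0, 1]`.

```python
import math


def top_rows_match(matrix_a, matrix_b, abs_tol=1e-5):
    """Compare the rotation/translation rows (first three) within a tolerance.

    The tolerance is an assumption: the PDB values are rounded to fewer
    decimals than the CIF values.
    """
    return all(
        math.isclose(a, b, abs_tol=abs_tol)
        for row_a, row_b in zip(matrix_a[:3], matrix_b[:3])
        for a, b in zip(row_a, row_b)
    )


def extra_chains(pdb_chain_ids, pdbx_chain_ids):
    """Chain IDs present in the pdbx entry but missing from the pdb entry."""
    return sorted(set(pdbx_chain_ids) - set(pdb_chain_ids))


# One entry from each output, copied from the printouts above.
pdb_entry = (
    ["A", "B", "C"],
    [
        [-0.791048, 0.48869, -0.368003, 156.05355],
        [-0.198289, 0.364253, 0.909946, -51.75154],
        [0.578727, 0.792781, -0.191239, 44.70419],
        [0.0, 0.0, 0.0, 1.0],
    ],
)
pdbx_entry = {
    "chain_ids": ["A", "B", "C", "D", "E", "F", "G", "H", "I"],
    "matrix": [
        [-0.7910475, 0.48868961, -0.36800314, 156.05355],
        [-0.19828949, 0.36425266, 0.90994575, -51.75154],
        [0.57872716, 0.79278147, -0.19123915, 44.70419],
        [2.1643740543e-314] * 4,
    ],
    "pdb_model_num": 0,
}

same_transform = top_rows_match(pdb_entry[1], pdbx_entry["matrix"])
extras = extra_chains(pdb_entry[0], pdbx_entry["chain_ids"])
print(same_transform, extras)
```

For these two entries the transforms agree within the tolerance, and the only difference in chain IDs is the extra `D` through `I` in the pdbx result.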

What to do?

Considering:

  • oldcif.py is not used/imported in the codebase
  • pdbx via biotite seems to be the way of the future
  • the only place oldcif.py is used is in the tests (i.e. NOT in the actual code)

I think we should:

  • nuke oldcif.py
  • update the tests.

However:

  • we would need to decide which behavior is correct, given the discrepancy above.
  • after writing all this, however, I think I know the right temporary solution: shim the test to convert the dict-based result into a list that can be iterated like the old one
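
A minimal sketch of what such a shim could look like, assuming the dict layout printed above (`chain_ids`, `matrix`, `pdb_model_num`) and the older `(chain_ids, matrix)` tuple layout. The function name `transformations_as_tuples` and the `reset_last_row` flag are hypothetical and only for illustration. Resetting the last row to `[0, 0, 0, 1]` is also an assumption on my part, based on the PDB output; whether that is the correct behavior is exactly the open question.

```python
def transformations_as_tuples(transformations, reset_last_row=True):
    """Convert dict-style transformations to (chain_ids, matrix) tuples.

    `transformations` is expected to be a list of dicts with the keys
    "chain_ids" and "matrix" (a 4x4 nested sequence). The input is not
    modified; new lists are built for the result.
    """
    converted = []
    for entry in transformations:
        matrix = [[float(value) for value in row] for row in entry["matrix"]]
        if reset_last_row:
            # Assumption: the last row should be the homogeneous row,
            # as it is in the PDB-derived output.
            matrix[3] = [0.0, 0.0, 0.0, 1.0]
        converted.append((list(entry["chain_ids"]), matrix))
    return converted


# Example input copied from the pdbx printout above.
pdbx_style = [
    {
        "chain_ids": ["A", "B", "C", "D", "E", "F", "G", "H", "I"],
        "matrix": [
            [-0.7910475, 0.48868961, -0.36800314, 156.05355],
            [-0.19828949, 0.36425266, 0.90994575, -51.75154],
            [0.57872716, 0.79278147, -0.19123915, 44.70419],
            [2.1643740543e-314] * 4,
        ],
        "pdb_model_num": 0,
    }
]

converted = transformations_as_tuples(pdbx_style)
chain_ids, matrix = converted[0]
```

This only changes the container shape. It does not filter the extra chain IDs, so a test that compares chain lists against the PDB result would still need to handle those separately.
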
@zachcp zachcp added the bug Something isn't working label Feb 12, 2025

zachcp commented Feb 12, 2025

Note: I have only a minimal fix in #732. It turns out OldCIF.py is used extensively via the load_local function.

@zachcp zachcp closed this as completed Feb 12, 2025