Releases: ATOMScience-org/AMPL

1.6.3 Release

06 Sep 19:53
7d4ae2a

Highlights

  • Automated GitHub CI Actions:
    • Integrated Ruff Linter to identify potential low-risk errors and code defects.
    • Configured Pytest to run the AMPL unit and integration tests automatically on every commit or pull request to validate the code.
    • Automated the Docker build and push to DockerHub upon the publication of a release.

Enhancements:

[Image: ruff_linter_ampl]

Bug Fixes:

  • Updated code according to Linter reports (commits: bc34dcf, 86f826e, 32d8b9d)
  • Handled cases where the split subset is empty (commit: a001100)

AMPL 1.6.2 release

01 Aug 22:19
c509945

Bug fixes/Misc changes

  • Fixed issue 332: plot_pred_vs_actual not working for multitask models
  • Updates to plotting predictions for visualizing uncertainty
  • Pinned rdkit 2023.3.3 in pip install to work around rdkit/rdkit#7364 until 2024.3.5 is released on PyPI
  • Updated pip README to include 'mchip_requirements.txt'

1.6.1 Release

28 Jun 00:25
fb86eea

Highlights

  • Created a core tutorial series that represents the end-to-end modeling pipeline to build a machine learning model
  • Numerous improvements to visualizations in perf_plots module:
    • Modified all plots to use color vision deficiency (CVD) friendly colors
    • Added functions to visualize confusion matrices and model performance metrics
    • Improved layout of plots produced by plot_perf_vs_epoch and plot_pred_vs_actual and added parameter to control plot size
    • Reimplemented plot_prec_recall_curve to produce smoother curves.
  • Enhancements to multitask scaffold splitter: faster performance and optimization for response value distribution matching
  • Redesigned the AMPL readthedocs for easier end-user navigation.

Enhancements

  • Added ability to optimize multitask scaffold split for similarity of response value distributions across split subsets, using Wasserstein distance as dissimilarity metric; controlled by new parameter mtss_response_distr_weight. Improved performance of MTSS code to be much faster.
  • Added perf_plots functions plot_confusion_matrices, plot_model_metrics, get_metrics_from_model_pipeline and get_metrics_from_model_file to visualize and provide access to model performance metrics.
  • Modified plot_pred_vs_actual_from_file to make the output more consistent with plot_pred_vs_actual; changed plot_pred_vs_actual so that it accepts either a ModelPipeline or a model file path as its argument.
  • Reimplemented plot_prec_recall_curve with sklearn PrecisionRecallDisplay, with better handling of multitask models.
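
The response-distribution matching described in this list relies on the Wasserstein distance between the response values of two split subsets. The following is a minimal, standard-library sketch of the one-dimensional distance itself, not AMPL's implementation; the function name wasserstein_1d and the sample values are hypothetical. SciPy's scipy.stats.wasserstein_distance computes the same quantity for unweighted samples.

```python
from bisect import bisect_right


def wasserstein_1d(sample_a, sample_b):
    """1-D Wasserstein distance between two empirical samples.

    Computed as the integral over x of |CDF_a(x) - CDF_b(x)|.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    # Both empirical CDFs are constant between consecutive observed values.
    points = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


# Hypothetical response values for three split subsets
train = [5.1, 5.8, 6.2, 6.9, 7.4, 8.0]
valid = [5.5, 6.5, 7.5]   # spread across the training range
test = [7.9, 8.1, 8.3]    # concentrated at the high end

print("train vs valid:", wasserstein_1d(train, valid))
print("train vs test: ", wasserstein_1d(train, test))
```

A subset whose responses cluster at one end of the range, like the hypothetical test set here, yields a larger distance from the training set than one that covers the range evenly. A splitter that weights this term would tend to prefer the more even split.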

Bug Fixes

  • Fixed bug when number of scaffolds < number of superscaffolds requested
  • Fixed plot_pred_vs_actual_from_file so that it works on models trained with k-fold CV.
  • Fixed the % active calculation to exclude NaNs.
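
To illustrate why NaNs matter in a % active calculation, here is a small sketch that assumes binary labels with NaN marking missing measurements. The function name percent_active is hypothetical and is not taken from the AMPL code.

```python
import math


def percent_active(labels):
    """Percentage of active (label == 1) compounds among measured ones.

    NaN entries mark missing measurements and are left out of both the
    numerator and the denominator.
    """
    measured = [y for y in labels if not math.isnan(y)]
    if not measured:
        return float("nan")  # nothing measured, so the percentage is undefined
    n_active = sum(1 for y in measured if y == 1)
    return 100.0 * n_active / len(measured)


# Hypothetical labels: 3 actives, 3 inactives, 2 missing values
labels = [1, 0, float("nan"), 1, 0, 0, float("nan"), 1]
print(percent_active(labels))
```

If the two missing values were counted in the denominator, the three actives would be divided by eight compounds instead of six, understating the active fraction.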

1.6.0

23 Feb 16:13
a319ccf

Highlights

  • Minimal pip install requirements
    • Provide separate installations for different CPU/GPU runtimes (cpu, cuda, ROCm)

Python compatibility

Python 3.9.x

Documentation

  • Improved the AMPL README with better logic flow and topic grouping.
  • Enhanced the API documentation.
    • Removed private modules from the API list
    • Updated all Python code to PEP 257 / Google docstring convention for consistent formatting
      and so that all public modules and functions are included in API documentation.

Enhancements:

  • Provided Dockerfile for a local AMPL Docker image build.
  • Added a parameter to train a model in production mode, where all data are used to train the model.
  • Added full support for all XGBoost model parameters, including in hyperopt searches.
  • Added split_strategy output column to compare_models.get_filesystem_perf_results.
  • Added script for patching model tarballs to point to local copy of training data (needed for AD computation).
  • Save the class_number parameter for multiclass classification models.
  • Added option to map SMILES strings to canonical tautomers in standardization functions rdkit_smiles_from_smiles and base_smiles_from_smiles.
  • Added model_file_reader module to simplify extraction of saved model metadata.
  • Added function to plot predicted vs actual responses with saved regression models.
  • Added module to plot nearest neighbor Tanimoto distance distributions between training and validation/test sets.
  • Added module to plot response value distributions for split subsets.
  • Updated diversity_plots to allow a user-specified color palette and increase the resolution of the figure

Bug Fixes:

  • Made get_featurized_data() check whether all the SMILES strings in a dataset are represented in the prefeaturized data.
  • Fixed bug in setting response column weights to make it consistent across featurizers.
  • Fixed error handling in rdkit_easy.mol_to_html to return empty string rather than None.
  • Fixed the Tanimoto distance plot to reflect the nearest neighbor distance instead of all distances.
  • Fixed freq_table's handling of NaNs in selected columns.
  • Fixed bug in EmbeddingFeaturization where descriptors were not transformed before input to embedding model.
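
The difference between nearest neighbor distances and all pairwise distances, mentioned in the Tanimoto plot fix above, can be sketched as follows. This is an illustration only: fingerprints are represented as plain Python sets of "on" bit indices, and the function names and toy data are hypothetical rather than AMPL's API.

```python
def tanimoto_distance(fp_a, fp_b):
    """Tanimoto distance between two fingerprints given as sets of 'on' bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / union


def nearest_neighbor_distances(query_fps, reference_fps):
    """For each query fingerprint, the distance to its closest reference."""
    return [
        min(tanimoto_distance(query, ref) for ref in reference_fps)
        for query in query_fps
    ]


# Toy fingerprints (hypothetical bit sets, not real molecules)
train_fps = [{1, 2, 3, 4}, {2, 3, 5}, {7, 8, 9}]
test_fps = [{1, 2, 3}, {7, 8, 10}]

# One value per test compound, rather than one per (test, train) pair
print(nearest_neighbor_distances(test_fps, train_fps))
```

Plotting every pairwise distance would mix in the many large distances to unrelated training compounds. The nearest neighbor value shows how close each test compound is to the training set.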

AMPL 1.5.1 release

01 Mar 17:36
dc8f9df

Fixed the readthedocs build issue.

AMPL 1.5.0 release

01 Mar 00:43
57f2220
  • Updated AMPL to deepchem 2.7.1 and the related libraries
    -- Python 3.8.x
    -- numpy 1.21.6
    -- rdkit 2022.9.3
    -- rdkit-pypi 2022.3.5

  • Changed the environment setup from a mixture of conda and pip packages to pip exclusively
    -- Updated the related document to reflect the change
    -- Removed unused packages from the requirements list

  • Feature enhancements/code clean-up
    -- Added ability to highlight substructures and SMARTS pattern matches in molecules rendered with rdkit_easy functions mol_to_svg, mol_to_html, etc.
    -- Updated hyper_perf_plots.py to work with minimal examples
    -- Changed splitting code to allow many-to-one mapping from compound IDs to SMILES strings
    -- Added support for AD index computation for graphconv models using embeddings as features
    -- Added max_dataset_rows parameter to limit number of training set records used for AD index computation, so that AD computation is feasible for models trained on large datasets.
    -- Replaced all uses of deepchem.data.DiskDataset with NumpyDataset to boost performance and reduce creation of temporary files
    -- Added workaround for DeepChem issue #1821, which was causing predictions to fail on single-compound batches.
    -- Implemented tar archive safe extract to fix vulnerability CVE-2007-4559
    -- Turned off uncertainty for multi_class_config_delaney_fit_NN_graphconv.json
    -- Refined AMPL version/model version compatibility checking to define groups of compatible versions according to whether the associated DeepChem versions have the same format of model checkpoint files. The current compatibility groups are:
       Group1: '1.2', '1.3'
       Group2: '1.4'
       Group3: '1.5'

  • Bug fixes
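
The tar archive safe extraction listed above guards against CVE-2007-4559, where a member name such as "../file" causes a file to be written outside the target directory. The following is a minimal sketch of that kind of check using only the standard library; the names safe_extract and make_archive are hypothetical, and the code is not copied from AMPL. It validates member names only and does not inspect symbolic or hard links.

```python
import io
import os
import tarfile
import tempfile


def safe_extract(tar, dest_dir):
    """Extract `tar` into `dest_dir`, refusing members that would land outside it."""
    dest_root = os.path.realpath(dest_dir)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(dest_root, member.name))
        # The resolved target must stay underneath the destination directory.
        if os.path.commonpath([dest_root, target]) != dest_root:
            raise ValueError(f"blocked path traversal in tar member: {member.name!r}")
    tar.extractall(dest_dir)


def make_archive(member_name, payload=b"demo"):
    """Build a one-file tar archive in memory (demo helper only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode="r")


with tempfile.TemporaryDirectory() as workdir:
    with make_archive("model_metadata.json") as good:
        safe_extract(good, workdir)  # well-behaved member, extracted normally
    with make_archive("../escape.txt") as bad:
        try:
            safe_extract(bad, workdir)
        except ValueError as err:
            print(err)  # the traversal attempt is rejected before extraction
```

On Python versions that support tarfile extraction filters, passing filter="data" to extractall provides a more complete built-in safeguard, including checks on links.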

AMPL 1.4.2 release

22 Aug 22:50
  • Added the EmbeddingFeaturization class to support transfer learning from NN models.
  • Added multitaskscaffold to the list of splitters that require SMILES strings as (temporary) IDs.
  • Added basic hyperparameter plotting functions
  • Bug fixes (including a multitask model bug)
  • Set up GitHub CI workflow to automate test jobs on push

AMPL 1.4.1 release

16 Jun 21:00
f67b716
  • Reverted ipython version from 7.16.3 to 7.16.1 to work with jedi.
  • Updated the README.md with two install options for AMPL.

AMPL 1.4.0 release

15 Jun 22:18
2ff4e51
  • Updated AMPL with deepchem 2.6.1 and the related libraries
    o Numpy to 1.21.0
    o Ipython to 7.16.3
    o PyYAML to 5.4
    o Tensorflow to 2.8.0
    o Switched to PyTorch implementation of fully connected neural networks
  • Added multitaskscaffold split to the pipeline
  • Updated plot_tani_dist_distr() and hyperparameter shortlist splitting code to include fingerprint splitter.
  • Added function curate_data.remove_outlier_replicates as part of the standard curation pipeline, along with miscellaneous other improvements to data curation functions.
  • Removed hard-coded random seed from the code
  • Bug fixes.
  • Updated test code

AMPL 1.3.0 release

20 Jan 23:41
  • Model retrain on datastore
  • Added MultitaskScaffoldSplitter for splitting multitask datasets (not yet integrated with the pipeline)
  • Added new models
    • Refactoring plans for ModelWrapper and additional features for ParameterParser
    • Summarize desired features for new ParameterParser
  • Moved the ATOM GitHub to https://github.com/ATOMScience-org and updated all references to the new github link
  • Bug fixes
  • Updated the documentation
  • Fixed and updated all tests under the 'unit' and 'interactive' directories to work with the libraries used
  • Added an MD5 dataset hash to model_metadata.json, which can be used to check whether models are the same
  • New feature for dealing with unbalanced classification datasets
  • New module for calculating importance for features or clusters of features in trained models.
  • Updated BSEP example to work with ADI
  • Added support for transformers that act on dataset weights only. Added interface to DeepChem BalancingTransformer to deal with imbalanced classification datasets
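
As an illustration of the MD5 dataset hash mentioned in the list above, the sketch below hashes the raw bytes of a dataset file and compares digests. How AMPL derives its hash is not stated in these notes, so hashing the file contents directly is an assumption, and the function name dataset_md5 and the file names are hypothetical.

```python
import hashlib
import os
import tempfile


def dataset_md5(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as workdir:
    rows = "compound_id,smiles,pIC50\nC1,CCO,5.2\n"
    contents = {
        "train_a.csv": rows,
        "train_b.csv": rows,                  # identical copy
        "train_c.csv": rows + "C2,CCN,6.1\n",  # one extra record
    }
    paths = {}
    for name, text in contents.items():
        paths[name] = os.path.join(workdir, name)
        with open(paths[name], "w", newline="") as handle:
            handle.write(text)

    same = dataset_md5(paths["train_a.csv"]) == dataset_md5(paths["train_b.csv"])
    changed = dataset_md5(paths["train_a.csv"]) == dataset_md5(paths["train_c.csv"])
    print("identical copy matches:", same)
    print("edited file matches:   ", changed)
```

Matching digests indicate byte-identical training data. Any change to the file, even a single added row, produces a different digest.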