Releases: ATOMScience-org/AMPL
1.6.3 Release
- Automated GitHub CI Actions:
- Integrated Ruff Linter to identify potential low-risk errors and code defects.
- Configured Pytest to automatically run AMPL unit and integration tests on every commit or pull request to validate code changes.
- Automated the Docker build and push to DockerHub upon the publication of a release.
- Defined separate Dockerfiles for CPU and GPU configurations.
- Introduced a Makefile to streamline the management of Docker builds, pulls, and Jupyter session execution.
- Pip requirements:
- Introduced a new dev_requirements.txt for development and testing.
- Updated rdkit to version 2024.3.5, which resolves the PandasTools patching error in rdkit_easy and the descriptor calculation issue reported in rdkit/rdkit#7364.
- Use tensorflow-cpu in cpu_requirements.txt to install the appropriate platform package.
- Ruff Linter findings:
- Updated and fixed code based on Ruff reports, addressing common issues such as:
Bug Fixes:
AMPL 1.6.2 release
Bug fixes/Misc changes
- Issue 332 plot_pred_vs_actual not working for multitask models
- Updates to plotting predictions for visualizing uncertainty
- Pinned rdkit 2023.3.3 in the pip install to work around rdkit/rdkit#7364 until 2024.3.5 is released on PyPI
- Updated pip README to include 'mchip_requirements.txt'
1.6.1 Release
- Created a core tutorial series that represents the end-to-end modeling pipeline to build a machine learning model
- Numerous improvements to visualizations in perf_plots module:
- Modified all plots to use color vision deficiency (CVD) friendly colors
- Added functions to visualize confusion matrices and model performance metrics
- Improved layout of plots produced by plot_perf_vs_epoch and plot_pred_vs_actual and added parameter to control plot size
- Reimplemented plot_prec_recall_curve to produce smoother curves.
- Enhancements to multitask scaffold splitter: faster performance and optimization for response value distribution matching
- Redesigned the AMPL readthedocs for easier end-user navigation.
- Added ability to optimize multitask scaffold split for similarity of response value distributions across split subsets, using Wasserstein distance as dissimilarity metric; controlled by new parameter mtss_response_distr_weight. Improved performance of MTSS code to be much faster.
- Added perf_plots functions plot_confusion_matrices, plot_model_metrics, get_metrics_from_model_pipeline and get_metrics_from_model_file to visualize and provide access to model performance metrics.
- Modified plot_pred_vs_actual_from_file to make the output more consistent with plot_pred_from_actual; changed plot_pred_from_actual so that it accepts either a ModelPipeline or a model file path as its argument.
- Reimplemented plot_prec_recall_curve with sklearn PrecisionRecallDisplay, with better handling of multitask models.
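To illustrate the response-distribution matching mentioned above, the sketch below computes the one-dimensional Wasserstein distance between the response values of two split subsets. It is a standalone, pure-Python example and not AMPL's implementation; the function name `wasserstein_1d` and the sample values are hypothetical. `scipy.stats.wasserstein_distance` computes the same quantity for unweighted samples.

```python
from bisect import bisect_right


def wasserstein_1d(u_values, v_values):
    """1-Wasserstein distance between two empirical 1-D distributions.

    Computed as the integral of |F_u(x) - F_v(x)| over x, where F_u and F_v
    are the empirical CDFs of the two samples.
    """
    if not u_values or not v_values:
        raise ValueError("both samples must be non-empty")
    u_sorted = sorted(u_values)
    v_sorted = sorted(v_values)
    all_values = sorted(u_sorted + v_sorted)
    distance = 0.0
    for left, right in zip(all_values[:-1], all_values[1:]):
        # Both empirical CDFs are constant on the interval [left, right).
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        distance += abs(cdf_u - cdf_v) * (right - left)
    return distance


# Hypothetical response values for two split subsets.
train_responses = [5.1, 6.3, 7.0, 7.4, 8.2]
test_responses = [5.0, 6.5, 7.1, 7.3, 8.0]
shifted_responses = [value + 1.0 for value in train_responses]

print(wasserstein_1d(train_responses, test_responses))
print(wasserstein_1d(train_responses, shifted_responses))
```

A smaller distance means the two subsets have more similar response value distributions. Shifting every value by a constant gives a distance equal to that constant.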
Bug Fixes
- Fixed bug when number of scaffolds < number of superscaffolds requested
- Fixed plot_pred_vs_actual_from_file so that it works on models trained with k-fold CV.
- Fixed the % active calculation to exclude NaNs.
- Minimal pip install requirements
- Provide separate installations for different CPU/GPU runtimes (cpu, cuda, ROCm)
Python compatibility
Python 3.9.x
- Improved the AMPL README with better logic flow and topic grouping.
- Enhanced the API documentation.
- Removed private modules from the API list
- Updated all Python code to PEP 257 / Google docstring convention for consistent formatting and so that all public modules and functions are included in API documentation.
- Provided Dockerfile for a local AMPL Docker image build.
- Added a parameter to train a model in production mode, where all data are used to train model.
- Added full support for all XGBoost model parameters, including in hyperopt searches.
- Added split_strategy output column to compare_models.get_filesystem_perf_results.
- Added script for patching model tarballs to point to local copy of training data (needed for AD computation).
- Save the class_number parameter for multiclass classification models.
- Added option to map SMILES strings to canonical tautomers in standardization functions rdkit_smiles_from_smiles and base_smiles_from_smiles.
- Added model_file_reader module to simplify extraction of saved model metadata.
- Added function to plot predicted vs actual responses with saved regression models.
- Added module to plot nearest neighbor Tanimoto distance distributions between training and validation/test sets.
- Added module to plot response value distributions for split subsets.
- Updated diversity_plots to allow a user-specified color palette and increase the resolution of the figure
Bug Fixes:
- Made get_featurized_data() check whether all the SMILES strings in a dataset are represented in the prefeaturized data
- Fixed bug in setting response column weights to make it consistent across featurizers.
- Fixed error handling in rdkit_easy.mol_to_html to return empty string rather than None.
- Fixed the Tanimoto distance plot to reflect the nearest neighbor distance instead of all distances.
- Fixed freq_table's handling of NaNs in selected columns
- Fixed bug in EmbeddingFeaturization where descriptors were not transformed before input to embedding model.
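The Tanimoto distance plot fix above turns on the difference between nearest-neighbor distances and all pairwise distances. The sketch below shows that difference on toy data. It is not AMPL code: fingerprints are represented as plain Python sets of on-bit indices, and the names `tanimoto_distance` and `nearest_neighbor_distances` are hypothetical.

```python
def tanimoto_distance(fp_a, fp_b):
    """Tanimoto distance between two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / union


def nearest_neighbor_distances(query_fps, reference_fps):
    """For each query fingerprint, the distance to its closest reference fingerprint."""
    return [
        min(tanimoto_distance(query, ref) for ref in reference_fps)
        for query in query_fps
    ]


# Toy fingerprints standing in for training and test set compounds.
train_fps = [{1, 2, 3, 4}, {2, 3, 5}, {7, 8, 9}]
test_fps = [{1, 2, 3}, {7, 8, 10}]

nn_dists = nearest_neighbor_distances(test_fps, train_fps)
print(nn_dists)
```

The result has one value per test compound, rather than one value per test/training pair, so the plotted distribution reflects how far each test compound is from its closest training compound.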
AMPL 1.5.1 release
Fixed the readthedocs build issue.
AMPL 1.5.0 release
Updated AMPL to deepchem 2.7.1 and the related libraries
-- Python 3.8.x
-- numpy 1.21.6
-- rdkit 2022.9.3
-- rdkit-pypi 2022.3.5
Changed the environment setup from a mixture of conda and pip packages to pip exclusively
-- Updated the related document to reflect the change
-- Removed unused packages from the requirements list
Feature enhancements/code clean-up
-- Added ability to highlight substructures and SMARTS pattern matches in molecules rendered with rdkit_easy functions mol_to_svg, mol_to_html, etc.
-- Updated to work with minimal examples
-- Changed splitting code to allow many-to-one mapping from compound IDs to SMILES strings
-- Changed code to support AD index computation for graphconv models using embeddings as features
-- Added max_dataset_rows parameter to limit number of training set records used for AD index computation, so that AD computation is feasible for models trained on large datasets.
-- Replaced all uses of with NumpyDataset to boost performance and reduce creation of temporary files
-- Added workaround for DeepChem issue #1821, which was causing predictions to fail on single-compound batches.
-- Implemented tar archive safe extract to fix vulnerability CVE-2007-4559
-- Turned off uncertainty for multi_class_config_delaney_fit_NN_graphconv.json
-- Refined AMPL version/model version compatibility checking to define groups of compatible versions according to whether the associated DeepChem versions have the same format of model checkpoint files. The current compatibility groups are:
-- Group1: '1.2', '1.3'
-- Group2: '1.4'
-- Group3: '1.5'
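The compatibility groups listed above can be read as a simple lookup. The sketch below is illustrative only and is not taken from the AMPL source: the names `COMPATIBILITY_GROUPS`, `major_minor`, and `versions_compatible` are hypothetical, and comparing on the major.minor prefix of the version string is an assumption.

```python
# Compatibility groups as listed in these release notes.
COMPATIBILITY_GROUPS = [
    {"1.2", "1.3"},
    {"1.4"},
    {"1.5"},
]


def major_minor(version):
    """Reduce a version string such as '1.4.2' to its 'major.minor' prefix."""
    return ".".join(version.split(".")[:2])


def versions_compatible(ampl_version, model_version):
    """Return True if both versions fall in the same compatibility group."""
    ampl_key = major_minor(ampl_version)
    model_key = major_minor(model_version)
    return any(
        ampl_key in group and model_key in group
        for group in COMPATIBILITY_GROUPS
    )


print(versions_compatible("1.3.0", "1.2.1"))  # both in Group1
print(versions_compatible("1.5.0", "1.4.2"))  # different groups
```

In this sketch, versions that do not appear in any listed group are treated as incompatible.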
Bug fixes
AMPL 1.4.2 release
- Added the EmbeddingFeaturization class to support transfer learning from NN models.
- Added multitaskscaffold to the list of splitters that require SMILES strings as (temporary) IDs.
- Added basic hyperparameter plotting functions
- Bug fixes (including a fix for a multitask model bug)
- Set up GitHub CI workflow to automate test jobs on push
AMPL 1.4.1 release
- Reverted ipython version from 7.16.3 to 7.16.1 to work with jedi.
- Updated the with two install options for AMPL.
AMPL 1.4.0 release
- Updated AMPL with deepchem 2.6.1 and the related libraries
o Numpy to 1.21.0
o Ipython to 7.16.3
o PyYAML to 5.4
o Tensorflow to 2.8.0
o Switched to PyTorch implementation of fully connected neural networks
- Added multitaskscaffold split to the pipeline
- Updated plot_tani_dist_distr() and hyperparameter shortlist splitting code to include fingerprint splitter.
- Added function curate_data.remove_outlier_replicates as part of the standard curation pipeline. Miscellaneous other improvements to data curation functions.
- Removed hard-coded random seed from the code
- Bug fixes.
- Updated test code
AMPL 1.3.0 release
- Model retrain on datastore
- Added MultitaskScaffoldSplitter for splitting multi task datasets. Not integrated with the pipeline
- Added new models
• Refactoring plans for ModelWrapper and additional features for ParameterParser
• Summarize desired features for new ParameterParser
- Moved the ATOM GitHub repository and updated all references to the new GitHub link
- Bug fixes
- Updated the documentation
- Fixed and updated all tests under the 'unit' and 'interactive' directories to work with the libraries used
- Added MD5 dataset hash to model_metadata.json, which can be used to check whether models are the same
- New feature for dealing with unbalanced classification datasets
- New module for calculating importance for features or clusters of features in trained models.
- Updated BSEP example to work with ADI
- Added support for transformers that act on dataset weights only. Added interface to DeepChem BalancingTransformer to deal with imbalanced classification datasets
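As an illustration of the MD5 dataset hash mentioned in the list above, the sketch below hashes the raw bytes of a dataset file with Python's standard `hashlib`. These notes do not state how AMPL computes the hash it stores in model_metadata.json, so hashing the file bytes is an assumption, and the function name `file_md5` and the sample CSV content are hypothetical.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage: two files with identical bytes produce the same hash.
with tempfile.TemporaryDirectory() as tmp_dir:
    path_a = os.path.join(tmp_dir, "a.csv")
    path_b = os.path.join(tmp_dir, "b.csv")
    for path in (path_a, path_b):
        with open(path, "w", newline="") as handle:
            handle.write("compound_id,smiles,pIC50\nc1,CCO,5.2\n")
    hash_a = file_md5(path_a)
    hash_b = file_md5(path_b)

print(hash_a == hash_b)
```

Matching hashes indicate that two files are byte-for-byte identical, which is a quick way to check whether two models refer to the same training data file.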