Releases: ATOMScience-org/AMPL
1.6.3 Release
Highlights
- Automated GitHub CI Actions:
- Integrated the Ruff linter to catch potential low-risk errors and code defects.
- Configured Pytest to run the AMPL unit and integration tests automatically on every commit or pull request to validate code changes.
- Automated building the Docker image and pushing it to Docker Hub when a release is published.
Enhancements:
- Defined separate Dockerfiles for CPU and GPU configurations.
- Introduced a Makefile to streamline building and pulling Docker images and running Jupyter sessions.
- Pip requirements:
- Introduced a new dev_requirements.txt for development and testing.
- Updated rdkit to version 2024.3.5, which resolves the PandasTools patching error in rdkit_easy and the descriptor calculation issue reported in rdkit/rdkit#7364.
- Switched to tensorflow-cpu in cpu_requirements.txt so that the CPU-only TensorFlow package is installed.
- Ruff Linter findings:
- Updated and fixed code based on Ruff reports, addressing common issues such as:
Bug Fixes:
AMPL 1.6.2 release
Bug fixes/Misc changes
- Fixed issue 332: plot_pred_vs_actual did not work for multitask models
- Updated prediction plots to visualize uncertainty
- Pinned rdkit to 2023.3.3 in the pip install to work around rdkit/rdkit#7364 until 2024.3.5 is released on PyPI
- Updated the pip README to include 'mchip_requirements.txt'
1.6.1 Release
Highlights
- Created a core tutorial series that walks through the end-to-end modeling pipeline for building a machine learning model
- Numerous improvements to visualizations in perf_plots module:
- Modified all plots to use color vision deficiency (CVD) friendly colors
- Added functions to visualize confusion matrices and model performance metrics
- Improved the layout of plots produced by plot_perf_vs_epoch and plot_pred_vs_actual and added a parameter to control plot size
- Reimplemented plot_prec_recall_curve to produce smoother curves.
- Enhancements to multitask scaffold splitter: faster performance and optimization for response value distribution matching
- Redesigned the AMPL readthedocs for easier end-user navigation.
Enhancements
- Added the ability to optimize the multitask scaffold split (MTSS) for similarity of response value distributions across split subsets, using the Wasserstein distance as the dissimilarity metric; this is controlled by the new parameter mtss_response_distr_weight. The MTSS code is also much faster.
- Added perf_plots functions plot_confusion_matrices, plot_model_metrics, get_metrics_from_model_pipeline and get_metrics_from_model_file to visualize and provide access to model performance metrics.
- Modified plot_pred_vs_actual_from_file to make its output more consistent with plot_pred_vs_actual; changed plot_pred_vs_actual so that it accepts either a ModelPipeline or a model file path as its argument.
- Reimplemented plot_prec_recall_curve using scikit-learn's PrecisionRecallDisplay, with better handling of multitask models.
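The Wasserstein criterion used by the MTSS option can be illustrated with a small sketch. This is not AMPL's MTSS code: it only computes the first Wasserstein distance between two 1-D empirical samples of response values (for example, training vs. test subset), the dissimilarity metric named above. How mtss_response_distr_weight combines this term with the rest of the splitter's objective is not shown, and the function name is hypothetical. If SciPy is available, scipy.stats.wasserstein_distance computes the same quantity.

```python
from bisect import bisect_right


def wasserstein_1d(a, b):
    """First Wasserstein (earth mover's) distance between two 1-D empirical samples.

    Computed as the integral of |F_a(x) - F_b(x)| over x, where F_a and F_b are the
    empirical CDFs. Both CDFs are step functions, so the integral reduces to a sum
    over the intervals between consecutive pooled values.
    """
    a, b = sorted(a), sorted(b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    pooled = sorted(a + b)
    total = 0.0
    for left, right in zip(pooled, pooled[1:]):
        # The empirical CDFs are constant on [left, right)
        fa = bisect_right(a, left) / len(a)
        fb = bisect_right(b, left) / len(b)
        total += abs(fa - fb) * (right - left)
    return total
```

For example, wasserstein_1d([0, 1, 3], [5, 6, 8]) is 5 (up to floating-point rounding), since the second sample is the first shifted by 5; a split whose subsets have similar response distributions yields a value near 0.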
Bug Fixes
- Fixed a bug that occurred when the number of scaffolds was smaller than the number of superscaffolds requested.
- Fixed plot_pred_vs_actual_from_file so that it works on models trained with k-fold CV.
- Excluded NaNs from the % active calculation.
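The NaN fix amounts to computing the active fraction over observed labels only. A minimal sketch, assuming binary labels coded 1 (active) and 0 (inactive) with NaN for missing values; the function name is hypothetical and this is not AMPL's implementation.

```python
import math


def percent_active(labels):
    """Percentage of active (label == 1) compounds, ignoring missing (NaN) labels."""
    # Drop NaNs so that missing measurements do not count toward the denominator
    observed = [y for y in labels if not math.isnan(y)]
    if not observed:
        return float("nan")
    return 100.0 * sum(1 for y in observed if y == 1) / len(observed)
```

With labels [1.0, 0.0, nan, 1.0] this returns about 66.7%, whereas counting the NaN in the denominator would give 50%.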
1.6.0
Highlights
- Minimal pip install requirements
- Provide separate installations for different CPU/GPU runtimes (cpu, cuda, ROCm)
Python compatibility
Python 3.9.x
Documentation
- Improved the AMPL README with better logic flow and topic grouping.
- Enhanced the API documentation.
- Removed private modules from the API list
- Updated all Python code to the PEP 257 / Google docstring conventions, for consistent formatting and so that all public modules and functions are included in the API documentation.
Enhancements:
- Provided Dockerfile for a local AMPL Docker image build.
- Added a parameter to train a model in production mode, in which all data are used for training.
- Added full support for all XGBoost model parameters, including in hyperopt searches.
- Added split_strategy output column to compare_models.get_filesystem_perf_results.
- Added script for patching model tarballs to point to local copy of training data (needed for AD computation).
- Saved the class_number parameter for multiclass classification models.
- Added option to map SMILES strings to canonical tautomers in standardization functions rdkit_smiles_from_smiles and base_smiles_from_smiles.
- Added model_file_reader module to simplify extraction of saved model metadata.
- Added function to plot predicted vs actual responses with saved regression models.
- Added module to plot nearest neighbor Tanimoto distance distributions between training and validation/test sets.
- Added module to plot response value distributions for split subsets.
- Updated diversity_plots to allow a user-specified color palette and to produce higher-resolution figures.
Bug Fixes:
- Made get_featurized_data() check whether all SMILES strings in a dataset are represented in the prefeaturized data.
- Fixed bug in setting response column weights to make it consistent across featurizers.
- Fixed error handling in rdkit_easy.mol_to_html to return empty string rather than None.
- Fixed the Tanimoto distance plot to reflect the nearest neighbor distance instead of all distances.
- Fixed freq_table's handling of NaNs in selected columns.
- Fixed bug in EmbeddingFeaturization where descriptors were not transformed before input to embedding model.
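The difference between nearest-neighbor distances and all pairwise distances can be sketched as follows. This assumes fingerprints represented as Python sets of on-bit indices (in practice they would typically come from RDKit, for example Morgan fingerprints); the function names are hypothetical and this is not the code of AMPL's plotting module.

```python
def tanimoto_distance(fp_a, fp_b):
    """Tanimoto (Jaccard) distance between two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / union


def nearest_neighbor_distances(train_fps, test_fps):
    """For each test fingerprint, the distance to its single closest training fingerprint."""
    return [min(tanimoto_distance(t, r) for r in train_fps) for t in test_fps]
```

Plotting the distribution of these per-compound minima, rather than the distances for all train/test pairs, shows how far each validation or test compound lies from its closest training compound.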
AMPL 1.5.1 release
Fixed the readthedocs build issue.
AMPL 1.5.0 release
- Updated AMPL to deepchem 2.7.1 and the related libraries:
-- Python 3.8.x
-- numpy 1.21.6
-- rdkit 2022.9.3
-- rdkit-pypi 2022.3.5
- Changed the environment setup from a mixture of conda and pip packages to pip exclusively
-- Updated the related document to reflect the change
-- Removed unused packages from the requirements list
- Feature enhancements/code clean-up:
-- Added ability to highlight substructures and SMARTS pattern matches in molecules rendered with rdkit_easy functions mol_to_svg, mol_to_html, etc.
-- Updated hyper_perf_plots.py to work with minimal examples
-- Changed splitting code to allow many-to-one mapping from compound IDs to SMILES strings
-- Added support for AD index computation for graphconv models that use embeddings as features
-- Added max_dataset_rows parameter to limit number of training set records used for AD index computation, so that AD computation is feasible for models trained on large datasets.
-- Replaced all uses of deepchem.data.DiskDataset with NumpyDataset to boost performance and reduce creation of temporary files
-- Added workaround for DeepChem issue #1821, which was causing predictions to fail on single-compound batches.
-- Implemented safe extraction of tar archives to fix vulnerability CVE-2007-4559
-- Turned off uncertainty for multi_class_config_delaney_fit_NN_graphconv.json
-- Refined AMPL version/model version compatibility checking to define groups of compatible versions according to whether the associated DeepChem versions have the same format of model checkpoint files. The current compatibility groups are:
-- Group1: '1.2', '1.3'
-- Group2: '1.4'
-- Group3: '1.5'
- Bug fixes
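A common pattern for safe tar extraction, the class of fix for CVE-2007-4559, is to resolve every member path and reject any that would land outside the destination directory. The sketch below shows that pattern with the standard library; it is an illustration rather than AMPL's actual implementation, and the function name is hypothetical. It does not check link members whose targets point outside the directory.

```python
import os
import tarfile


def safe_extract(tar, path="."):
    """Extract all members of an open TarFile, refusing any that would land outside `path`."""
    base = os.path.realpath(path)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(base, member.name))
        # Names like "../../etc/passwd" or "/etc/passwd" resolve outside base (CVE-2007-4559)
        if os.path.commonpath([base, target]) != base:
            raise ValueError(f"Blocked path traversal in tar member: {member.name}")
    tar.extractall(path)
```

On Python versions that support tarfile extraction filters (PEP 706), passing filter="data" to extractall is a built-in alternative that also rejects unsafe links.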
AMPL 1.4.2 release
- Added the EmbeddingFeaturization class to support transfer learning from NN models.
- Added multitaskscaffold to the list of splitters that require SMILES strings as (temporary) IDs.
- Added basic hyperparameter plotting functions
- Bug fixes (e.g., a multitask model bug)
- Set up a GitHub CI workflow to run test jobs automatically on push
AMPL 1.4.1 release
- Reverted ipython version from 7.16.3 to 7.16.1 to work with jedi.
- Updated the README.md with two install options for AMPL.
AMPL 1.4.0 release
- Updated AMPL to deepchem 2.6.1 and the related libraries:
-- NumPy to 1.21.0
-- IPython to 7.16.3
-- PyYAML to 5.4
-- TensorFlow to 2.8.0
-- Switched to the PyTorch implementation of fully connected neural networks
- Added multitaskscaffold split to the pipeline
- Updated plot_tani_dist_distr() and hyperparameter shortlist splitting code to include fingerprint splitter.
- Added function curate_data.remove_outlier_replicates as part of the standard curation pipeline, along with miscellaneous improvements to other data curation functions.
- Removed hard-coded random seed from the code
- Bug fixes.
- Updated test code
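The idea behind removing outlier replicates can be sketched as follows: replicate measurements for the same compound are compared with that compound's median, and values that deviate by more than a threshold are dropped. This is a hypothetical illustration; the actual criterion and parameters of curate_data.remove_outlier_replicates may differ, and the function and parameter names here are invented for the example.

```python
from collections import defaultdict
from statistics import median


def remove_outlier_replicates_sketch(records, max_diff_from_median=1.0):
    """Drop replicate measurements that deviate too far from their compound's median.

    `records` is a list of (compound_id, value) pairs; a compound may appear several times.
    Hypothetical illustration only, not AMPL's curate_data implementation.
    """
    by_compound = defaultdict(list)
    for cid, value in records:
        by_compound[cid].append(value)
    medians = {cid: median(vals) for cid, vals in by_compound.items()}
    # Keep only values within the threshold of their compound's median
    return [
        (cid, value)
        for cid, value in records
        if abs(value - medians[cid]) <= max_diff_from_median
    ]
```

With replicates 5.0, 5.2, and 8.0 for one compound, the median is 5.2 and only the 8.0 measurement is removed at the default threshold.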
AMPL 1.3.0 release
- Model retrain on datastore
- Added MultitaskScaffoldSplitter for splitting multitask datasets (not integrated with the pipeline)
- Added new models
• Refactoring plans for ModelWrapper and additional features for ParameterParser
• Summarized desired features for new ParameterParser
- Moved the ATOM GitHub to https://github.com/ATOMScience-org and updated all references to the new GitHub link
- Bug fixes
- Updated the documentation
- Fixed and updated all tests under the 'unit' and 'interactive' directories to work with the libraries used
- Added an MD5 dataset hash to model_metadata.json, which can be used to check whether models are the same
- New feature for dealing with unbalanced classification datasets
- New module for calculating importance for features or clusters of features in trained models.
- Updated the BSEP example to work with ADI
- Added support for transformers that act on dataset weights only. Added interface to DeepChem BalancingTransformer to deal with imbalanced classification datasets
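A dataset hash like the one stored in model_metadata.json can be computed by streaming the file through hashlib. This sketch assumes the hash is taken over the raw bytes of the dataset file; how AMPL actually computes and stores the value is not described here, and the function name is hypothetical.

```python
import hashlib


def dataset_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

Models whose stored dataset hashes differ were trained on different data files; matching hashes indicate, with high probability, identical input data.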