This document describes the best practices for managing the projects under "keras-team" on GitHub that use GitHub as the source of truth, including keras-tuner, autokeras, keras-cv, keras-nlp, and possibly more in the future. It covers linting, formatting, testing, continuous integration, issue and pull request tagging, and so on.
The goal of this document is to:
- Improve the overall quality of the projects. Having all the projects follow the same development standard, which may evolve over time, helps ensure quality in every aspect.
- Unify the external contributing experience. External open-source contributors may contribute to multiple Keras projects by submitting issues or pull requests, and they should not need to learn a different contributing guide for each one.
- Save time for the project leads. They can reuse the same setup across projects and avoid the caveats listed here.
We use pytest to write tests for the projects; it is the most widely used testing framework for Python in the open-source world. The configuration of pytest is here.
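For illustration, a pytest section in `setup.cfg` might look like the following minimal sketch; the options shown are assumptions, not the projects' actual configuration.

```ini
# A minimal sketch of a pytest section in setup.cfg; the options shown are
# assumptions, not the projects' actual configuration.
[tool:pytest]
addopts = --verbose --durations=10
norecursedirs = build
```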
Unit tests should be contained in sibling files, relative to the class or utility files they are testing. The name of a test file should follow the pattern `*_test.py`. For example, the tests for `/keras_tuner/engine/hyperparameters.py` are in `/keras_tuner/engine/hyperparameters_test.py`.
Integration tests may be contained in their own `/keras_tuner/integration_tests` directory, as they may require extra files such as data.
While our unit test placement differs from the layout suggested in the pytest good practices doc, we recommend this approach to improve the discoverability of the unit tests for new contributors. This discoverability also doubles as documentation: when users want to see what `util.utility_function()` does, they can simply open the conveniently located sibling file, `util_test.py`.
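As a hypothetical illustration (the utility function and assertion below are placeholders, not real project code):

```python
# package_name/util_test.py -- hypothetical sibling test for
# package_name/util.py.
from package_name import util


def test_utility_function():
    # Placeholder assertion; a real test would check concrete behavior.
    assert util.utility_function() is not None
```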
We use CodeCov to track the test coverage. You may also refer to these settings in `setup.cfg`. We will see more about it in the continuous integration section.
The coverage settings support a wildcard exclude field, which should include `*_test.py` to ensure that test files are not counted in the code coverage.
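For example, the exclusion could be expressed in `setup.cfg` along these lines; this is a sketch, and the settings linked above are authoritative.

```ini
# Sketch of a coverage exclusion in setup.cfg; the real settings live in
# the repos' own setup.cfg files.
[coverage:run]
omit = *_test.py
```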
- Fix the random seed for all tests: Link1, Link2, Link3. See the sketch after this list.
- Create a temporary path for testing: Link. The same sketch shows pytest's built-in temporary path fixture.
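As an illustration, a shared `conftest.py` fixture can pin the seeds, and pytest's built-in `tmp_path` fixture provides a per-test temporary directory. The fixture name and seed value below are assumptions, not the projects' actual setup.

```python
# conftest.py -- a hypothetical sketch; the fixture name and seed value are
# assumptions, not the projects' actual setup.
import random

import numpy as np
import pytest
import tensorflow as tf


@pytest.fixture(autouse=True)
def fixed_seeds():
    # Pin every source of randomness so test results are reproducible.
    random.seed(42)
    np.random.seed(42)
    tf.random.set_seed(42)


# In a *_test.py file, pytest's built-in tmp_path fixture provides a
# per-test temporary directory.
def test_save_to_temporary_path(tmp_path):
    output = tmp_path / "output.txt"
    output.write_text("content")
    assert output.read_text() == "content"
```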
For projects based on Keras and TensorFlow, top-level imports are encouraged, as shown in the following example.
```python
import tensorflow as tf
from tensorflow import keras
```
Exceptions may be acceptable when a module appears many times in the code, like `keras.layers`.
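For instance (an illustrative snippet, not taken from the repos):

```python
# When keras.layers is used many times, importing it directly keeps the
# code shorter.
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(32,))
outputs = layers.Dense(10)(inputs)
```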
We use black, isort, and flake8 to lint and format the code: black formats the code in general, isort sorts the imports, and flake8 runs additional checks that black does not cover, such as long lines containing a single string. You can see the relevant sections of setup.cfg for the detailed configuration of these tools.
The user does not need to know how to use these tools to lint or format the code. We provide them with two shell scripts: `/shell/lint.sh` and `/shell/format.sh`.
In these scripts, we also check for and add the Apache 2.0 license header to every file.
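As a rough sketch of what these scripts do (the real scripts in each repo are the source of truth, and the flags below are assumptions):

```shell
# shell/format.sh -- hypothetical sketch: rewrite the code in place.
isort .
black .

# shell/lint.sh -- hypothetical sketch: fail if anything needs changes.
isort --check-only .
black --check .
flake8 .
```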
The version number of the package is stored only in `/package_name/__init__.py` with a single line of `__version__ = 'master'` on the master branch (example). We also need the `setup.py` file for the PyPI release (example). For the `setup.py` file to grab the current version number from `/package_name/__init__.py`, we need additional lines in `setup.cfg` (example).
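Those additional lines would look roughly like this, assuming setuptools' `attr:` support for reading the version; the linked example is authoritative.

```ini
# Sketch: let setuptools read the version from the package's __init__.py.
[metadata]
version = attr: package_name.__version__
```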
For releasing a new version of the package, please follow these steps:
- Create a new branch from the master branch.
- Modify the `__version__` value in the new branch.
- Create a new release on GitHub (official tutorial).

Note that the continuous integration will upload it to PyPI automatically.
Unit tests are hosted in sibling files relative to the files containing the code they are testing. `setuptools.find_packages()` supports an exclude field. This field should contain `*_test.py` to ensure that tests are not packaged with the release.
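A minimal sketch of the corresponding `setup.py` call, following the text above; the package name is a placeholder and real metadata is omitted.

```python
# Sketch of a setup.py that passes the exclude pattern named above to
# find_packages(); real metadata is omitted.
from setuptools import find_packages, setup

setup(
    name="package_name",
    packages=find_packages(exclude=["*_test.py"]),
)
```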
We use GitHub Actions for continuous integration. It automates running tests, checking the code style, uploading test coverage to CodeCov, and uploading new releases to PyPI.
You can refer to this file for how to set it up. We use a single YAML file for all the GitHub Actions to avoid installing the dependencies multiple times.
To use this setup, you also need to upload your CodeCov and PyPI credentials to the project. Here is the official tutorial.
Make sure you follow the naming of the following secrets for the GitHub Actions YAML file to work:
- Name the CodeCov token `CODECOV_TOKEN`.
- Name the PyPI username and password `PYPI_USERNAME` and `PYPI_PASSWORD`.
We should also test against tf-nightly every day to discover bugs and incompatibilities early, well before the stable release of TensorFlow. The CI setup for it is here.
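A nightly run can be triggered with a schedule like the following; the cron expression is an assumption.

```yaml
# A hypothetical scheduled trigger for the daily tf-nightly job.
on:
  schedule:
    - cron: '0 0 * * *'
```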
We will have a common CONTRIBUTING.md in `keras-team/governance` to be distributed to the other repos. This GitHub Action may be a good way to sync a centralized contributing guide to different repos.
We should also have this directory to support GitHub Codespaces, which is gaining traction on GitHub. It provides a web-based IDE that saves contributors from setting up their own dev environment, which should attract more contributors.
We will have the same issue and pull request templates across the projects in `keras-team`. They will also be stored in `keras-team/governance` and be distributed to the other repos. We also need to confirm whether there is a way to unify the tagging across the repos.