Allow mlflow hooks to be overwritten, and more choice on what to log #442
Hi @Joenetics, sorry for the long delay. You are raising some very good points, and I'll try to answer them all.
Here is what I plan to do:

'Skip/ignore dict' option

As described in #441, if you think a dict should not be logged, it means it does not really contain parameters. Hence, you can avoid logging it by converting it to a YAML file and loading it directly in the catalog as a dataset:

```yaml
# data/01_raw/your_dict_parameter.yaml
key1:
  key11: a
  key12: b
key2:
  key21:
    key 211: ca
    key 222: cb
  key22: d
```

```yaml
# catalog.yaml
your_dict:
  type: yaml.YAMLDataSet
  filepath: data/01_raw/your_dict_parameter.yaml
```

Then replace …

Decision: I won't add this key, since it is not a best practice and a clean workaround is available. I should document the workaround.

'log as' artifact

Actually, there is a more general open question: "How can I log any arbitrary artifact that is an input of the pipeline?" There are related questions on Slack and an issue about this (#446). In this situation, I could offer a helper to log a dataset from the catalog, like:

```yaml
your_dict:
  type: kedro_mlflow.io.artifacts.MlflowInputDataSet # does not exist yet; same syntax as MlflowArtifactDataset
  data_set:
    type: yaml.YAMLDataSet
    filepath: data/01_raw/your_dict_parameter.yaml
```

Decision: I'll introduce the more general feature.

Overwrite default
Description
I believe more options in the mlflow.yml file would be helpful.
I refer to dicts within params here, but this could apply across the board: let us choose what to save, where to save it, when to save it, and so on.
Also related to bug #441.
'Skip/ignore dict' option, e.g.:

```yaml
# This would ignore all dicts instead of logging them as params
params:
  dict_params:
    skip: ignore
```
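A minimal sketch of what `skip: ignore` could do inside the hook, just before parameters are sent to MLflow. `drop_dict_params` is a hypothetical helper for illustration, not an existing kedro-mlflow function:

```python
from typing import Any, Dict


def drop_dict_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `params` without any dict-valued entries.

    Hypothetical helper mirroring the proposed `dict_params: skip: ignore`
    option: only non-dict parameters would be passed on for logging.
    """
    return {key: value for key, value in params.items() if not isinstance(value, dict)}


params = {"learning_rate": 0.01, "n_estimators": 100, "columns": {"target": "y", "id": "uid"}}
print(drop_dict_params(params))  # {'learning_rate': 0.01, 'n_estimators': 100}
```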
'log as' option:

```yaml
# Allow us to choose where to dump dicts
params:
  dict_params:
    log_as: 'artifact'
    path: 'dict_params/'
```
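A rough sketch of the 'log as' idea, assuming a hypothetical `split_dict_params` helper: dict params are dumped as JSON files into a local folder named after `path`, and only the scalar params are kept for regular param logging. Uploading the folder (for instance with `mlflow.log_artifacts(local_dir, artifact_path="dict_params")`) is left out so the sketch runs without MLflow installed:

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def split_dict_params(params: Dict[str, Any], local_dir: Path) -> Dict[str, Any]:
    """Write each dict-valued param to `<local_dir>/<name>.json`; return the rest.

    Hypothetical helper for the proposed `log_as: 'artifact'` option. The returned
    scalars would still be logged as params, while `local_dir` would be uploaded
    as artifacts.
    """
    local_dir.mkdir(parents=True, exist_ok=True)
    scalar_params: Dict[str, Any] = {}
    for name, value in params.items():
        if isinstance(value, dict):
            (local_dir / f"{name}.json").write_text(json.dumps(value, indent=2))
        else:
            scalar_params[name] = value
    return scalar_params


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "dict_params"
    remaining = split_dict_params({"alpha": 0.5, "features": {"num": ["a", "b"]}}, out_dir)
    print(remaining)                                  # {'alpha': 0.5}
    print(sorted(p.name for p in out_dir.iterdir()))  # ['features.json']
```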
Allow us to override the default kedro_mlflow hooks from the project's hooks.py file.
Allow users to choose how much to truncate, e.g. truncate any param longer than 200 characters to 20 characters.
```yaml
# Code within mlflow.yml
params:
  dict_params:
    flatten: False
    recursive: True
    sep: "."
    long_params_strategy: truncate
    long_params_truncate_at: 200
    long_params_truncate_to: 20
```
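A sketch of how `flatten`, `recursive`, `sep` and the proposed truncation keys could work together. `flatten_dict` and `truncate_long_params` are hypothetical helpers, and measuring "length" as `len(str(value))` is an assumption:

```python
from typing import Any, Dict


def flatten_dict(d: Dict[str, Any], recursive: bool = True, sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into '<parent><sep><child>' keys.

    With recursive=False only the first level of nesting is expanded;
    deeper dicts are kept as values.
    """
    flat: Dict[str, Any] = {}
    for key, value in d.items():
        if isinstance(value, dict):
            children = flatten_dict(value, recursive, sep) if recursive else value
            for child_key, child_value in children.items():
                flat[f"{key}{sep}{child_key}"] = child_value
        else:
            flat[key] = value
    return flat


def truncate_long_params(
    params: Dict[str, Any], truncate_at: int = 200, truncate_to: int = 20
) -> Dict[str, Any]:
    """Cut any value whose string form exceeds `truncate_at` chars down to `truncate_to` chars."""
    return {
        key: str(value)[:truncate_to] if len(str(value)) > truncate_at else value
        for key, value in params.items()
    }


raw = {"model": {"depth": 3, "optim": {"lr": 0.1}}, "notes": "x" * 300}
params = truncate_long_params(flatten_dict(raw, recursive=True, sep="."), truncate_at=200, truncate_to=20)
print(sorted(params))                               # ['model.depth', 'model.optim.lr', 'notes']
print(len(params["notes"]))                         # 20
print(sorted(flatten_dict(raw, recursive=False)))   # ['model.depth', 'model.optim', 'notes']
```

Flattening happens before truncation here, so long values produced by a nested dict are also caught.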
Context
We need versatility, and we don't want to clutter the 'params' section of our MLflow experiments with endless dicts and other values.
Or maybe we simply don't want to log them at all: they don't change often, and we can use a timestamp to find out what they were anyway. Why would we want to store them on every single run?
Possible Implementation
https://github.com/Galileo-Galilei/kedro-mlflow/blob/master/kedro_mlflow/framework/hooks/mlflow_hook.py
This class could implement the options read from mlflow.yml with a few more if-else statements.
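To illustrate the "few more if-else statements", here is a minimal sketch assuming the options proposed above end up under `params.dict_params` in mlflow.yml and that the file has already been loaded into a dict. `DictParamsOptions`, its default values and `dict_param_action` are all hypothetical, not part of kedro-mlflow:

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class DictParamsOptions:
    """Hypothetical typed view of the proposed `params.dict_params` section.

    All defaults below are assumptions for illustration only.
    """

    skip: Optional[str] = None          # e.g. "ignore"
    log_as: str = "param"               # "param" or "artifact"
    path: str = "dict_params/"
    flatten: bool = False
    recursive: bool = True
    sep: str = "."
    long_params_strategy: str = "fail"  # e.g. "fail" or "truncate"
    long_params_truncate_at: int = 200
    long_params_truncate_to: int = 20

    @classmethod
    def from_mlflow_config(cls, config: Dict[str, Any]) -> "DictParamsOptions":
        """Build options from an already-loaded mlflow.yml dict, rejecting unknown keys."""
        section = (config.get("params") or {}).get("dict_params") or {}
        allowed = {f.name for f in fields(cls)}
        unknown = sorted(set(section) - allowed)
        if unknown:
            raise ValueError(f"Unknown dict_params option(s): {unknown}")
        return cls(**section)


def dict_param_action(options: DictParamsOptions) -> str:
    """Decide what the hook should do with a dict-valued parameter."""
    if options.skip == "ignore":
        return "skip"
    if options.log_as == "artifact":
        return "artifact"
    if options.flatten:
        return "flatten"
    return "log_as_is"


config = {"params": {"dict_params": {"log_as": "artifact", "path": "dict_params/"}}}
options = DictParamsOptions.from_mlflow_config(config)
print(dict_param_action(options))  # artifact
```

Rejecting unknown keys makes a typo in mlflow.yml fail loudly instead of being silently ignored.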