Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow mlflow hooks to be overwriten, and more choice on what to log #442

Open
Joenetics opened this issue Aug 4, 2023 · 1 comment
Open
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@Joenetics
Copy link

Joenetics commented Aug 4, 2023

Description

I believe more options within the mlflow.yml file would be helpful.
I refer to dicts within params here, but it could apply across the board.

Allow us to choose what to save, where to save, when to save,...
Also related to bug #441

  1. 'Skip/ignore dict' option, e.g,
    #This would ignore all dicts instead of logging them as params
    params:
    dict_params:
    skip: ignore

  2. 'log as'
    #allow us to choose where to dump dicts
    params:
    dict_params:
    log_as: 'artifact'
    path: 'dict_params/'

  3. allow us to overwrite default kedro_mlflow hooks by using hooks.py file

  4. allow users to choose how much to trancate, e.g truncate any params over length 200 to length 20.
    #code within mlflow.yml
    params:
    dict_params:
    flatten: False
    recursive: True
    sep: "."
    long_params_strategy: truncate
    long_params_truncate_at: 200
    long_params_truncate_to: 20

Context

We need versatility, and don't want to clutter the 'params' section of our mlflow experiments with endless dicts/other values.
Or maybe we simply don't want to log them, as they don't change often and can use a timestamp to find out what they were anyway. Why would we want to store them every single run?

Possible Implementation

https://github.com/Galileo-Galilei/kedro-mlflow/blob/master/kedro_mlflow/framework/hooks/mlflow_hook.py
This class could have the options implemented from the mlflow.py with a few more if-else statements

@Galileo-Galilei
Copy link
Owner

Galileo-Galilei commented Aug 22, 2023

Hi @Joenetics, sorry for the long delay. You are raising some very good points, and I'll try to answer them all.

Allow us to choose what to save, where to save, when to save,...

Actually kedro-mlflow is voluntarily quite opinionated. The goal is to log by default what must be logged to ensure reproducibility and not make easy to avoid it. I acknowledge that I should let some flexibility to handle special cases.

Here is what I plan to do :

'Skip/ignore dict' option

As described in #441, if you think a dict should not be logged, it means it does not really contain parameters. Hence, you can avoid logging by converting it to a yaml file and load it directly in the catalog as a dataset:

# data/01_raw/your_dict_parameter.yaml

 key1:
    key11: a
    key12: b
key2:
    key21: 
        key 211: ca
        key 222: cb
    key22: d
#catalog.yaml
your_dict: 
    type: yaml.YAMLDataSet
    filepath: data/01_raw/your_dict_parameter.yaml

Then replace params:your_dict by your_dict everywhere in your pipeline and everything will work exactly the same without logging this file.

Decision: I won't add this key since it is not a best practice and a clean workaround is available. I should document the workaround.

'log as' artifact

Actually there is a more general open question which is "How can i enable ot log any arbitrary artifact which is an input of the pipeline". There are related question on slack and an issue about this #446. In this situation, I could offer a helper to log a dataset in the catalog like:

your_dict: 
    type: kedro_mlflow.io.artifacts.MlflowInputDataSet # does not exists yet, same syntatx as MlflowArtifactDataset
    data_set:
      type: yaml.YAMLDataSet
      filepath: data/01_raw/your_dict_parameter.yaml

Decision: I'll introduce the more general feature

Overwrite default kedro_mlflow hooks to log extra feature

Decision: I need to document this feature.

Allow users to choose truncation length

Decision: I won't fix it myself, but I'll accept PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
Status: 📋 Backlog
Development

No branches or pull requests

2 participants