Workflow checkpoints #73

LouisCarpentier42 · 2025-01-14T08:52:21Z

Currently, the workflow executes all pipelines, scores their performance, and returns the results. However, with many algorithms and many datasets, this can take a long time.

It would be beneficial to add checkpointing to the workflow: for example, the results so far are saved every 100 jobs, after which the workflow continues. The checkpoint frequency can be passed as an argument to the constructor of the Workflow. In this regard, it might also be good to save the results automatically in the workflow (possibly controlled by a hyperparameter), instead of only returning them, to ensure a unified format.
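A minimal sketch of the periodic-checkpointing idea, assuming a hypothetical `CheckpointingWorkflow` class (not the existing `Workflow`) in which a job is a (dataset, algorithm) pair and a user-supplied `run_job` callable returns a score. The hypothetical `checkpoint_every` constructor argument plays the role of the proposed frequency parameter, and results are always written to a single CSV file so the output format is unified:

```python
import csv
import os
import tempfile
from typing import Callable, Dict, List, Tuple

Job = Tuple[str, str]  # hypothetical: (dataset name, algorithm name)


class CheckpointingWorkflow:
    """Hypothetical sketch: runs jobs and periodically saves the results so far."""

    def __init__(self, run_job: Callable[[Job], float], output_path: str,
                 checkpoint_every: int = 100) -> None:
        if checkpoint_every < 1:
            raise ValueError("checkpoint_every must be at least 1")
        self.run_job = run_job
        self.output_path = output_path
        self.checkpoint_every = checkpoint_every

    def _save(self, results: List[Dict[str, object]]) -> None:
        # Write to a temporary file in the same directory, then atomically
        # replace the previous checkpoint, so a crash mid-save never leaves
        # a half-written results file behind.
        directory = os.path.dirname(os.path.abspath(self.output_path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".csv")
        with os.fdopen(fd, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["dataset", "algorithm", "score"])
            writer.writeheader()
            writer.writerows(results)
        os.replace(tmp_path, self.output_path)

    def run(self, jobs: List[Job]) -> List[Dict[str, object]]:
        results: List[Dict[str, object]] = []
        for i, (dataset, algorithm) in enumerate(jobs, start=1):
            score = self.run_job((dataset, algorithm))
            results.append({"dataset": dataset, "algorithm": algorithm, "score": score})
            if i % self.checkpoint_every == 0:
                self._save(results)  # periodic checkpoint
        self._save(results)  # final save, same format as the checkpoints
        return results
```

Writing to a temporary file and then calling `os.replace` means that an interruption during a save leaves the previous checkpoint intact instead of a truncated file.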

In addition, methods to determine which jobs still need to be executed would make it possible to restart the process if a problem occurred. Methods could also be added to obtain all jobs in the workflow that raised an error.
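Restarting and error inspection could be sketched as two helper functions over a list of per-job records. `JobRecord`, `pending_jobs`, and `failed_jobs` are hypothetical names, and whether failed jobs should be retried on restart is left to an assumed `retry_failed` flag:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Job = Tuple[str, str]  # hypothetical: (dataset name, algorithm name)


@dataclass
class JobRecord:
    """Hypothetical record of one executed job."""
    job: Job
    score: Optional[float] = None
    error: Optional[str] = None  # error message if the job failed


def pending_jobs(all_jobs: List[Job], records: List[JobRecord],
                 retry_failed: bool = True) -> List[Job]:
    """Jobs that still need to run: never executed, or (optionally) failed."""
    done = {r.job for r in records if r.error is None or not retry_failed}
    return [job for job in all_jobs if job not in done]


def failed_jobs(records: List[JobRecord]) -> Dict[Job, str]:
    """Map each job that raised an error to its error message."""
    return {r.job: r.error for r in records if r.error is not None}
```

For example, with `jobs = [("d1", "a"), ("d1", "b"), ("d2", "a")]` and records marking `("d1", "a")` as successful and `("d1", "b")` as failed, `pending_jobs(jobs, records)` returns the failed job and the unexecuted `("d2", "a")`.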


LouisCarpentier42 · 2025-01-22T15:37:52Z

A further improvement would be to keep track of the internal state of the workflow (i.e., which jobs have already been executed and which still need to be executed). Then, if something happens while running the workflow (e.g., an interrupt signal is sent), the workflow can save its current state. This would allow the workflow to resume from that state without having to redo all completed experiments.
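A minimal sketch of resumable execution, assuming a hypothetical `run_with_state` function that stores completed jobs and their scores in a JSON file at `state_path`, with jobs identified by string keys (JSON object keys must be strings). Python raises `KeyboardInterrupt` on SIGINT (Ctrl+C), so the `finally` clause saves the state on that signal; also covering SIGTERM would require registering a handler with `signal.signal`:

```python
import json
import os
from typing import Callable, Dict, List

# Hypothetical: each job is identified by a string key, e.g. "dataset|algorithm".


def run_with_state(jobs: List[str], run_job: Callable[[str], float],
                   state_path: str) -> Dict[str, float]:
    """Run jobs, persisting completed results so an interrupted run can resume."""
    # Load the state of a previous (interrupted) run, if there is one.
    completed: Dict[str, float] = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            completed = json.load(f)

    def save_state() -> None:
        with open(state_path, "w") as f:
            json.dump(completed, f)

    try:
        for job in jobs:
            if job in completed:
                continue  # already executed in an earlier run
            completed[job] = run_job(job)
    finally:
        # Runs on normal completion, on errors, and on Ctrl+C (SIGINT raises
        # KeyboardInterrupt), so the progress so far is always written out.
        save_state()
    return completed
```

Calling `run_with_state` again with the same `state_path` skips every job recorded in the file, so only the remaining experiments are executed. If the process is killed without a chance to run `finally` (e.g., SIGKILL), this state is lost, which is where periodic checkpoints still help.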

LouisCarpentier42 added the Workflow label (Improvements regarding the workflow) on Jan 14, 2025
