-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Eval Suite #23
Comments
Makes sense! One example that does this well is lm harness, where many of prompt formatting / string post-processing rules are in yamls. For example, GSM8K: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/gsm8k/gsm8k.yaml |
Yes, I was about to raise an issue: currently when users add new models, they need to hard-code the path to match the model name in model_utils.py. @SumanthRH |
Another refactor PR on top of #23 now focused on model-specific configurations and data generation. - Model-specific system prompts, user templates etc are best left to be in the a YAML file. - TaskHandler should be model agnostic, since we want to have a consistent evaluation logic for all tasks - Data curation scripts for different Sky-T1 models should live outside the `skythought_evals` package. These are mostly scripts focused on a particular data curation task like filtering, rewriting etc. My proposal is to place common scripts in `scripts/ `. A guide for obtaining the final training data + training commands for different Sky-T1 models should be placed in `recipes/` . For now, all data curation scripts are in the `scripts` folder . - Adds a new `system-prompt-template` CLI flag. User can leverage available templates like those for sky-T1, Qwen, etc for a different model during evaluation.
Currently, we use a single
task_handler.py
to implement task handlers for each task. It will become longer and longer when we add more tasks. I am proposing the following refactors:TASK_NAME.py
that includes the task handler).Can start the refactor after #19 , #20 and #21 are merged.
This is now a draft proposal, will add more later. Feel free to discuss here.
@erictang000 @SumanthRH @richardliaw @kouroshHakha
The text was updated successfully, but these errors were encountered: