Base Module Environment is too general #1736

Open
2 tasks done
hallerite opened this issue Mar 8, 2025 · 0 comments
Assignees: hallerite
Labels: enhancement (New feature or request), P1 (Task with middle level priority)

Comments

@hallerite
Collaborator

Required prerequisites

Motivation

The base Environment class is currently very general and undifferentiated, so it gives little structure to the different kinds of environments built on top of it.

Solution

One reasonable way to categorize environments is into SingleStep and MultiStep. A SingleStep environment samples a question from a dataset, asks the LLM to answer it, and assigns a reward based on the answer. These are the environments we usually use for LLM reasoning tasks such as math and coding. They are special in that step is called exactly once before the episode ends.

MultiStep environments, on the other hand, do not end after one step. Turn-based games like TicTacToe and Chess are examples of this kind of environment.

I propose implementing two abstract classes, SingleStepEnv and MultiStepEnv, from which single-step and multi-step environments can inherit. They would bring some structure to the environment creation process.
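To make the proposal concrete, here is a rough sketch of what the two base classes could look like. Only the names SingleStepEnv and MultiStepEnv come from this issue; everything else (StepResult, compute_reward, initial_observation, update, max_steps, and the two toy subclasses) is a hypothetical placeholder for illustration, not the existing Environment API.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, Optional, Sequence, Tuple


@dataclass
class StepResult:  # hypothetical container, not part of the current codebase
    observation: Any
    reward: float
    done: bool


class SingleStepEnv(ABC):
    """Samples one question, scores one answer, then the episode ends."""

    def __init__(self, dataset: Sequence[Dict[str, str]], seed: Optional[int] = None):
        self._dataset = dataset
        self._rng = random.Random(seed)
        self._sample: Optional[Dict[str, str]] = None

    def reset(self) -> str:
        self._sample = self._rng.choice(self._dataset)
        return self._sample["question"]

    def step(self, answer: str) -> StepResult:
        if self._sample is None:
            raise RuntimeError("call reset() before step()")
        reward = self.compute_reward(self._sample, answer)
        self._sample = None  # step may only be called once per episode
        return StepResult(observation=None, reward=reward, done=True)

    @abstractmethod
    def compute_reward(self, sample: Dict[str, str], answer: str) -> float:
        """Subclasses decide how an answer is scored."""


class MultiStepEnv(ABC):
    """Runs until the subclass reports termination or max_steps is reached."""

    def __init__(self, max_steps: int):
        self.max_steps = max_steps
        self._steps = 0

    def reset(self) -> Any:
        self._steps = 0
        return self.initial_observation()

    def step(self, action: Any) -> StepResult:
        self._steps += 1
        observation, reward, done = self.update(action)
        return StepResult(observation, reward, done or self._steps >= self.max_steps)

    @abstractmethod
    def initial_observation(self) -> Any:
        """Set up the initial state and return the first observation."""

    @abstractmethod
    def update(self, action: Any) -> Tuple[Any, float, bool]:
        """Apply one action and return (observation, reward, done)."""


class ExactMatchEnv(SingleStepEnv):
    """Toy single-step env: reward 1.0 if the answer matches exactly."""

    def compute_reward(self, sample: Dict[str, str], answer: str) -> float:
        return 1.0 if answer.strip() == sample["answer"] else 0.0


class CountdownEnv(MultiStepEnv):
    """Toy multi-step env: subtract 1 or 2 per step until the counter hits 0."""

    def __init__(self, start: int, max_steps: int = 10):
        super().__init__(max_steps)
        self._start = start
        self._value = start

    def initial_observation(self) -> int:
        self._value = self._start
        return self._value

    def update(self, action: int) -> Tuple[int, float, bool]:
        if action not in (1, 2):
            raise ValueError("action must be 1 or 2")
        self._value = max(0, self._value - action)
        done = self._value == 0
        return self._value, (1.0 if done else 0.0), done
```

With this split, a math or coding environment would only need to implement compute_reward, while a game like TicTacToe would implement initial_observation and update and leave the step counting and termination bookkeeping to the base class.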

Alternatives

No response

Additional context

No response

@hallerite hallerite added enhancement New feature or request P1 Task with middle level priority labels Mar 8, 2025
@hallerite hallerite self-assigned this Mar 8, 2025