From 49f51ef6c759982b13f2da7bba10ef0d8efce37d Mon Sep 17 00:00:00 2001
From: Shivam Singhal <60418185+shivamsinghal001@users.noreply.github.com>
Date: Mon, 11 Nov 2024 01:16:21 -0800
Subject: [PATCH] Update README.md for additional chi2 details

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 989cd71..242a94f 100644
--- a/README.md
+++ b/README.md
@@ -22,7 +22,7 @@ python -m occupancy_measures.experiments.orpo_experiments with env_to_run=$ENV r
 ```
 python -m occupancy_measures.experiments.orpo_experiments with env_to_run=$ENV reward_fun=proxy exp_algo=ORPO 'om_divergence_coeffs=['$COEFF']' use_action_for_disc 'checkpoint_to_load_policies=["'$BC_CHECKPOINT'"]' checkpoint_to_load_current_policy=$BC_CHECKPOINT seed=$SEED experiment_tag=state 'om_divergence_type=["'$TYPE'"]'
 ```
-- action distribution regularization:
+- action distribution regularization (Note that we set the ```om_divergence_type``` variable to log the OM divergence for these runs):
 ```
 python -m occupancy_measures.experiments.orpo_experiments with env_to_run=$ENV reward_fun=proxy exp_algo=ORPO action_dist_kl_coeff=$COEFF seed=$SEED 'checkpoint_to_load_policies=["'$BC_CHECKPOINT'"]' checkpoint_to_load_current_policy=$BC_CHECKPOINT experiment_tag=AD 'om_divergence_type=["'$TYPE'"]'
 ```