Change BanditDuality to use a prior and softmax to randomize arms #779

odow · 2024-09-04T22:56:46Z

x-ref #777

@Thuener do you want to try this branch? (I haven't actually run it locally. Just edited without testing.)

Hopefully this PR:

Initializes with a default prior so that we don't have the 10 / constant bound issue
Randomizes the arm selection according to a softmax, so two arms with similar scores will be chosen at random, with the probability decaying quite quickly if the scores are different
Adds a max(, 0.1) to the time denominator to hedge against iterations that take a very short amount of time and inflate the reward too much.
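The three changes above can be illustrated with a minimal Julia sketch. This is not the code from this PR. `ToyArm`, `toy_reward`, `update!`, `softmax_probabilities`, and `choose_arm` are hypothetical names. The prior (a mean reward of 1.0 with a pseudo-count of 1) is an assumed value. The reward is assumed to be the bound improvement divided by the elapsed time, with the time floored at 0.1 seconds.

```julia
using Random

# Hypothetical arm state: a running mean reward plus an observation count.
mutable struct ToyArm
    name::String
    mean_reward::Float64
    count::Int
end

# Assumption: every arm starts from the same prior mean with a pseudo-count
# of 1, so no arm begins with an undefined or extreme score.
ToyArm(name::String; prior::Float64 = 1.0) = ToyArm(name, prior, 1)

# Assumed reward: bound improvement per second, with the time floored at
# 0.1 s so that a very fast iteration cannot produce an outsized reward.
toy_reward(bound_improvement::Float64, seconds::Float64) =
    bound_improvement / max(seconds, 0.1)

# Incremental update of the running mean reward of an arm.
function update!(arm::ToyArm, r::Float64)
    arm.count += 1
    arm.mean_reward += (r - arm.mean_reward) / arm.count
    return arm
end

# Numerically stable softmax over the arms' mean rewards.
function softmax_probabilities(arms::Vector{ToyArm})
    scores = [a.mean_reward for a in arms]
    w = exp.(scores .- maximum(scores))
    return w ./ sum(w)
end

# Sample an arm index with probability proportional to exp(mean_reward).
function choose_arm(rng::AbstractRNG, arms::Vector{ToyArm})
    p = softmax_probabilities(arms)
    u = rand(rng)
    cumulative = 0.0
    for (i, prob) in enumerate(p)
        cumulative += prob
        if u <= cumulative
            return i
        end
    end
    return length(p)  # guard against floating-point round-off
end
```

With equal scores every arm is equally likely. Because the weights are `exp(score)`, a gap of only a few units between two scores already makes the lower-scoring arm much less likely to be picked, which is the "decaying quite quickly" behavior described above.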

codecov · 2024-09-05T00:02:37Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.40%. Comparing base (d2495d6) to head (3f871cd).
Report is 1 commit behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #779   +/-   ##
=======================================
  Coverage   93.40%   93.40%           
=======================================
  Files          27       27           
  Lines        3486     3504   +18     
=======================================
+ Hits         3256     3273   +17     
- Misses        230      231    +1


Thuener · 2024-09-05T08:12:21Z

Thanks!

I will try this one.

One detail: since the bounds don't change in the middle of the backward pass, it would be better to change the arm only once per iteration, no? That should be better for performance, and if you change it for only some nodes, it would break the cuts.

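function prepare_backward_pass(
    node::Node,
    handler::BanditDuality,
    options::Options,
)
    # Only update the rewards and re-select the arm when a new log entry has
    # appeared, i.e., once per iteration rather than once per node, so every
    # node in a backward pass uses the same duality handler.
    if length(options.log) > handler.logs_seen
        _update_rewards(handler, options.log)
        _choose_best_arm(handler)
        handler.logs_seen = length(options.log)
    end
    # Delegate to the duality handler of the currently selected arm.
    return prepare_backward_pass(node, handler.arms[handler.last_arm_index], options)
end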
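The effect of the `logs_seen` guard in the snippet above can be checked with a self-contained toy. `ToyBanditHandler`, `toy_choose_arm!`, and `toy_prepare_backward_pass!` are hypothetical stand-ins, not SDDP.jl API. The arm selection is a plain round-robin purely for illustration; the point is that the arm changes once per iteration and stays fixed for every node within it.

```julia
# Hypothetical handler that caches the chosen arm between nodes.
mutable struct ToyBanditHandler
    arms::Vector{String}
    last_arm_index::Int
    logs_seen::Int
    n_switches::Int
end

ToyBanditHandler(arms::Vector{String}) = ToyBanditHandler(arms, 1, 0, 0)

# Stand-in for the arm-selection step (round-robin, for illustration only).
function toy_choose_arm!(h::ToyBanditHandler)
    h.last_arm_index = mod1(h.last_arm_index + 1, length(h.arms))
    h.n_switches += 1
    return h
end

# Same guard as above: re-select only when the log has grown (once per
# iteration) and reuse the cached arm for every other node.
function toy_prepare_backward_pass!(h::ToyBanditHandler, iteration_log::Vector{Float64})
    if length(iteration_log) > h.logs_seen
        toy_choose_arm!(h)
        h.logs_seen = length(iteration_log)
    end
    return h.arms[h.last_arm_index]
end

h = ToyBanditHandler(["A", "B"])
iteration_log = Float64[]
for iteration in 1:3
    push!(iteration_log, 0.0)  # one log entry per iteration
    chosen = [toy_prepare_backward_pass!(h, iteration_log) for _ in 1:4]
    @assert all(==(chosen[1]), chosen)  # same arm for all four nodes
end
```

Re-selecting per node instead would let different nodes in one backward pass use different duality handlers, which is the inconsistency in the cuts that the comment warns about.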

odow · 2024-09-05T08:50:02Z

Good spotting. You might understand this better than I do. It's a while since I thought about it!

odow · 2024-09-06T01:01:30Z

Merging because I think this is an improvement.

My longer term project is to set up some sort of benchmarking so we could actually test if this was a good idea.

odow added 3 commits September 5, 2024 10:53

Change BanditDuality to use a prior and softmax to randomize arms (99586ce)
Fix formatting (fb1b7b6)
Update (af15836)
Update (a80f914)
Update duality_handlers.jl (9b9817e)
Update duality_handlers.jl (3f871cd)

odow merged commit 05748b9 into master Sep 6, 2024
8 checks passed

odow deleted the od/bandit branch September 6, 2024 01:01

odow mentioned this pull request Sep 6, 2024

Getting out of local optimal SDDiP #777

Closed
