Change BanditDuality to use a prior and softmax to randomize arms #779

Merged
merged 6 commits into from
Sep 6, 2024

odow
Owner

@odow odow commented Sep 4, 2024

x-ref #777

@Thuener do you want to try this branch? (I haven't actually run it locally. Just edited without testing.)

Hopefully this PR:

  • Initializes with a default prior so that we don't have the 10 / constant bound issue
  • Randomizes the arm selection according to a softmax, so two arms with similar scores are chosen at random, with the selection probability decaying quite quickly when the scores are different
  • Adds a max(, 0.1) to the time denominator to hedge against iterations that take a very short amount of time and would otherwise increase the reward by too much.
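
As a rough illustration of the second and third bullets, here is a minimal, self-contained sketch. It is not the code in this PR: the helper names `softmax`, `sample_arm`, and `reward`, the zero-valued prior scores, and the use of "bound improvement" as the reward numerator are all assumptions made for the example.

```julia
using Random

# Hypothetical helper: softmax over arm scores, shifted by the maximum score
# for numerical stability.
function softmax(scores::Vector{Float64})
    weights = exp.(scores .- maximum(scores))
    return weights ./ sum(weights)
end

# Hypothetical helper: sample an arm index with probability `probs[i]`.
function sample_arm(rng::AbstractRNG, probs::Vector{Float64})
    r = rand(rng)
    cumulative = 0.0
    for (i, p) in enumerate(probs)
        cumulative += p
        if r <= cumulative
            return i
        end
    end
    return length(probs)  # guard against floating-point round-off
end

# Hypothetical reward: improvement per second, with the time denominator
# floored at 0.1 so that very fast iterations cannot inflate the reward.
reward(bound_improvement::Float64, seconds::Float64) =
    bound_improvement / max(seconds, 0.1)

# With identical (assumed) prior scores, every arm is equally likely.
prior_probs = softmax(fill(0.0, 3))

# Similar scores get similar probabilities; a much larger score dominates.
probs = softmax([1.0, 1.1, 5.0])
arm = sample_arm(MersenneTwister(1234), probs)
```

With the floor in place, an iteration that takes 0.001 seconds is rewarded as if it had taken 0.1 seconds, while iterations longer than 0.1 seconds are unaffected.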

codecov bot commented Sep 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.40%. Comparing base (d2495d6) to head (3f871cd).
Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #779   +/-   ##
=======================================
  Coverage   93.40%   93.40%           
=======================================
  Files          27       27           
  Lines        3486     3504   +18     
=======================================
+ Hits         3256     3273   +17     
- Misses        230      231    +1     

@Thuener
Collaborator

Thuener commented Sep 5, 2024

Thanks!

I will try this one.

One detail: since the bounds don't change in the middle of the backward pass, it is better to change the arm just once per iteration, no? That should be better for performance, and if you change the arm for only some nodes, it would break the cuts.

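function prepare_backward_pass(
    node::Node,
    handler::BanditDuality,
    options::Options,
)
    # Only update the rewards and pick a new arm when the log has grown since
    # the last update, so every node in one backward pass uses the same arm.
    if length(options.log) > handler.logs_seen
        _update_rewards(handler, options.log)
        _choose_best_arm(handler)
        handler.logs_seen = length(options.log)
    end
    # Delegate to the duality handler of the currently selected arm.
    return prepare_backward_pass(node, handler.arms[handler.last_arm_index], options)
end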
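
To see the effect of the `logs_seen` guard in isolation, here is a toy sketch that mimics the snippet above without depending on SDDP.jl. `ToyHandler`, `maybe_switch!`, the alternating selection rule, and the assumption that one log entry is appended after each iteration are all hypothetical stand-ins.

```julia
# Hypothetical stand-in for the handler: it tracks how many log entries have
# been processed, which arm is selected, and how often the arm was re-selected.
mutable struct ToyHandler
    logs_seen::Int
    last_arm_index::Int
    switches::Int
end

# Mirrors the guard in the snippet above: re-select only when the log has grown.
function maybe_switch!(handler::ToyHandler, iteration_log::Vector{Float64})
    if length(iteration_log) > handler.logs_seen
        # Placeholder selection rule: alternate between two arms.
        handler.last_arm_index = mod1(handler.last_arm_index + 1, 2)
        handler.switches += 1
        handler.logs_seen = length(iteration_log)
    end
    return handler.last_arm_index
end

handler = ToyHandler(0, 1, 0)
iteration_log = Float64[]
for iteration in 1:3
    for node in 1:5                 # five nodes visited in each backward pass
        maybe_switch!(handler, iteration_log)
    end
    push!(iteration_log, 0.0)       # assume one log entry after each iteration
end
```

Although `maybe_switch!` is called for every node, the arm is re-selected at most once per iteration, so all nodes in a given backward pass share the same arm.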

@odow
Owner Author

odow commented Sep 5, 2024

Good spotting. You might understand this better than I do. It's a while since I thought about it!

@odow
Owner Author

odow commented Sep 6, 2024

Merging because I think this is an improvement.

My longer term project is to set up some sort of benchmarking so we could actually test if this was a good idea.

@odow odow merged commit 05748b9 into master Sep 6, 2024
8 checks passed
@odow odow deleted the od/bandit branch September 6, 2024 01:01