Independently completed all assignments for McGill's reinforcement learning course taught by Prof. Doina Precup.
Experimented with the k-armed bandit problem: implemented the epsilon-greedy, UCB, and Thompson sampling algorithms, evaluated each across a range of hyperparameter values, and computed and plotted the resulting regret.
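The epsilon-greedy part of that experiment can be sketched as follows; this is a minimal illustration with made-up arm means and hyperparameters, not the assignment's actual setup, tracking cumulative regret against the best arm:

```python
import numpy as np

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Run epsilon-greedy on a k-armed Gaussian bandit; return cumulative regret per step."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    q = np.zeros(k)   # incremental value estimates
    n = np.zeros(k)   # pull counts
    best = max(true_means)
    regret, regrets = 0.0, []
    for _ in range(steps):
        # explore uniformly with probability epsilon, otherwise exploit
        if rng.random() < epsilon:
            a = int(rng.integers(k))
        else:
            a = int(np.argmax(q))
        r = rng.normal(true_means[a], 1.0)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]   # sample-average update
        regret += best - true_means[a]
        regrets.append(regret)
    return regrets

curve = epsilon_greedy_bandit([0.1, 0.5, 0.9])
```

Plotting `curve` against the step index gives the regret curve; UCB and Thompson sampling differ only in how the action `a` is chosen each step.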
Implemented and compared the performance of SARSA and expected SARSA on the Frozen Lake domain from OpenAI Gym; implemented and compared Q-learning and actor-critic with linear function approximation on the cart-pole problem.
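The core difference between the two tabular updates is a sketch like this (an illustrative fragment assuming an epsilon-greedy behavior policy, not the assignment's exact code): SARSA bootstraps from the sampled next action, while expected SARSA takes the expectation over the policy's action distribution at the next state.

```python
import numpy as np

def sarsa_update(Q, s, a, r, s2, a2, alpha, gamma):
    # on-policy TD target uses the actually sampled next action a2
    Q[s, a] += alpha * (r + gamma * Q[s2, a2] - Q[s, a])

def expected_sarsa_update(Q, s, a, r, s2, alpha, gamma, epsilon):
    # expectation over the epsilon-greedy policy at s2 instead of a sample,
    # which removes the variance due to the next-action draw
    k = Q.shape[1]
    probs = np.full(k, epsilon / k)
    probs[np.argmax(Q[s2])] += 1.0 - epsilon
    Q[s, a] += alpha * (r + gamma * probs @ Q[s2] - Q[s, a])
```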
Experimented with offline RL: ran the Q-learning agent from assignment 2 (the expert) and a random agent on the cart-pole problem, gathering 500 behavioral episodes from each; trained an imitation learning agent and a fitted Q-learning agent on each dataset and compared the results.
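Fitted Q-learning on such a fixed batch can be sketched as below; the feature map, data layout, and ridge term are illustrative assumptions rather than the assignment's exact setup. Each iteration regresses a linear Q-function onto bootstrapped targets computed from the previous iterate.

```python
import numpy as np

def fitted_q_iteration(transitions, n_actions, gamma=0.99, iters=50):
    """Fitted Q on a fixed batch of (phi_s, a, r, phi_s2, done) tuples,
    where phi_* are feature vectors; returns per-action weight rows w
    so that Q(s, a) = w[a] @ phi_s. Layout is an illustrative assumption."""
    d = transitions[0][0].shape[0]
    w = np.zeros((n_actions, d))
    for _ in range(iters):
        X = [[] for _ in range(n_actions)]
        y = [[] for _ in range(n_actions)]
        for phi, a, r, phi2, done in transitions:
            # bootstrap from the greedy value under the current weights
            target = r if done else r + gamma * (w @ phi2).max()
            X[a].append(phi)
            y[a].append(target)
        for a in range(n_actions):
            if X[a]:
                A, b = np.array(X[a]), np.array(y[a])
                # ridge-regularized least squares for numerical stability
                w[a] = np.linalg.solve(A.T @ A + 1e-3 * np.eye(d), A.T @ b)
    return w
```

The imitation learning baseline differs only in the regression target: it fits the behavioral policy's actions directly instead of bootstrapped Q-values.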