Axiom Futures AI Safety Course, Week 4 notes
July 10, 2024

Collecting human feedback, fitting a reward model, and optimizing the policy with RL.