DeepRacer: Proximal Policy Optimization and Cross-task Adaptation

Implementation and training of Proximal Policy Optimization (PPO) for AWS DeepRacer autonomous racing, with a 5-stage curriculum and warm-start transfer to the obstacle-avoidance and head-to-bot tasks.
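
For readers new to PPO, the sketch below shows the standard clipped surrogate objective that the algorithm maximizes. It is a minimal illustration in plain Python, not this project's training code: the function name `ppo_clip_objective`, its arguments, and the default `clip_eps=0.2` are assumptions made for the example.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. All names here are hypothetical.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Taking the minimum of the two terms removes the incentive to push the policy ratio outside the clip range in the direction that would increase the objective, which is what keeps each update close to the policy that collected the data.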