Mission Control

Autonomous Racing with Proximal Policy Optimization (PPO).

Project Synopsis

Deep-RL Racer is a high-performance research project leveraging the Unity ML-Agents Toolkit. The goal is to train a neural network to navigate a complex track without human input.

By using PyTorch as the training backend and Unity as the simulation environment, we will implement massively parallel training to teach an agent optimal racing lines through trial and error (Reinforcement Learning).

Why This Matters

This bridges the gap between Game Dev and AI Research. Mastering Reward Engineering and Hyperparameter Tuning demonstrates the ability to solve complex, non-linear control problems—a key skill for autonomous systems.

Tech Stack

  • Unity ML-Agents
  • PyTorch
  • C# (Rewards)
  • NVIDIA CUDA

Project Checkpoints

  • Phase 1: The Training Ground (Environment)
  • Phase 2: Brain Configuration (Hyperparameters)
  • Phase 3: Massively Parallel Training (Optimization)
  • Phase 4: Final Race & Portfolio (Inference)

Field Notes & Learnings

Key engineering concepts for Reinforcement Learning.

1. Reward Shaping

Concept: The agent has no concept of "winning." It only seeks to maximize its cumulative reward signal.

Solution: We must design a Dense Reward Function:

  • +0.1 for moving forward (Speed).
  • +1.0 for hitting checkpoints (Progress).
  • -1.0 for hitting walls (Penalty).

Too sparse, and the agent learns nothing. Shaped badly, and it finds exploits (e.g., driving in circles to farm the speed reward).
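
A minimal sketch of how these three signals could be wired into an ML-Agents `Agent` subclass. The class name, tag names, and the exact constants are illustrative placeholders, not the final reward function:

```csharp
using Unity.MLAgents;
using UnityEngine;

// Illustrative dense-reward wiring; tag names and constants are placeholders.
public class RacerAgent : Agent
{
    private Rigidbody rb;

    public override void Initialize()
    {
        rb = GetComponent<Rigidbody>();
    }

    private void FixedUpdate()
    {
        // Speed: small reward proportional to forward velocity, scaled per physics step.
        float forwardSpeed = Vector3.Dot(rb.velocity, transform.forward);
        AddReward(0.1f * Mathf.Max(0f, forwardSpeed) * Time.fixedDeltaTime);
    }

    private void OnTriggerEnter(Collider other)
    {
        // Progress: checkpoints are trigger volumes placed along the track.
        if (other.CompareTag("Checkpoint"))
            AddReward(1.0f);
    }

    private void OnCollisionEnter(Collision collision)
    {
        // Penalty: hitting a wall costs reward and (here) ends the episode.
        if (collision.gameObject.CompareTag("Wall"))
        {
            AddReward(-1.0f);
            EndEpisode();
        }
    }
}
```

Ending the episode on a wall hit is one common choice; leaving the episode running and only penalizing the crash is equally valid and worth comparing.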

2. Parallel Environments

Concept: Training a single car takes days. RL is data-hungry.

Solution: Duplicate the track 20 times in the scene. Unity ML-Agents collects experience from all 20 instances simultaneously, cutting training time by roughly 20x (Wall-Clock Time).
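
One practical detail when duplicating the track: each agent must reset relative to its own copy of the prefab, not to hard-coded world coordinates, or the 20 instances will interfere with each other. A sketch assuming each copy contains a `startPoint` transform assigned in the Inspector:

```csharp
using Unity.MLAgents;
using UnityEngine;

// Sketch: reset logic that stays valid when the track prefab is duplicated 20x.
// "startPoint" is an assumed child transform marking the spawn pose inside each copy.
public class ParallelSafeRacerAgent : Agent
{
    [SerializeField] private Transform startPoint;
    private Rigidbody rb;

    public override void Initialize()
    {
        rb = GetComponent<Rigidbody>();
    }

    public override void OnEpisodeBegin()
    {
        // Use the prefab-local start point instead of a fixed world position,
        // so every duplicated training area resets independently.
        rb.velocity = Vector3.zero;
        rb.angularVelocity = Vector3.zero;
        transform.SetPositionAndRotation(startPoint.position, startPoint.rotation);
    }
}
```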

3. Hyperparameters (YAML)

Concept: The "Brain" needs tuning. How fast does it learn? How much does it explore?

Solution: Configure the trainer `.yaml` file:

  • Learning Rate: Step size for gradient descent.
  • Beta (Entropy): Strength of the entropy bonus; higher values push the agent to explore (try new moves) rather than exploit what it already knows.
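
For reference, a starting-point `racer_config.yaml` in the ML-Agents trainer format. The behavior name must match the Behavior Name set in Unity, and every numeric value below is an initial guess to be tuned, not a recommendation:

```yaml
# racer_config.yaml -- illustrative starting values, not tuned results.
behaviors:
  RacerAgent:                  # must match the Behavior Name in Unity
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 3.0e-4    # step size for gradient descent
      beta: 5.0e-3             # entropy bonus: higher = more exploration
      epsilon: 0.2             # PPO clipping range
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 2000000
    time_horizon: 64
    summary_freq: 10000
```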

4. Training vs. Inference

The lifecycle of an AI Model:

  • Training: Python communicates with Unity. Heavy GPU usage. Calculating gradients.
  • Inference: The resulting .onnx model is embedded in Unity. It runs efficiently on CPU/GPU at runtime without Python.

Implementation

Step-by-step Execution Plan.

Phase 1: The Training Ground (Week 1)

  • Track Design: Build a modular loop with high walls.
  • Sensors: Attach a `RayPerceptionSensorComponent3D` to the agent.
  • Logic: Write `Agent.OnEpisodeBegin()` and `AddReward()` logic.
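
Beyond the reward and reset logic sketched above, the remaining Phase 1 piece is mapping the policy's outputs to car controls. A sketch assuming two continuous actions (steer, throttle) and a simple force-based drive; `motorForce` and `turnSpeed` are made-up tuning values:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

// Sketch: translating the policy's continuous actions into steering and throttle.
// Assumes Behavior Parameters are configured with 2 continuous actions.
public class RacerAgentActions : Agent
{
    [SerializeField] private float motorForce = 20f;   // illustrative value
    [SerializeField] private float turnSpeed = 120f;   // illustrative value
    private Rigidbody rb;

    public override void Initialize()
    {
        rb = GetComponent<Rigidbody>();
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        float steer = actions.ContinuousActions[0];     // -1..1
        float throttle = actions.ContinuousActions[1];  // -1..1

        transform.Rotate(Vector3.up, steer * turnSpeed * Time.fixedDeltaTime);
        rb.AddForce(transform.forward * throttle * motorForce, ForceMode.Acceleration);
    }

    public override void Heuristic(in ActionBuffers actionsOut)
    {
        // Manual keyboard control for debugging the track before any training runs.
        var continuous = actionsOut.ContinuousActions;
        continuous[0] = Input.GetAxis("Horizontal");
        continuous[1] = Input.GetAxis("Vertical");
    }
}
```

The `Heuristic()` override lets you drive the car yourself to sanity-check the environment before training starts.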

Phase 2: Brain Configuration (Week 2)

  • YAML: Create `racer_config.yaml` with PPO settings.
  • Decisions: Add a `DecisionRequester` component to the agent (Decision Period: 5).
  • Debug: Verify Observation Space matches Sensor output.
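
For that observation-space check: everything added in `CollectObservations()` must sum to the Vector Observation Space Size set in Behavior Parameters, while the ray sensor reports its own observations separately. A sketch that adds the agent's local-frame velocity (3 floats):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Sketch: 3 floats of vector observation (local velocity) alongside the ray sensor.
// Behavior Parameters "Space Size" must equal the floats added here (3);
// the RayPerceptionSensorComponent3D contributes its observations automatically.
public class RacerAgentObservations : Agent
{
    private Rigidbody rb;

    public override void Initialize()
    {
        rb = GetComponent<Rigidbody>();
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Velocity expressed in the agent's local frame: forward/lateral/vertical speed.
        sensor.AddObservation(transform.InverseTransformDirection(rb.velocity));
    }
}
```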

Phase 3: Parallel Training (Week 3)

  • Scaling: Duplicate track prefabs (15-20 instances).
  • TensorBoard: Monitor the `Environment/Cumulative Reward` graph.
  • Export: Generate `.onnx` brain file after convergence.

Phase 4: Final Race (Week 4)

  • Inference: Test `.onnx` model on a *new* track layout.
  • DevLog: Create a "Zero to Hero" learning timelapse.

Dev Logs

Engineering notes & daily updates.

Entry 000: Planning

Date: Feb 3, 2026

Project 05 queued for June. Focusing on Unity ML-Agents and PPO for autonomous racing.