Mission Control

Adversarial Intelligence & Collaborative AI.

Project Synopsis

Multi-Agent Ops explores the complexity of cooperation and competition. We are training a system where two teams (Blue vs Red) compete in a high-speed physics environment (Soccer/Tag).

The focus is on Adversarial Self-Play. The AI doesn't learn from a static opponent; it learns by playing against past versions of itself, creating an "Arms Race" of strategy.

Why This Matters

Single-agent RL finds optimal paths. Multi-agent RL finds optimal strategies (Offense vs Defense). This demonstrates advanced knowledge of Game Theory and MA-PPO algorithms.

Tech Stack

  • Unity ML-Agents
  • MA-PPO / SAC
  • Elo Rating System
  • Blender

Project Checkpoints

  • Phase 1: The Competitive Arena (Symmetric Map)
  • Phase 2: Collaborative Reward Engineering (Teamwork)
  • Phase 3: Adversarial Self-Play (Arms Race)
  • Phase 4: The Final Tournament (Showcase)

Field Notes & Learnings

Key engineering concepts for Multi-Agent Systems.

1. Adversarial Self-Play

Concept: If an agent trains against a dumb opponent, it learns dumb strategies. If it trains against a far stronger opponent, it gets crushed and learns nothing.

Solution: Self-Play with History. The agent plays against a copy of itself from 10,000 steps ago. This ensures the opponent is always "slightly worse but competent," facilitating steady growth.
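A minimal sketch of that snapshot-lag idea (pure Python; the class and parameter names are illustrative, not part of the ML-Agents API):

```python
import random
from collections import deque

class SelfPlayOpponentPool:
    """Keeps recent policy snapshots so the learner always faces a
    slightly older copy of itself. Illustrative sketch, not ML-Agents code."""

    def __init__(self, max_snapshots=20, lag_steps=10_000):
        self.snapshots = deque(maxlen=max_snapshots)  # (step, policy_weights) pairs
        self.lag_steps = lag_steps

    def save(self, step, policy_weights):
        self.snapshots.append((step, policy_weights))

    def pick_opponent(self, current_step):
        """Prefer snapshots at least `lag_steps` behind the learner;
        fall back to the oldest one early in training."""
        lagged = [w for s, w in self.snapshots if current_step - s >= self.lag_steps]
        if lagged:
            return random.choice(lagged)
        return self.snapshots[0][1] if self.snapshots else None
```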

2. Reward Balance

Concept: If "Goal Scored" = +1.0 for everyone, agents might get lazy. If "Touch Ball" = +0.1, they might hog the ball.

Solution: Blend individual and team (group) rewards. Start with a high individual weight to teach mechanics, then shift the weight toward the shared team reward to teach strategy.
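A sketch of how that shift could be scheduled; the function and its parameters are made up for illustration, assuming a simple linear anneal over training steps:

```python
def blended_reward(individual_r, team_r, step, anneal_steps=500_000):
    """Linearly shift weight from individual rewards (mechanics)
    to team rewards (strategy) over `anneal_steps` training steps."""
    team_weight = min(step / anneal_steps, 1.0)
    return (1.0 - team_weight) * individual_r + team_weight * team_r

# Early training: the individual "touch ball" signal dominates.
print(blended_reward(individual_r=0.1, team_r=0.0, step=50_000))   # 0.09
# Late training: only the shared "goal scored" outcome matters.
print(blended_reward(individual_r=0.1, team_r=1.0, step=500_000))  # 1.0
```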

3. Elo Rating System

Concept: How do we know if Version 50 is better than Version 10?

Solution: Implement an Elo Tracker. Treat every training epoch as a "match." If V50 beats V40, its rating goes up. We graph this to prove the AI is actually getting smarter.
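The underlying math is the standard Elo update; ML-Agents reports its own internal Elo during self-play, so this Python version is only for the external tracker and graph (a K-factor of 32 is a common default):

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a, rating_b, score_a, k=32):
    """Update both ratings after one match. score_a: 1.0 win, 0.5 draw, 0.0 loss."""
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# V50 (1200) beats V40 (1200): V50 climbs to 1216, V40 drops to 1184.
print(update_elo(1200, 1200, score_a=1.0))
```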

4. Ghosting

Preventing "Policy Oscillations" (Rock-Paper-Scissors loops):

  • The Problem: The agent learns "Rock" to beat "Scissors". Then it learns "Paper" to beat "Rock", forgets "Rock" in the process, and is suddenly losing to "Scissors" again.
  • The Fix: Randomly swap the opponent with a "Ghost" (a saved past policy) so the agent must keep beating ALL previous strategies, not just the latest one (see the sketch below).
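A rough sketch of the swap-plus-regression idea (helper names like `play_match` are hypothetical placeholders):

```python
import random

def pick_training_opponent(latest_policy, ghosts, ghost_prob=0.3):
    """With probability `ghost_prob`, swap in a random saved policy ("ghost")
    so the learner keeps beating old strategies instead of cycling."""
    if ghosts and random.random() < ghost_prob:
        return random.choice(ghosts)
    return latest_policy

def regression_check(current_policy, ghosts, play_match):
    """Win rate of the current policy against every saved ghost.
    `play_match(a, b)` is assumed to run one episode and return True if `a` wins."""
    if not ghosts:
        return 1.0
    wins = sum(play_match(current_policy, ghost) for ghost in ghosts)
    return wins / len(ghosts)
```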

Implementation

Step-by-step Execution Plan.

Phase 1: The Arena (Week 1)

  • Design: Build symmetric 2v2 soccer pitch.
  • Logic: Assign Team IDs (Blue=0, Red=1).
  • Scoring: Global triggers for Win/Loss state.

Phase 2: Cooperation (Week 2)

  • Rewards: Balance "Pass Ball" vs "Win Game".
  • Curriculum: Start with empty goal -> Add goalie later.
  • Obs: Feed teammate positions into the Observation Vector (see the sketch below).
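A conceptual sketch of the per-agent observation layout (Python for brevity; the real sensors are collected on the Unity/C# side, and the layout here is illustrative):

```python
import numpy as np

def build_observation(agent_pos, agent_vel, ball_pos, teammate_positions):
    """Flatten own state, the ball, and relative teammate positions into
    one observation vector. Illustrative layout, not the actual C# sensor code."""
    agent_pos = np.asarray(agent_pos, dtype=np.float32)
    parts = [
        agent_pos,
        np.asarray(agent_vel, dtype=np.float32),
        np.asarray(ball_pos, dtype=np.float32) - agent_pos,  # ball relative to self
    ]
    for mate in teammate_positions:
        parts.append(np.asarray(mate, dtype=np.float32) - agent_pos)  # teammate relative to self
    return np.concatenate(parts)

# 2v2 -> one teammate: 3 + 3 + 3 + 3 = 12 values per agent.
print(build_observation([0, 0, 0], [1, 0, 0], [5, 0, 2], [[2, 0, -1]]).shape)  # (12,)
```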

Phase 3: Self-Play (Week 3)

  • Config: Enable `self_play` in the ML-Agents trainer YAML (settings sketched below).
  • Elo: Track win-rate against past versions.
  • Ghosts: Save snapshots of best models for regression testing.
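For reference, the `self_play` block I expect to configure, written out as a Python dict (the keys mirror ML-Agents' documented self-play options; the values are starting points to tune, not final settings):

```python
# self_play section of the ML-Agents trainer YAML, expressed as a dict for reference.
self_play_config = {
    "save_steps": 20000,       # snapshot the policy every 20k steps (these become the "ghosts")
    "swap_steps": 2000,        # how often the opponent's policy is swapped during training
    "team_change": 100000,     # steps before the learning team switches sides
    "window": 10,              # number of past snapshots kept in the opponent pool
    "play_against_latest_model_ratio": 0.5,  # 50% vs latest self, 50% vs older ghosts
    "initial_elo": 1200.0,     # starting rating for the internal Elo tracker
}
```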

Phase 4: Tournament (Week 4)

  • Visuals: Distinct skins for Blue/Red teams.
  • DevLog: Compilation of "Big Brain Plays".

Dev Logs

Engineering notes & daily updates.

Entry 000: Planning

Date: Feb 3, 2026

Project 09 queued for October. Focusing on Adversarial RL and Multi-Agent Cooperation.