Mission Control
Adversarial Intelligence & Collaborative AI.
Project Synopsis
Multi-Agent Ops explores the interplay of cooperation and competition. We are training a system in which two teams (Blue vs. Red) compete in a high-speed physics environment (Soccer/Tag).
The focus is on Adversarial Self-Play. The AI doesn't learn from a static opponent; it learns by playing against past versions of itself, creating an "Arms Race" of strategy.
Why This Matters
Single-agent RL finds optimal paths; multi-agent RL finds optimal strategies (Offense vs. Defense). This demonstrates advanced knowledge of Game Theory and multi-agent PPO (MAPPO).
Tech Stack
Project Checkpoints
- Phase 1: The Competitive Arena (Symmetric Map)
- Phase 2: Collaborative Reward Engineering (Teamwork)
- Phase 3: Adversarial Self-Play (Arms Race)
- Phase 4: The Final Tournament (Showcase)
Field Notes & Learnings
Key engineering concepts for Multi-Agent Systems.
1. Adversarial Self-Play
Concept: If an agent trains against a weak opponent, it learns weak strategies. If it trains against a pro, it gets crushed and learns nothing.
Solution: Self-Play with History. The agent plays against a copy of itself from 10,000 steps ago. This ensures the opponent is always "slightly worse but competent," facilitating steady growth.
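A minimal Python sketch of that lagged-self loop (the `SelfPlayManager` class, its `act()` opponent interface, and the 10,000-step interval are illustrative assumptions, not a specific library API):

```python
import copy
import random

class SelfPlayManager:
    """Keeps lagged copies of the learner so the opponent stays
    'slightly worse but competent'. Names here are illustrative."""

    def __init__(self, snapshot_interval=10_000):
        self.snapshot_interval = snapshot_interval
        self.snapshots = []      # frozen past policies
        self.opponent = None     # policy currently controlling the Red team

    def maybe_snapshot(self, step, policy):
        # Every N steps, freeze a copy of the current learner; the opponent
        # then lags the learner by up to snapshot_interval steps.
        if step % self.snapshot_interval == 0:
            self.snapshots.append(copy.deepcopy(policy))
            self.opponent = self.snapshots[-1]

    def opponent_action(self, observation):
        # Until the first snapshot exists, fall back to random play.
        if self.opponent is None:
            return random.choice([0, 1, 2, 3])
        return self.opponent.act(observation)
```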
2. Reward Balance
Concept: If "Goal Scored" = +1.0 for everyone, agents might get lazy. If "Touch Ball" = +0.1, they might hog the ball.
Solution: Mix Intrinsic (Individual) and Extrinsic (Group) rewards. Start with high individual rewards to teach mechanics, then shift to group rewards to teach strategy.
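A rough sketch of that blend in Python; the coefficients and the linear shift from individual to team rewards are illustrative guesses, not tuned values:

```python
def shaped_reward(touched_ball, scored_goal, team_scored, progress):
    """Blend intrinsic (individual) and extrinsic (team) rewards.

    `progress` runs from 0.0 (start of training) to 1.0 (end): early on
    the individual terms dominate to teach mechanics, later the team
    result dominates to teach strategy.
    """
    individual = 0.1 * touched_ball + 1.0 * scored_goal
    team = 1.0 * team_scored
    w = 1.0 - progress   # weight on individual rewards decays over training
    return w * individual + (1.0 - w) * team

# Early in training, touching the ball is still worth chasing...
print(shaped_reward(touched_ball=1, scored_goal=0, team_scored=0, progress=0.1))  # 0.09
# ...late in training, only the team outcome really matters.
print(shaped_reward(touched_ball=1, scored_goal=0, team_scored=0, progress=0.9))  # ~0.01
```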
3. Elo Rating System
Concept: How do we know if Version 50 is better than Version 10?
Solution: Implement an Elo Tracker. Treat each evaluation game between checkpoints as a "match": if V50 beats V40, V50's rating goes up. Graphing the rating over training proves the AI is actually getting smarter.
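The update itself is one line of math; here is a plain Python version of the standard Elo formula (the K-factor of 32 and the sample ratings are just illustrative numbers):

```python
def expected_score(rating_a, rating_b):
    # Probability that A beats B under the Elo model.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a, rating_b, score_a, k=32):
    """Standard Elo update. score_a is 1.0 for a win, 0.5 draw, 0.0 loss."""
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b

# V50 (rated 1200) beats V40 (rated 1250): V50 gains ~18 points, V40 loses them.
print(update_elo(1200, 1250, score_a=1.0))   # (~1218.3, ~1231.7)
```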
4. Ghosting
Preventing "Policy Oscillations" (Rock-Paper-Scissors loops):
- The Problem: The agent learns "Rock" to beat "Scissors", then switches to "Paper" to beat "Rock", and in doing so forgets "Rock" and loses when the opponent cycles back to "Scissors".
- The Fix: Randomly swap the opponent with a "Ghost" (Saved Policy) from the past to force the agent to remember how to beat ALL previous strategies.
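One simple way to implement that swap, assuming a `ghost_pool` list of saved policies (the 50/50 split between the latest self and a random ghost is an illustrative default, not a prescribed value):

```python
import random

def pick_opponent(ghost_pool, latest_policy, p_latest=0.5):
    """Choose who the learner faces next: its newest self, or a past Ghost.

    Sampling old snapshots at random means a strategy that beat an earlier
    opponent is regularly re-tested, so it cannot be quietly forgotten.
    """
    if not ghost_pool or random.random() < p_latest:
        return latest_policy
    return random.choice(ghost_pool)
```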
Implementation
Step-by-step Execution Plan.
Phase 1: The Arena (Week 1)
- Design: Build symmetric 2v2 soccer pitch.
- Logic: Assign Team IDs (Blue=0, Red=1).
- Scoring: Global triggers for Win/Loss state.
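An engine-agnostic Python sketch of the scoring trigger; the `Agent` class and `add_reward()` method below are stand-ins for whatever the environment actually exposes:

```python
from dataclasses import dataclass

BLUE, RED = 0, 1   # Team IDs, matching the plan above

@dataclass
class Agent:
    team_id: int
    reward: float = 0.0

    def add_reward(self, value):
        self.reward += value

def on_goal(scoring_team, agents):
    """Global win/loss trigger: reward both teams symmetrically, end the episode."""
    for agent in agents:
        agent.add_reward(+1.0 if agent.team_id == scoring_team else -1.0)
    return True   # signal that the episode is done

# 2v2: Blue scores, so both Blue agents get +1.0 and both Red agents -1.0.
squad = [Agent(BLUE), Agent(BLUE), Agent(RED), Agent(RED)]
on_goal(BLUE, squad)
print([a.reward for a in squad])   # [1.0, 1.0, -1.0, -1.0]
```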
Phase 2: Cooperation (Week 2)
- Rewards: Balance "Pass Ball" vs "Win Game".
- Curriculum: Start with empty goal -> Add goalie later.
- Obs: Feed teammate positions into Observation Vector.
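A sketch of what that observation assembly could look like; the dict layout and the relative-coordinate convention are assumptions for illustration, not the actual sensor format:

```python
def build_observation(me, teammate, ball, own_goal, opp_goal):
    """Flatten self, teammate, ball, and goal state into one vector.

    Positions are (x, y) pairs converted to coordinates relative to the
    observing agent, so the same policy works on both halves of a
    symmetric pitch.
    """
    def rel(pos):
        return [pos[0] - me["pos"][0], pos[1] - me["pos"][1]]

    return (
        list(me["vel"])
        + rel(teammate["pos"]) + list(teammate["vel"])
        + rel(ball["pos"]) + list(ball["vel"])
        + rel(own_goal) + rel(opp_goal)
    )

obs = build_observation(
    me={"pos": (0, 0), "vel": (1, 0)},
    teammate={"pos": (2, 1), "vel": (0, 0)},
    ball={"pos": (5, 0), "vel": (0, 0)},
    own_goal=(-10, 0),
    opp_goal=(10, 0),
)
print(len(obs))   # 14-dimensional observation vector
```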
Phase 3: Self-Play (Week 3)
- Config: Enable `self_play` in ML-Agents YAML.
- Elo: Track win-rate against past versions.
- Ghosts: Save snapshots of best models for regression testing.
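A possible regression harness for that last point, assuming a `play_match(a, b)` helper that returns 1 when the first policy wins (that helper, and the ghost naming, are hypothetical):

```python
def regression_suite(current_policy, ghosts, play_match, n_games=20):
    """Win-rate of the current model against every saved snapshot.

    A dip in win-rate against an old ghost flags a forgotten strategy
    (the Rock-Paper-Scissors loop from the Field Notes above).
    """
    report = {}
    for name, ghost in ghosts.items():
        wins = sum(play_match(current_policy, ghost) for _ in range(n_games))
        report[name] = wins / n_games
    return report
```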
Phase 4: Tournament (Week 4)
- Visuals: Distinct skins for Blue/Red teams.
- DevLog: Compilation of "Big Brain Plays".
Dev Logs
Engineering notes & daily updates.
Entry 000: Planning
Date: Feb 3, 2026
Project 09 queued for October. Focusing on Adversarial RL and Multi-Agent Cooperation.