Mission Control
Real-time Computer Vision in Unity via OpenCV.
Project Synopsis
Vision-Strike gives the AI actual sight. Instead of relying on Raycasts (mathematically perfect line-of-sight checks), we force the agent to interpret a raw pixel feed, simulating real-world robotics constraints.
By streaming a Unity Render Texture to a Python script running OpenCV, we perform object detection (HSV filtering) and return targeting coordinates, closing the loop between Perception and Action.
Why This Matters
Raycasts are "cheating" in the context of robotics. True autonomy requires processing noisy visual data. This project demonstrates mastery of Image Processing Pipelines and Latency Management.
Tech Stack
- Engine: Unity (C#)
- Vision: Python + OpenCV
- Transport: TCP Sockets
Project Checkpoints
- Phase 1: The Visual Stream (Byte Transfer)
- Phase 2: Object Detection (OpenCV)
- Phase 3: The Reactive Loop (Control)
- Phase 4: Creative Visuals (HUD & Demo)
Field Notes & Learnings
Key engineering concepts for Computer Vision.
1. The Data Stream
Concept: Streaming 60 full-HD frames per second over a socket saturates the connection; a raw 1080p RGB frame is ~6 MB, or roughly 370 MB/s at 60 fps.
Solution: Downsample the Render Texture to 224x224 pixels. This is a standard input size for neural networks (e.g., ResNet) and is sufficient for color tracking while keeping latency low.
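For a rough sense of the savings, here is a minimal Python sketch (the random-noise frame makes the JPEG figure pessimistic, and quality 80 is an illustrative setting, not a project requirement):

```python
import cv2
import numpy as np

# Stand-in for a raw 1080p RGB frame: 1920 * 1080 * 3 bytes ≈ 6.2 MB each.
frame = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)

# Downsample to the 224x224 size used by the project.
small = cv2.resize(frame, (224, 224), interpolation=cv2.INTER_AREA)

# JPEG-encode before sending; quality 80 trades a little fidelity for latency.
ok, jpg = cv2.imencode(".jpg", small, [cv2.IMWRITE_JPEG_QUALITY, 80])

print(f"raw 1080p frame: {frame.nbytes / 1e6:.1f} MB")
print(f"224x224 JPEG:    {len(jpg) / 1e3:.1f} KB")
```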
2. HSV Color Space
Concept: RGB is poor for detection: a "Red" object in shadow reads as "Brown" because all three channels drop with brightness.
Solution: Convert frames to HSV (Hue, Saturation, Value). Hue separates color from brightness, making the detector robust against lighting changes.
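A minimal OpenCV sketch of the filter (the threshold values are illustrative and need tuning per scene; note that red straddles the hue wrap-around in OpenCV's 0-179 hue range, so it takes two bands):

```python
import cv2
import numpy as np

def red_mask(bgr_frame: np.ndarray) -> np.ndarray:
    """Return a binary mask of 'red' pixels, robust to brightness changes."""
    hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)
    # Red sits at both ends of the hue axis, so combine a low and a high band.
    lo = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255))
    hi = cv2.inRange(hsv, (170, 120, 70), (180, 255, 255))
    return cv2.bitwise_or(lo, hi)
```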
3. PID Controller
Concept: If the agent snaps its aim the instant the camera reports an offset, it overshoots and oscillates around the target, which reads as jitter.
Solution: Use a Proportional-Integral-Derivative (PID) loop; see the sketch after this list.
• P: Turn proportionally to the current error.
• I: Accumulate leftover error to cancel steady-state drift.
• D: Damp the turn as the error shrinks to prevent overshooting.
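A minimal Python sketch of the controller (in the project this logic would live in a C# script on the agent; the gains are illustrative placeholders that need tuning):

```python
class PID:
    """Classic PID controller: output = kp*e + ki*integral(e) + kd*de/dt."""

    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Usage: feed the normalized horizontal error (-1..1) once per frame
# and apply the output as rotation torque.
controller = PID(kp=2.0, ki=0.1, kd=0.5)
torque = controller.update(error=0.3, dt=1 / 60)
```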
4. Coordinate Systems
Translating 2D pixels to 3D angles:
- Image Space: (0,0) is top-left in OpenCV, bottom-left in Unity. Flip Y-axis before processing.
- Error Calculation: `ErrorX = (ImageWidth / 2) - TargetX`, later normalized to [-1.0, 1.0] (Phase 3). This value drives the rotation torque; see the sketch below.
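A small sketch tying both points together, assuming the 224x224 stream (the sign conventions are a choice; they just have to match the torque script):

```python
IMG_W, IMG_H = 224, 224

def pixel_to_error(target_x: float, target_y: float) -> tuple[float, float]:
    """Map an OpenCV pixel coordinate to normalized errors in [-1, 1]."""
    # Flip Y: OpenCV's origin is top-left, Unity's viewport origin is bottom-left.
    unity_y = IMG_H - target_y
    # Normalize so the image center maps to (0, 0).
    error_x = (IMG_W / 2 - target_x) / (IMG_W / 2)
    error_y = (unity_y - IMG_H / 2) / (IMG_H / 2)
    return error_x, error_y
```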
Implementation
Step-by-step Execution Plan.
Phase 1: The Visual Stream (Week 1)
- Setup: Render Texture (224x224) on 2nd Camera.
- Encoding: Convert Texture to JPG Bytes in C#.
- Stream: Send bytes via Socket to Python (receiver sketch below).
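A sketch of the Python receiver, assuming a simple length-prefixed framing (a 4-byte big-endian size header before each JPEG) that the Unity sender would have to match; host and port are placeholders:

```python
import socket
import struct

import cv2
import numpy as np

HOST, PORT = "127.0.0.1", 5005  # placeholders; must match the Unity sender

def recv_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer disconnects."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("Unity closed the stream")
        buf += chunk
    return buf

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind((HOST, PORT))
    server.listen(1)
    conn, _ = server.accept()
    while True:
        # Each frame: 4-byte big-endian length header, then the JPEG bytes.
        (size,) = struct.unpack(">I", recv_exact(conn, 4))
        jpg = recv_exact(conn, size)
        frame = cv2.imdecode(np.frombuffer(jpg, np.uint8), cv2.IMREAD_COLOR)
        # ... hand `frame` to the detection pipeline (Phase 2)
```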
Phase 2: Object Detection (Week 2)
- Decode: `cv2.imdecode` in Python to get Image Matrix.
- Filter: Apply `cv2.inRange` for Red/Enemy colors.
- Tracking: Find Contours -> Get Bounding Box Center (x, y); see the sketch below.
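A condensed sketch of this phase, reusing the `red_mask` helper from the Field Notes sketch above (the minimum contour area is illustrative noise rejection):

```python
import cv2

def detect_target(frame):
    """Return the bounding-box center (x, y) of the largest red blob, or None."""
    mask = red_mask(frame)  # HSV filter from the Field Notes sketch
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    if cv2.contourArea(largest) < 50:  # ignore specks of noise
        return None
    x, y, w, h = cv2.boundingRect(largest)
    return (x + w / 2, y + h / 2)
```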
Phase 3: The Reactive Loop (Week 3)
- Mapping: Convert (x,y) to Rotation Error (-1.0 to 1.0).
- Feedback: Send Error data back to Unity via Socket (sketch below).
- Control: Implement PID script to rotate agent towards target.
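The return channel can be as small as two packed floats per frame; a sketch of the Python side, assuming the C# end reads with `BinaryReader` (which is little-endian, hence the `<ff` format):

```python
import struct

def send_error(conn, error_x: float, error_y: float) -> None:
    """Send the normalized rotation error back to Unity as two floats."""
    # "<ff" = two little-endian 32-bit floats, matching BinaryReader.ReadSingle().
    conn.sendall(struct.pack("<ff", error_x, error_y))
```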
Phase 4: Creative Visuals (Week 4)
- HUD: Display the "Computer Vision" view in the corner.
- DevLog: "Human vs Machine" vision comparison video.
Dev Logs
Engineering notes & daily updates.
Entry 000: Planning
Date: Feb 3, 2026
Project 06 queued for July. Bridging Unity and OpenCV for pixel-based AI targeting.