# Firefighter GridWorld Leaderboard
A reinforcement learning benchmark in a 4x4 grid world where the agent must:

- Pick up a water bucket 💧
- Extinguish a fire 🔥
- Reach the goal 🏁

The environment features deterministic and stochastic versions with discrete actions, rewards, penalties, and sprite-based rendering.
## Leaderboard (300 Episodes)
| Rank | Model | Mean Reward | Std Dev | Success Rate | Notes |
|---|---|---|---|---|---|
| 🥇 1 | MCTS | 27.4 | 3.64 | 1.00 | 50 simulations, random rollout |
| 2 | PPO | 4.0 | 5.83 | ~0.40 | Trained with Stable-Baselines3 |
| 3 | DQN | -30.0 | 23.9 | — | Failed the task consistently |
## 🧪 Evaluation Protocol

- Each agent is evaluated over 300 episodes
- Maximum steps per episode: 60
- The environment starts with the robot in the top-left cell
- Rewards:
  - +10: extinguish the fire and reach the goal
  - -1: step penalty
  - -5: invalid actions or skipping steps
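The reward scheme above can be sketched as a small function. The name `compute_reward` and its boolean arguments are illustrative assumptions, not the repo's actual API; how `FireFighterEnv` detects invalid actions internally is not specified here.

```python
def compute_reward(extinguished_fire: bool, reached_goal: bool,
                   invalid_action: bool) -> float:
    """Per-step reward under the leaderboard's scheme (assumed semantics)."""
    if invalid_action:
        return -5.0   # invalid action or skipped step
    if extinguished_fire and reached_goal:
        return 10.0   # task complete: fire out and goal reached
    return -1.0       # ordinary step penalty
```

Under this scheme, any run that wanders for many steps accumulates a large negative return, which is consistent with DQN's mean reward of -30.0 in the table above.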
## Setup

```bash
pip install -r requirements.txt
```
## Evaluate Your Agent

1. Clone the repo:

   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/firefighter-gridworld-leaderboard
   cd firefighter-gridworld-leaderboard
   ```

2. Run the evaluation:

   ```bash
   python evaluation/evaluate_custom_agent.py --path ./my_agent.zip --algo PPO
   ```

3. Submit your `eval_results.json` via Pull Request.
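A minimal sketch of producing an `eval_results.json`. The key names below are assumptions for illustration; the actual schema is defined by `evaluation/evaluate_custom_agent.py`, and the numbers here are just the PPO row from the leaderboard.

```python
import json

# Hypothetical result fields -- check the evaluation script for the real schema.
results = {
    "algo": "PPO",
    "episodes": 300,
    "mean_reward": 4.0,
    "std_reward": 5.83,
    "success_rate": 0.40,
}

with open("eval_results.json", "w") as f:
    json.dump(results, f, indent=2)
```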
## Environment API

The custom environment follows the Gymnasium API:

```python
import gymnasium as gym
from env.firefighter_env import FireFighterEnv

env = FireFighterEnv()
obs, info = env.reset()
for _ in range(60):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
```
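The evaluation protocol (300 episodes, 60 steps each) can be sketched as a generic loop over any Gymnasium-style environment. The helper below is illustrative, not the repo's evaluation script; in particular, inferring success from a terminal reward of +10 is an assumption about how `FireFighterEnv` signals task completion.

```python
import statistics

def evaluate(make_env, policy, episodes=300, max_steps=60):
    """Run `episodes` rollouts; return (mean return, std, success rate)."""
    returns, successes = [], 0
    for _ in range(episodes):
        env = make_env()
        obs, info = env.reset()
        total = 0.0
        for _ in range(max_steps):
            obs, reward, terminated, truncated, info = env.step(policy(obs))
            total += reward
            if terminated or truncated:
                # Assumption: a terminal +10 reward marks a successful episode.
                if terminated and reward == 10:
                    successes += 1
                break
        returns.append(total)
    mean = statistics.mean(returns)
    std = statistics.pstdev(returns)
    return mean, std, successes / episodes
```

With `make_env=FireFighterEnv` and a trained policy, this reproduces the shape of the leaderboard columns (mean reward, std dev, success rate).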
## 📥 Submissions

Include in your Pull Request:

- `eval_results.json`
- A description of your model and training setup
- A GIF of a successful episode (optional)
π¦ Files
env/
β environment codeagents/
β training scripts (PPO, DQN, MCTS)evaluation/
β evaluation and renderingmodels/
β saved agentsassets/
β sprites and animation
## License

MIT License. Contributions welcome!