---
language: en
tags:
- evolutionary-strategy
- cma-es
- gymnasium
- cartpole
- optimization
library_name: custom
datasets:
- gymnasium/CartPole-v1
metrics:
- mean_episode_length
model-index:
- name: CartPole-CMA-ES
results:
- task:
type: optimization
name: CartPole-v1
dataset:
name: gymnasium/CartPole-v1
type: gymnasium
metrics:
- type: mean_episode_length
value: 500
name: Mean Episode Length
license: mit
pipeline_tag: reinforcement-learning
---
# CartPole-v1 CMA-ES Solution
This model provides a solution to the CartPole-v1 environment using CMA-ES (Covariance Matrix Adaptation Evolution Strategy),
achieving perfect performance with a simple linear policy. The implementation demonstrates how evolutionary strategies can
effectively solve classic control problems with minimal architecture complexity.
### Video Preview
<video controls width="480">
<source src="https://huggingface.co/bniladridas/cartpole-cmaes/resolve/main/preview.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
### Training Convergence
*Figure: Training convergence showing the mean fitness (episode length) across generations. The model achieves optimal performance (500 steps) within 3 generations.*
## Model Details
### Model Description
This is a linear policy model for the CartPole-v1 environment that:
- Uses a single 4×2 weight matrix to map the 4-dimensional state to a score for each of the two discrete actions
- Achieves optimal performance (500/500 steps) consistently
- Was optimized using CMA-ES, requiring only 3 generations for convergence
- Demonstrates sample-efficient learning for the CartPole balancing task
```python
def get_action(self, observation):
    observation = np.array(observation, dtype=np.float32)
    action_scores = np.dot(observation, self.weights)
    action_scores += np.random.randn(*action_scores.shape) * 1e-5
    return int(np.argmax(action_scores))
```
- **Developed by:** Niladri Das
- **Model type:** Linear Policy
- **Language:** Python
- **License:** MIT
- **Finetuned from model:** No (trained from scratch)
### Model Sources
- **Repository:** https://github.com/bniladridas/cmaes-rl
- **Hugging Face:** https://huggingface.co/bniladridas/cartpole-cmaes
- **Website:** https://bniladridas.github.io/cmaes-rl/
## Uses
### Direct Use
The model is designed for:
1. Solving the CartPole-v1 environment from Gymnasium
2. Demonstrating CMA-ES optimization for RL tasks
3. Serving as a baseline for comparison with other algorithms
4. Educational purposes in evolutionary strategies
### Out-of-Scope Use
The model should not be used for:
1. Complex control tasks beyond CartPole
2. Real-world robotics applications
3. Tasks requiring non-linear policies
4. Environments with partial observability
## Bias, Risks, and Limitations
### Technical Limitations
- Limited to CartPole-v1 environment
- Requires full state observation
- Linear policy architecture
- No transfer learning capability
- Environment-specific solution
### Performance Limitations
- May not handle significant environment variations
- No adaptation to changing dynamics
- Limited by linear policy capacity
- Requires precise state information
### Recommendations
Users should:
1. Only use for CartPole-v1 environment
2. Ensure full state observability
3. Understand the limitations of linear policies
4. Consider more complex architectures for other tasks
5. Validate performance in their specific setup
## How to Get Started with the Model
### Method 1: Using the CMAESAgent Class
```python
from model import CMAESAgent
# Load the model
agent = CMAESAgent.from_pretrained("bniladridas/cartpole-cmaes")
# Evaluate
mean_reward, std_reward = agent.evaluate(num_episodes=5)
print(f"Mean reward: {mean_reward:.2f} ± {std_reward:.2f}")
```
### Method 2: Manual Implementation
```python
import numpy as np
from gymnasium import make

# Load model weights
weights = np.load('model_weights.npy')  # 4x2 matrix

# Create environment
env = make('CartPole-v1')

# Run inference
def get_action(observation):
    logits = observation @ weights
    return int(np.argmax(logits))

observation, _ = env.reset()
while True:
    action = get_action(observation)
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
```
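To estimate the mean episode length, the rollout above can be wrapped in a small evaluation helper. So that this sketch stays self-contained (and runnable without gymnasium installed), `StubEnv` below is a hypothetical stand-in exposing the same `reset`/`step` API as `gym.make('CartPole-v1')`; swap in the real environment in practice.

```python
import numpy as np

class StubEnv:
    """Minimal stand-in with gymnasium's reset/step API (illustrative only)."""
    def __init__(self, horizon=500):
        self.horizon, self.t = horizon, 0
    def reset(self):
        self.t = 0
        return np.zeros(4, dtype=np.float32), {}
    def step(self, action):
        self.t += 1
        obs = np.random.randn(4).astype(np.float32) * 0.01
        terminated = False                  # the stub never drops the pole
        truncated = self.t >= self.horizon  # time-limit truncation at 500 steps
        return obs, 1.0, terminated, truncated, {}

def evaluate(env, weights, num_episodes=5):
    """Return mean and std of episode length for a linear policy."""
    lengths = []
    for _ in range(num_episodes):
        observation, _ = env.reset()
        steps = 0
        while True:
            action = int(np.argmax(observation @ weights))
            observation, reward, terminated, truncated, _ = env.step(action)
            steps += 1
            if terminated or truncated:
                break
        lengths.append(steps)
    return float(np.mean(lengths)), float(np.std(lengths))

mean_len, std_len = evaluate(StubEnv(), np.zeros((4, 2)), num_episodes=3)
print(mean_len, std_len)  # 500.0 0.0 -- the stub never terminates early
```

With the real environment and the trained weights, the same helper reproduces the 500.0 ± 0.0 result reported below.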
## Training Details
### Training Data
- **Environment:** Gymnasium CartPole-v1
- **State Space:** 4D continuous (cart position, velocity, pole angle, angular velocity)
- **Action Space:** Discrete(2) (push left, push right)
- **Reward:** +1 for each step, max 500 steps
- **Episode Termination:** Pole angle beyond ±12°, cart position beyond ±2.4, or 500 steps reached
- **Training Approach:** Direct environment interaction (no pre-collected dataset)
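Because CMA-ES optimizes a flat parameter vector, each candidate's 8 numbers must be reshaped into the 4×2 policy matrix before being scored by episode length. A minimal sketch of that glue code (the reshape convention and the `rollout` callback are assumptions for illustration; the repo may order parameters differently):

```python
import numpy as np

OBS_DIM, N_ACTIONS = 4, 2  # CartPole-v1 dimensions

def params_to_weights(params):
    """Reshape a flat 8-vector from the optimizer into the 4x2 policy matrix."""
    params = np.asarray(params, dtype=np.float32)
    assert params.size == OBS_DIM * N_ACTIONS
    return params.reshape(OBS_DIM, N_ACTIONS)

def fitness(params, rollout):
    """Score one candidate. `rollout(weights)` should return the episode
    length; CMA-ES conventionally minimizes, so the length is negated."""
    return -float(rollout(params_to_weights(params)))

# Illustrative check with a dummy rollout that ignores the weights:
score = fitness(np.zeros(8), rollout=lambda w: 500)
print(score)  # -500.0
```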
### Training Procedure
#### Training Hyperparameters
- **Algorithm:** CMA-ES
- **Population size:** 16
- **Number of generations:** 100 (early convergence by generation 3)
- **Initial step size:** 0.5
- **Parameters:** 8 (4x2 weight matrix)
- **Training regime:** Single precision (fp32)
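The hyperparameters above plug into a standard ask/evaluate/tell loop. The sketch below uses a simplified isotropic evolution strategy in plain NumPy to show the loop's shape only; full CMA-ES additionally adapts the covariance matrix and step size (e.g. via the `cma` package), and the toy fitness here merely stands in for running episodes.

```python
import numpy as np

rng = np.random.default_rng(0)
POP_SIZE, SIGMA0, N_PARAMS = 16, 0.5, 8  # values from this card
MAX_GENERATIONS, TARGET = 100, 500.0

def episode_length(params):
    """Placeholder fitness: stands in for 'mean episode length of the
    linear policy with these weights'. A toy function peaking at 500."""
    return 500.0 - float(np.sum(params ** 2))

mean, sigma = np.zeros(N_PARAMS), SIGMA0
for gen in range(MAX_GENERATIONS):
    # ask: sample a population around the current mean
    population = mean + sigma * rng.standard_normal((POP_SIZE, N_PARAMS))
    # evaluate: one (or several) episodes per candidate
    fitnesses = np.array([episode_length(p) for p in population])
    # tell: move the mean toward the best half (simplified update;
    # real CMA-ES also adapts the full covariance and step size)
    elite = population[np.argsort(fitnesses)[-POP_SIZE // 2:]]
    mean = elite.mean(axis=0)
    sigma *= 0.9
    if fitnesses.max() >= TARGET:
        break
print(round(fitnesses.max(), 2))
```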
#### Hardware Requirements
- **CPU:** Single core sufficient
- **Memory:** <100MB RAM
- **GPU:** Not required
- **Training time:** ~5 minutes on standard CPU
### Evaluation
#### Testing Data & Metrics
- **Environment:** Same as training (CartPole-v1)
- **Episodes:** 100 test episodes
- **Metrics:** Episode length, success rate
#### Results
- **Average Episode Length:** 500.0 ± 0.0
- **Success Rate:** 100%
- **Convergence:** Achieved in 3 generations
- **Final Population Mean:** 500.00
- **Best Performance:** 500/500 consistently
## Implementation Details
The implementation employs a straightforward linear policy:
```python
import gymnasium as gym
import numpy as np

class CMAESAgent:
    def __init__(self, env_name):
        self.env = gym.make(env_name)
        self.observation_space = self.env.observation_space.shape[0]  # 4 for CartPole
        self.action_space = self.env.action_space.n  # 2 for CartPole
        self.num_params = self.observation_space * self.action_space  # 8 total parameters
        self.weights = None  # set after optimization; shape (4, 2)

    def get_action(self, observation):
        observation = np.array(observation, dtype=np.float32)
        action_scores = np.dot(observation, self.weights)
        action_scores += np.random.randn(*action_scores.shape) * 1e-5  # small noise for tie-breaking
        return int(np.argmax(action_scores))
```
The model's simplicity demonstrates that CartPole's optimal control policy is approximately linear in the state variables.
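To make the decision rule concrete, here is the full computation on one sample state. The weight matrix below is illustrative only (the trained 4×2 matrix ships as `model_weights.npy`), and the sign conventions in the comments are an assumption, not the trained policy's:

```python
import numpy as np

# Illustrative weights only -- columns are per-action scores
W = np.array([[ 0.0,  0.0],
              [ 0.0,  0.0],
              [-1.0,  1.0],   # pole-angle term
              [-1.0,  1.0]])  # angular-velocity term

state = np.array([0.0, 0.0, 0.05, 0.10])  # cart pos, cart vel, angle, angular vel
scores = state @ W                        # 2-vector of action scores
action = int(np.argmax(scores))
print(action)  # 1
```

The whole policy is these 8 multiplications, 6 additions, and one argmax per step.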
## Environmental Impact
- **Training time:** ~5 minutes
- **Hardware:** Standard CPU
- **Energy consumption:** Negligible (<0.001 kWh)
- **CO2 emissions:** Minimal (<0.001 kg)
## Citation
**BibTeX:**
```bibtex
@misc{das2025cartpole,
  author       = {Niladri Das},
  title        = {CartPole-v1 CMA-ES Solution},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {Hugging Face Model Hub, https://huggingface.co/bniladridas/cartpole-cmaes},
  url          = {https://github.com/bniladridas/cmaes-rl}
}
```