---
library_name: stable-baselines3
tags:
- FruitBox
- reinforcement-learning
- ppo
- game-ai
- puzzle-solving
model-index:
- name: AlphaApple
  results:
  - task:
      type: reinforcement-learning
      name: Reinforcement Learning
    dataset:
      name: FruitBox Game
      type: fruitbox
    metrics:
    - type: mean_reward
      value: 77.0
      name: Mean Episode Score
    - type: improvement_vs_random
      value: 7.1%
      name: Improvement vs Random
    - type: improvement_vs_greedy
      value: 5.0%
      name: Improvement vs Greedy
---
# AlphaApple: FruitBox Game AI Agent
## Model Description
This model is an AI agent that solves the Korean Apple Game (FruitBox) puzzle.
It was trained with the PPO (Proximal Policy Optimization) algorithm to play the game, in which the player finds and removes rectangles whose numbers sum to 10 on a 10×17 grid.
## Game Rules
- 10×17 grid; each cell holds a number from 1 to 9
- Select a rectangular region; if its numbers sum to exactly 10, the region is removed
- The score increases by the number of cells removed
- The game ends when no removable region remains
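The rules above can be sketched in a few lines of NumPy. This is a minimal illustration, not the environment used for training: the names `new_board` and `try_remove` are hypothetical, the 10-row by 17-column orientation is an assumption, and cleared cells are assumed to be stored as 0 so that a later rectangle may span them.

```python
import numpy as np

ROWS, COLS = 10, 17  # assumed orientation of the 10×17 grid


def new_board(seed=0):
    """Create a random board with a digit from 1 to 9 in every cell."""
    rng = np.random.default_rng(seed)
    return rng.integers(1, 10, size=(ROWS, COLS))


def try_remove(board, top, left, bottom, right):
    """Clear the inclusive rectangle if its digits sum to exactly 10.

    Returns the points gained (number of non-empty cells cleared),
    or 0 if the rectangle is not a valid move.
    """
    region = board[top:bottom + 1, left:right + 1]
    if region.sum() != 10:
        return 0
    gained = int(np.count_nonzero(region))
    region[...] = 0  # region is a view, so the board is updated in place
    return gained
```

For example, `try_remove(board, 0, 0, 0, 1)` scores 2 points when the first two cells of the top row hold 4 and 6, and scores 0 on any rectangle whose sum differs from 10.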
## Performance
| Agent | Average Score | Improvement vs. Random |
|---------|--------------|------------------------|
| Random | 71.9 | - |
| Greedy | 73.3 | +1.9% |
| **PPO** | **77.0** | **+7.1%** |
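The percentages in the table and in the metadata follow from the average scores as relative gains over a baseline. The short check below reproduces them; the helper name `relative_improvement` is introduced here only for illustration.

```python
def relative_improvement(score, baseline):
    """Relative gain of `score` over `baseline`, in percent."""
    return (score - baseline) / baseline * 100


random_score, greedy_score, ppo_score = 71.9, 73.3, 77.0

print(f"{relative_improvement(greedy_score, random_score):.1f}%")  # 1.9%
print(f"{relative_improvement(ppo_score, random_score):.1f}%")     # 7.1%
print(f"{relative_improvement(ppo_score, greedy_score):.1f}%")     # 5.0%
```

The first two values are measured against the Random agent, as in the table; the last is the PPO gain over the Greedy agent reported in the model metadata.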
## Usage
### Python (PyTorch)
```python
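from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

# Load the trained model
model = PPO.load("pytorch_model.zip")

# Wrap the custom FruitBox Gymnasium environment from the source repository.
# `FruitBoxEnv` is a placeholder name; substitute the actual environment class.
env = DummyVecEnv([lambda: FruitBoxEnv()])

# Predict an action for the initial observation
obs = env.reset()
action, _ = model.predict(obs)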
```
### Web/JavaScript (ONNX)
```javascript
import * as ort from 'onnxruntime-web';

// Load the ONNX model
const session = await ort.InferenceSession.create('./fruitbox_ppo.onnx');

// Predict an action; board_data is a Float32Array holding the 170 cell values
const { action_logits } = await session.run({
  board_input: new ort.Tensor('float32', board_data, [1, 17, 10, 1])
});
// Pick the index of the highest logit
const action = action_logits.data.indexOf(Math.max(...action_logits.data));
```
## Files
- `pytorch_model.zip`: Original SB3 PPO model
- `fruitbox_ppo.onnx`: ONNX version for web deployment (2.95 MB)
- `model_info.json`: Model metadata and performance metrics
## Training Details
- Algorithm: PPO with action masking
- Network: Custom CNN (SmallGridCNN)
- Training steps: 1,000,000
- Environment: Custom Gymnasium environment
- Action space: 8,415 possible rectangles (masked)
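The action-space size can be checked directly: a 10×17 grid has 55 row ranges times 153 column ranges, which gives 8,415 axis-aligned rectangles. The sketch below enumerates them and builds a validity mask with 2D prefix sums. It is an illustration under assumptions: the names `enumerate_rectangles` and `action_mask` are hypothetical, and the index order used by the trained model is not documented here, so this ordering may differ from it.

```python
import numpy as np

ROWS, COLS = 10, 17


def enumerate_rectangles(rows=ROWS, cols=COLS):
    """List every rectangle as (top, left, bottom, right), bounds inclusive."""
    return [
        (t, l, b, r)
        for t in range(rows)
        for b in range(t, rows)
        for l in range(cols)
        for r in range(l, cols)
    ]


RECTANGLES = enumerate_rectangles()
print(len(RECTANGLES))  # 8415


def action_mask(board):
    """Boolean mask over RECTANGLES: True where the digits sum to exactly 10."""
    # Prefix sums with a zero border give each rectangle sum in O(1).
    prefix = np.zeros((board.shape[0] + 1, board.shape[1] + 1), dtype=np.int64)
    prefix[1:, 1:] = board.cumsum(axis=0).cumsum(axis=1)
    mask = np.zeros(len(RECTANGLES), dtype=bool)
    for i, (t, l, b, r) in enumerate(RECTANGLES):
        total = (prefix[b + 1, r + 1] - prefix[t, r + 1]
                 - prefix[b + 1, l] + prefix[t, l])
        mask[i] = total == 10
    return mask
```

A mask like this is typically used to set the logits of invalid rectangles to a large negative value before sampling, so the policy only chooses among legal moves.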
## Repository
Source code: https://github.com/your-username/alphaapple
## Citation
```bibtex
@misc{alphaapple2024,
title={AlphaApple: AI Agent for FruitBox Puzzle Game},
author={Your Name},
year={2024},
howpublished={\url{https://huggingface.co/AlphaApple}}
}
```