---
library_name: stable-baselines3
tags:
- FruitBox
- reinforcement-learning
- ppo
- game-ai
- puzzle-solving
model-index:
- name: AlphaApple
  results:
  - task:
      type: reinforcement-learning
      name: Reinforcement Learning
    dataset:
      name: FruitBox Game
      type: fruitbox
    metrics:
    - type: mean_reward
      value: 77.0
      name: Mean Episode Score
    - type: improvement_vs_random
      value: 7.1%
      name: Improvement vs Random
    - type: improvement_vs_greedy
      value: 5.0%
      name: Improvement vs Greedy
---

# AlphaApple: FruitBox Game AI Agent

## Model Description

This model is an AI agent that solves FruitBox, the Korean "Apple Game" puzzle.
It was trained with the PPO (Proximal Policy Optimization) algorithm to play a game in which the player finds and removes rectangles whose digits sum to 10 on a 10×17 grid.

## Game Rules

- The board is a 10×17 grid; each cell holds a digit from 1 to 9.
- Select a rectangular region; if its digits sum to exactly 10, the region is removed.
- The score increases by the number of cells removed.
- The game ends when no removable region remains.

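The rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the environment code from the repository: the function name `try_remove`, the use of `0` to mark cleared cells, and the row/column orientation of the board are all assumptions made for the example.

```python
import numpy as np

ROWS, COLS = 10, 17  # orientation is an assumption; the card only says "10×17"


def try_remove(board, top, left, bottom, right):
    """Try to clear the inclusive rectangle (top, left)-(bottom, right).

    Returns the points scored: the number of non-empty cells removed,
    or 0 if the rectangle does not sum to exactly 10.
    Cleared cells are represented as 0 (an assumption of this sketch).
    """
    region = board[top:bottom + 1, left:right + 1]  # a view into `board`
    if region.sum() != 10:
        return 0
    points = int(np.count_nonzero(region))
    region[...] = 0  # writes through the view, clearing the cells in place
    return points


# Example: a random starting board with digits 1-9 in every cell.
rng = np.random.default_rng(0)
board = rng.integers(1, 10, size=(ROWS, COLS))
```

Counting only non-zero cells means that already-cleared cells inside a selected rectangle do not add to the score.
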
## Performance

| Agent | Average Score | Improvement vs Random |
|---------|--------------|-------------|
| Random | 71.9 | - |
| Greedy | 73.3 | +1.9% |
| **PPO** | **77.0** | **+7.1%** |

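The percentages in the table, and the 5.0% improvement over Greedy listed in the card metadata, follow from the average scores as plain relative differences. A minimal check (the helper name `improvement` is introduced here only for illustration):

```python
scores = {"Random": 71.9, "Greedy": 73.3, "PPO": 77.0}


def improvement(score, baseline):
    """Relative improvement of `score` over `baseline`, in percent."""
    return (score - baseline) / baseline * 100


for agent, baseline in [("Greedy", "Random"), ("PPO", "Random"), ("PPO", "Greedy")]:
    pct = improvement(scores[agent], scores[baseline])
    print(f"{agent} vs {baseline}: +{pct:.1f}%")
```
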
## Usage

### Python (PyTorch)

```python
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

# Load model
model = PPO.load("pytorch_model.zip")

# `make_env` is a placeholder (not defined here): a zero-argument function
# returning the custom Gymnasium FruitBox environment from the source code.
env = DummyVecEnv([make_env])

# Use for inference
obs = env.reset()
action, _ = model.predict(obs)
```

### Web/JavaScript (ONNX)

```javascript
import * as ort from 'onnxruntime-web';

// Load ONNX model
const session = await ort.InferenceSession.create('./fruitbox_ppo.onnx');

// Predict action; board_data must hold 170 values (1 x 17 x 10 x 1)
const { action_logits } = await session.run({
  board_input: new ort.Tensor('float32', board_data, [1, 17, 10, 1])
});

// Greedy action: index of the largest logit
const action = action_logits.data.indexOf(Math.max(...action_logits.data));
```

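The JavaScript snippet above takes the argmax over the raw logits. Because the model was trained with action masking, it may be safer to restrict the argmax to currently valid rectangles. Whether the exported ONNX graph already applies a mask is not stated here, so the following is a sketch under that assumption; it is written in Python for brevity, and `masked_argmax` is a hypothetical helper, not part of any library.

```python
import numpy as np


def masked_argmax(logits, mask):
    """Index of the largest logit among entries where `mask` is True.

    Invalid entries are set to -inf so they can never win.
    If no entry is valid, this returns 0, so callers should check
    `mask.any()` first.
    """
    masked = np.where(mask, logits, -np.inf)
    return int(np.argmax(masked))
```

The same idea carries over to the browser: overwrite the logits of invalid actions with `-Infinity` before searching for the maximum.
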
## Files

- `pytorch_model.zip`: Original SB3 PPO model
- `fruitbox_ppo.onnx`: ONNX version for web deployment (2.95MB)
- `model_info.json`: Model metadata and performance metrics

## Training Details

- Algorithm: PPO with action masking
- Network: Custom CNN (SmallGridCNN)
- Training steps: 1,000,000
- Environment: Custom Gymnasium environment
- Action space: 8,415 possible rectangles (masked)

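The action-space size of 8,415 matches the number of axis-aligned rectangles on a 10×17 grid: 55 possible row spans times 153 possible column spans. The sketch below enumerates them and builds a validity mask using 2-D prefix sums. The ordering of `RECTS` and the function name `action_mask` are assumptions for illustration; the index-to-rectangle mapping used by the trained model may differ.

```python
import numpy as np

ROWS, COLS = 10, 17

# Every axis-aligned rectangle as (top, left, bottom, right), bounds inclusive.
RECTS = [
    (t, l, b, r)
    for t in range(ROWS) for b in range(t, ROWS)
    for l in range(COLS) for r in range(l, COLS)
]
assert len(RECTS) == 8415  # 55 row spans x 153 column spans


def action_mask(board):
    """Boolean mask over RECTS: True where the rectangle sums to exactly 10."""
    # Prefix sums with a zero border give each rectangle sum in O(1).
    p = np.zeros((ROWS + 1, COLS + 1), dtype=np.int64)
    p[1:, 1:] = board.cumsum(axis=0).cumsum(axis=1)
    return np.array([
        p[b + 1, r + 1] - p[t, r + 1] - p[b + 1, l] + p[t, l] == 10
        for t, l, b, r in RECTS
    ])
```

An episode under the rules above would end once `action_mask(board).any()` is false.
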
## Repository

Source code: https://github.com/your-username/alphaapple

## Citation

```bibtex
@misc{alphaapple2024,
  title={AlphaApple: AI Agent for FruitBox Puzzle Game},
  author={Your Name},
  year={2024},
  howpublished={\url{https://huggingface.co/AlphaApple}}
}
```