---
library_name: stable-baselines3
tags:
- FruitBox
- reinforcement-learning
- ppo
- game-ai
- puzzle-solving
model-index:
- name: AlphaApple
  results:
  - task:
      type: reinforcement-learning
      name: Reinforcement Learning
    dataset:
      name: FruitBox Game
      type: fruitbox
    metrics:
    - type: mean_reward
      value: 77.0
      name: Mean Episode Score
    - type: improvement_vs_random
      value: 7.1%
      name: Improvement vs Random
    - type: improvement_vs_greedy
      value: 5.0%
      name: Improvement vs Greedy
---
# AlphaApple: FruitBox Game AI Agent
## Model Description
이 λͺ¨λΈμ€ ν•œκ΅­μ˜ μ‚¬κ³Όκ²Œμž„(FruitBox) 퍼즐을 ν•΄κ²°ν•˜λŠ” AI μ—μ΄μ „νŠΈμž…λ‹ˆλ‹€.
10Γ—17 κ²©μžμ—μ„œ 합이 10인 μ§μ‚¬κ°ν˜•μ„ μ°Ύμ•„ μ œκ±°ν•˜λŠ” κ²Œμž„μ„ PPO(Proximal Policy Optimization) μ•Œκ³ λ¦¬μ¦˜μœΌλ‘œ ν•™μŠ΅ν–ˆμŠ΅λ‹ˆλ‹€.
## Game Rules
- 10×17 grid; each cell holds a number from 1 to 9
- Select a rectangular region; if its numbers sum to exactly 10, the region is removed
- Score one point for each removed cell
- The game ends when no removable region remains
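The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the environment used for training: the names `rect_sum`, `try_remove` and `has_move` are hypothetical, and the sketch assumes that cleared cells are stored as `0` and stay in place, so only cells that still hold a number earn points.

```python
from typing import List

ROWS, COLS = 10, 17          # grid size from the rules above
Board = List[List[int]]      # board[r][c] in 1..9, or 0 once cleared (assumption)


def rect_sum(board: Board, r1: int, c1: int, r2: int, c2: int) -> int:
    """Sum of the inclusive rectangle rows r1..r2, columns c1..c2."""
    return sum(board[r][c] for r in range(r1, r2 + 1) for c in range(c1, c2 + 1))


def try_remove(board: Board, r1: int, c1: int, r2: int, c2: int) -> int:
    """Clear the rectangle if it sums to exactly 10; return the points gained."""
    if rect_sum(board, r1, c1, r2, c2) != 10:
        return 0  # invalid move: board unchanged, no points
    gained = 0
    for r in range(r1, r2 + 1):
        for c in range(c1, c2 + 1):
            if board[r][c] != 0:
                gained += 1        # one point per removed cell
                board[r][c] = 0
    return gained


def has_move(board: Board) -> bool:
    """True while at least one rectangle sums to exactly 10 (game not over)."""
    rows, cols = len(board), len(board[0])
    for r1 in range(rows):
        for r2 in range(r1, rows):
            for c1 in range(cols):
                for c2 in range(c1, cols):
                    if rect_sum(board, r1, c1, r2, c2) == 10:
                        return True
    return False
```

An episode then alternates `try_remove` calls with a `has_move` check until it returns `False`.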
## Performance
| Agent | Average Score | Improvement vs Random |
|---------|--------------|-------------|
| Random | 71.9 | - |
| Greedy | 73.3 | +1.9% |
| **PPO** | **77.0** | **+7.1%** |
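The improvement figures match the relative gain over the baseline's average score, (score - baseline) / baseline. The short check below reproduces the table and the +5.0% PPO-vs-Greedy figure listed in the metadata. The formula is inferred from the numbers rather than stated in the original card.

```python
def relative_improvement(score: float, baseline: float) -> float:
    """Relative gain of `score` over `baseline`, in percent."""
    return (score - baseline) / baseline * 100.0


scores = {"Random": 71.9, "Greedy": 73.3, "PPO": 77.0}  # averages from the table

greedy_vs_random = relative_improvement(scores["Greedy"], scores["Random"])
ppo_vs_random = relative_improvement(scores["PPO"], scores["Random"])
ppo_vs_greedy = relative_improvement(scores["PPO"], scores["Greedy"])

print(f"Greedy vs Random: +{greedy_vs_random:.1f}%")  # +1.9%
print(f"PPO vs Random:    +{ppo_vs_random:.1f}%")     # +7.1%
print(f"PPO vs Greedy:    +{ppo_vs_greedy:.1f}%")     # +5.0%
```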
## Usage
### Python (PyTorch)
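```python
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

# Load the trained policy
model = PPO.load("pytorch_model.zip")

# The custom FruitBox Gymnasium environment lives in the source repository;
# `make_fruitbox_env` is a placeholder for a function that constructs it.
env = DummyVecEnv([make_fruitbox_env])

# VecEnv.reset() returns only the batched observation (no info dict)
obs = env.reset()
# Note: PPO.predict does not apply an action mask by itself
action, _ = model.predict(obs, deterministic=True)
```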
### Web/JavaScript (ONNX)
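```javascript
import * as ort from 'onnxruntime-web';

// Load the ONNX model
const session = await ort.InferenceSession.create('./fruitbox_ppo.onnx');

// board_data: Float32Array with the current board's 170 cell values, laid out
// to match the [1, 17, 10, 1] input shape (zeros here as a placeholder)
const board_data = new Float32Array(17 * 10);

// Run inference; the result maps output names to tensors
const { action_logits } = await session.run({
  board_input: new ort.Tensor('float32', board_data, [1, 17, 10, 1]),
});

// Greedy choice: index of the largest logit (no action mask applied here)
const action = action_logits.data.indexOf(Math.max(...action_logits.data));
```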
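Taking the plain argmax of `action_logits` can select a rectangle that is not a legal move. Since training used action masking, one reasonable option at inference time is to mask illegal actions before taking the argmax. The Python sketch below assumes a boolean `mask` of length 8,415 in the same index order as the model's logits; how that mask is built depends on the environment's action encoding, which this card does not document. `masked_argmax` is a hypothetical helper, not part of any library.

```python
import math
from typing import Sequence


def masked_argmax(logits: Sequence[float], mask: Sequence[bool]) -> int:
    """Index of the highest logit among actions where mask[i] is True.

    Illegal actions are skipped, which is equivalent to setting their
    logits to -inf before the argmax (the usual form of action masking).
    """
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    best_idx, best_val = -1, -math.inf
    for i, (logit, legal) in enumerate(zip(logits, mask)):
        if legal and logit > best_val:
            best_idx, best_val = i, logit
    if best_idx < 0:
        raise ValueError("no legal action available (game over)")
    return best_idx
```

For example, `masked_argmax([0.1, 5.0, 2.0], [True, False, True])` picks index 2, because the higher-scoring index 1 is masked out.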
## Files
- `pytorch_model.zip`: Original SB3 PPO model
- `fruitbox_ppo.onnx`: ONNX version for web deployment (2.95 MB)
- `model_info.json`: Model metadata and performance metrics
## Training Details
- Algorithm: PPO with action masking
- Network: Custom CNN (SmallGridCNN)
- Training steps: 1,000,000
- Environment: Custom Gymnasium environment
- Action space: 8,415 possible rectangles (masked)
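The figure of 8,415 is exactly the number of axis-aligned rectangles in a 10×17 grid. You choose 2 of the 11 horizontal grid lines and 2 of the 18 vertical ones, so the count is C(11,2) × C(18,2) = 55 × 153 = 8,415. The sketch below enumerates these rectangles and builds a legality mask using 2-D prefix sums. The index ordering (top row, bottom row, then left and right columns) and the function names are assumptions and may not match the ordering used by the actual environment.

```python
from itertools import combinations_with_replacement
from math import comb
from typing import List, Tuple

ROWS, COLS = 10, 17
Rect = Tuple[int, int, int, int]  # (r1, c1, r2, c2), inclusive corners


def all_rectangles(rows: int = ROWS, cols: int = COLS) -> List[Rect]:
    """Every axis-aligned rectangle, in a hypothetical fixed index order."""
    row_spans = list(combinations_with_replacement(range(rows), 2))  # r1 <= r2
    col_spans = list(combinations_with_replacement(range(cols), 2))  # c1 <= c2
    return [(r1, c1, r2, c2) for r1, r2 in row_spans for c1, c2 in col_spans]


def action_mask(board: List[List[int]], rects: List[Rect]) -> List[bool]:
    """mask[i] is True when rectangle i sums to exactly 10 (a legal move)."""
    rows, cols = len(board), len(board[0])
    # 2-D prefix sums: pre[r][c] = sum of board[0..r-1][0..c-1]
    pre = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            pre[r + 1][c + 1] = board[r][c] + pre[r][c + 1] + pre[r + 1][c] - pre[r][c]
    mask = []
    for r1, c1, r2, c2 in rects:
        total = pre[r2 + 1][c2 + 1] - pre[r1][c2 + 1] - pre[r2 + 1][c1] + pre[r1][c1]
        mask.append(total == 10)
    return mask


rects = all_rectangles()
assert len(rects) == comb(ROWS + 1, 2) * comb(COLS + 1, 2) == 8415
```

With prefix sums, each rectangle's total costs four lookups. This keeps the 8,415-entry mask cheap enough to recompute after every move.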
## Repository
Source code: https://github.com/your-username/alphaapple
## Citation
```bibtex
@misc{alphaapple2024,
title={AlphaApple: AI Agent for FruitBox Puzzle Game},
author={Your Name},
year={2024},
howpublished={\url{https://huggingface.co/AlphaApple}}
}
```