---
library_name: stable-baselines3
tags:
- FruitBox
- reinforcement-learning
- ppo
- game-ai
- puzzle-solving
model-index:
- name: AlphaApple
  results:
  - task:
      type: reinforcement-learning
      name: Reinforcement Learning
    dataset:
      name: FruitBox Game
      type: fruitbox
    metrics:
    - type: mean_reward
      value: 77.0
      name: Mean Episode Score
    - type: improvement_vs_random
      value: 7.1%
      name: Improvement vs Random
    - type: improvement_vs_greedy  
      value: 5.0%
      name: Improvement vs Greedy
---

# AlphaApple: FruitBox Game AI Agent

## Model Description

This model is an AI agent that solves the Korean Apple Game (FruitBox) puzzle. 
It was trained with PPO (Proximal Policy Optimization) to play on a 10×17 grid, where the goal is to find and remove rectangles whose digits sum to 10.

## Game Rules

- 10×17 grid; each cell holds a digit from 1 to 9
- Select a rectangular region; if its digits sum to exactly 10, the region is removed
- The score increases by the number of cells removed
- The game ends when no removable region remains
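
The rules above can be sketched in a few lines of NumPy. This is a minimal illustration, not the environment used for training: the function names `try_remove` and `has_move` are hypothetical, cleared cells are assumed to be stored as 0, rectangles are assumed to be given as half-open `(top, left, bottom, right)` bounds, and only cells that still hold a digit are assumed to count toward the score.

```python
import numpy as np

def try_remove(board, top, left, bottom, right):
    """Clear rows top..bottom-1, cols left..right-1 if they sum to 10.

    Returns the points gained (0 if the move is not allowed).
    Assumption: cleared cells are stored as 0 and earn no points.
    """
    region = board[top:bottom, left:right]  # a view into the board
    if region.sum() != 10:
        return 0
    points = int(np.count_nonzero(region))
    region[...] = 0  # writes through the view, clearing the board in place
    return points

def has_move(board):
    """Brute-force check: does any rectangle still sum to exactly 10?"""
    rows, cols = board.shape
    for top in range(rows):
        for left in range(cols):
            for bottom in range(top + 1, rows + 1):
                for right in range(left + 1, cols + 1):
                    if board[top:bottom, left:right].sum() == 10:
                        return True
    return False

# Tiny 2x3 board for illustration (the real game uses 10x17)
board = np.array([[1, 9, 3],
                  [4, 6, 7]])
print(try_remove(board, 0, 0, 1, 2))  # 1 + 9 = 10, so 2 cells are cleared
print(has_move(board))                # True: 4 + 6 and 3 + 7 both remain
```

The brute-force scan is fine for a single end-of-game check on a board this small; a prefix-sum table would be the usual choice if the check runs every step.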

## Performance

| Agent   | Average Score | Improvement vs Random |
|---------|---------------|-----------------------|
| Random  | 71.9          | -                     |
| Greedy  | 73.3          | +1.9%                 |
| **PPO** | **77.0**      | **+7.1%**             |
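
The percentages are relative gains in average score. As a quick check, the short sketch below recomputes them from the table; the helper name `improvement` is illustrative. It also reproduces the 5.0% gain over Greedy listed in the model metadata.

```python
scores = {"Random": 71.9, "Greedy": 73.3, "PPO": 77.0}

def improvement(score, baseline):
    """Relative gain of `score` over `baseline`, in percent."""
    return (score - baseline) / baseline * 100.0

# Gains relative to the Random baseline (the table column)
for name in ("Greedy", "PPO"):
    print(f"{name} vs Random: {improvement(scores[name], scores['Random']):+.1f}%")

# Gain of PPO relative to Greedy (the metadata entry)
print(f"PPO vs Greedy: {improvement(scores['PPO'], scores['Greedy']):+.1f}%")
```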

## Usage

### Python (PyTorch)

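```python
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

def make_env():
    # Placeholder: return an instance of the custom FruitBox Gymnasium
    # environment from the source repository.
    raise NotImplementedError

# Load model
model = PPO.load("pytorch_model.zip")

# Wrap the environment so that reset() returns only the observation
env = DummyVecEnv([make_env])

# Use for inference
obs = env.reset()
action, _ = model.predict(obs)
```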

### Web/JavaScript (ONNX)

```javascript
import { InferenceSession, Tensor } from 'onnxruntime-web';

// Load ONNX model
const session = await InferenceSession.create('./fruitbox_ppo.onnx');

// Predict action
// `board_data` must be a Float32Array of length 170 to match the input shape
const { action_logits } = await session.run({
    board_input: new Tensor('float32', board_data, [1, 17, 10, 1])
});

// Pick the index of the highest logit (argmax)
const action = action_logits.data.indexOf(Math.max(...action_logits.data));
```

## Files

- `pytorch_model.zip`: Original SB3 PPO model 
- `fruitbox_ppo.onnx`: ONNX version for web deployment (2.95MB)
- `model_info.json`: Model metadata and performance metrics

## Training Details

- Algorithm: PPO with action masking
- Network: Custom CNN (SmallGridCNN)
- Training steps: 1,000,000
- Environment: Custom Gymnasium environment
- Action space: 8,415 possible rectangles (masked)
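
The figure of 8,415 is the number of axis-aligned rectangles on a 10×17 grid: choosing 2 of 11 horizontal grid lines and 2 of 18 vertical grid lines gives C(11, 2) × C(18, 2) = 55 × 153 = 8,415. The sketch below enumerates them and builds a validity mask with 2D prefix sums. It is an illustration under assumptions, not the training code: the function names, the 10-row by 17-column orientation, the ordering of actions, and the convention that cleared cells are 0 are all hypothetical.

```python
from itertools import combinations
import numpy as np

ROWS, COLS = 10, 17  # assumed orientation: 10 rows, 17 columns

def enumerate_rectangles(rows=ROWS, cols=COLS):
    """List every rectangle as half-open bounds (top, left, bottom, right)."""
    return [
        (top, left, bottom, right)
        for top, bottom in combinations(range(rows + 1), 2)
        for left, right in combinations(range(cols + 1), 2)
    ]

def action_mask(board, rectangles):
    """Boolean mask: True where a rectangle's digits sum to exactly 10."""
    # prefix[i, j] holds the sum of board[:i, :j], so any rectangle sum
    # is a combination of four table lookups.
    prefix = np.zeros((board.shape[0] + 1, board.shape[1] + 1), dtype=np.int64)
    prefix[1:, 1:] = board.cumsum(axis=0).cumsum(axis=1)
    t, l, b, r = np.array(rectangles).T
    sums = prefix[b, r] - prefix[t, r] - prefix[b, l] + prefix[t, l]
    return sums == 10

def masked_argmax(logits, mask):
    """Pick the best-scoring action among those the mask allows."""
    return int(np.argmax(np.where(mask, logits, -np.inf)))

rectangles = enumerate_rectangles()
print(len(rectangles))  # 8415

rng = np.random.default_rng(0)
board = rng.integers(1, 10, size=(ROWS, COLS))  # random digits 1-9
mask = action_mask(board, rectangles)
print(int(mask.sum()), "valid moves on this random board")
```

Applying a mask like this before the argmax keeps the agent from selecting a rectangle that does not sum to 10, which matters when consuming raw logits outside the training framework.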

## Repository

Source code: https://github.com/your-username/alphaapple

## Citation

```bibtex
@misc{alphaapple2024,
  title={AlphaApple: AI Agent for FruitBox Puzzle Game},
  author={Your Name},
  year={2024},
  howpublished={\url{https://huggingface.co/AlphaApple}}
}
```