kbsooo committed on
Commit f032908 · verified · 1 Parent(s): 482f6f2

Upload README.md with huggingface_hub

  1. README.md +110 -3
README.md CHANGED
@@ -1,3 +1,110 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ library_name: stable-baselines3
3
+ tags:
4
+ - FruitBox
5
+ - reinforcement-learning
6
+ - ppo
7
+ - game-ai
8
+ - puzzle-solving
9
+ model-index:
10
+ - name: AlphaApple
11
+   results:
12
+   - task:
13
+       type: reinforcement-learning
14
+       name: Reinforcement Learning
15
+     dataset:
16
+       name: FruitBox Game
17
+       type: fruitbox
18
+     metrics:
19
+     - type: mean_reward
20
+       value: 77.0
21
+       name: Mean Episode Score
22
+     - type: improvement_vs_random
23
+       value: 7.1%
24
+       name: Improvement vs Random
25
+     - type: improvement_vs_greedy
26
+       value: 5.0%
27
+       name: Improvement vs Greedy
28
+ ---
29
+
30
+ # AlphaApple: FruitBox Game AI Agent
31
+
32
+ ## Model Description
33
+
34
+ This model is an AI agent that solves the Korean apple game (FruitBox) puzzle.
35
+ It was trained with the PPO (Proximal Policy Optimization) algorithm to play a game in which the player finds and removes rectangles whose digits sum to 10 on a 10×17 grid.
36
+
37
+ ## Game Rules
38
+
39
+ - 10×17 grid; each cell holds a digit from 1 to 9
40
+ - μ§μ‚¬κ°ν˜• μ˜μ—­μ„ μ„ νƒν•΄μ„œ 숫자 합이 μ •ν™•νžˆ 10이면 ν•΄λ‹Ή μ˜μ—­ 제거
41
+ - Points are awarded equal to the number of cells removed
42
+ - The game ends when no removable region remains
43
+
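As a rough illustration of the rules above, the following sketch applies a single move to a board and checks for the end of the game. It is not taken from the project's environment: the names `try_remove` and `game_over`, the list-of-lists board, and the use of 0 for a removed cell are all assumptions made for this example.

```python
from typing import Iterator, List, Tuple

Board = List[List[int]]           # 0 marks a removed cell (assumption of this sketch)
Rect = Tuple[int, int, int, int]  # (top, left, bottom, right), all inclusive

TARGET = 10  # a rectangle is removable when its digits sum to exactly this


def rect_cells(rect: Rect) -> Iterator[Tuple[int, int]]:
    """Yield every (row, col) inside an inclusive rectangle."""
    top, left, bottom, right = rect
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            yield r, c


def try_remove(board: Board, rect: Rect) -> int:
    """Clear the rectangle if its digits sum to exactly 10.

    Returns the points gained (number of non-empty cells removed),
    or 0 if the move is illegal, in which case the board is unchanged.
    """
    cells = list(rect_cells(rect))
    if sum(board[r][c] for r, c in cells) != TARGET:
        return 0
    gained = sum(1 for r, c in cells if board[r][c] != 0)
    for r, c in cells:
        board[r][c] = 0
    return gained


def game_over(board: Board) -> bool:
    """True when no rectangle on the board sums to 10 (brute-force scan)."""
    rows, cols = len(board), len(board[0])
    for top in range(rows):
        for left in range(cols):
            for bottom in range(top, rows):
                for right in range(left, cols):
                    cells = rect_cells((top, left, bottom, right))
                    if sum(board[r][c] for r, c in cells) == TARGET:
                        return False
    return True
```

The brute-force scan is slow but easy to verify by hand on a small board; a real environment would more likely use prefix sums to test each rectangle in constant time.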
44
+ ## Performance
45
+
46
+ | Agent | Average Score | Improvement vs Random |
47
+ |---------|--------------|-------------|
48
+ | Random | 71.9 | - |
49
+ | Greedy | 73.3 | +1.9% |
50
+ | **PPO** | **77.0** | **+7.1%** |
51
+
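The percentages can be reproduced from the average scores in the table. This short check assumes that "improvement" means the relative gain over a baseline score; `relative_improvement` is a name chosen for this example, not one from the project.

```python
def relative_improvement(score: float, baseline: float) -> float:
    """Percentage gain of `score` over `baseline`."""
    return (score - baseline) / baseline * 100.0


# Average scores copied from the Performance table.
scores = {"Random": 71.9, "Greedy": 73.3, "PPO": 77.0}

greedy_vs_random = relative_improvement(scores["Greedy"], scores["Random"])
ppo_vs_random = relative_improvement(scores["PPO"], scores["Random"])
ppo_vs_greedy = relative_improvement(scores["PPO"], scores["Greedy"])

print(f"Greedy vs Random: +{greedy_vs_random:.1f}%")
print(f"PPO vs Random:    +{ppo_vs_random:.1f}%")
print(f"PPO vs Greedy:    +{ppo_vs_greedy:.1f}%")
```

With the scores above this gives about +1.9% (Greedy vs Random), +7.1% (PPO vs Random) and +5.0% (PPO vs Greedy), which matches the table and the `model-index` metadata.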
52
+ ## Usage
53
+
54
+ ### Python (PyTorch)
55
+
56
+ ```python
57
+ from stable_baselines3 import PPO
58
+ from stable_baselines3.common.vec_env import DummyVecEnv
59
+
60
+ # Load model
61
+ model = PPO.load("pytorch_model.zip")
62
+
63
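+ # Use for inference; `env` is the custom FruitBox environment wrapped in DummyVecEnv (its construction is not shown here)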
64
+ obs = env.reset()
65
+ action, _ = model.predict(obs)
66
+ ```
67
+
68
+ ### Web/JavaScript (ONNX)
69
+
70
+ ```javascript
71
+ import { InferenceSession, Tensor } from 'onnxruntime-web';
72
+
73
+ // Load ONNX model
74
+ const session = await InferenceSession.create('./fruitbox_ppo.onnx');
75
+
76
+ // Predict action; board_data is a Float32Array of 170 values (1 x 17 x 10 x 1)
77
+ const { action_logits } = await session.run({
78
+   board_input: new Tensor('float32', board_data, [1, 17, 10, 1])
79
+ });
80
+ const action = action_logits.data.indexOf(Math.max(...action_logits.data));
81
+ ```
82
+
83
+ ## Files
84
+
85
+ - `pytorch_model.zip`: Original SB3 PPO model
86
+ - `fruitbox_ppo.onnx`: ONNX version for web deployment (2.95 MB)
87
+ - `model_info.json`: Model metadata and performance metrics
88
+
89
+ ## Training Details
90
+
91
+ - Algorithm: PPO with action masking
92
+ - Network: Custom CNN (SmallGridCNN)
93
+ - Training steps: 1,000,000
94
+ - Environment: Custom Gymnasium environment
95
+ - Action space: 8,415 possible rectangles (masked)
96
+
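The figure of 8,415 is consistent with counting every axis-aligned rectangle on a 10×17 grid: choosing 2 of 11 row boundaries and 2 of 18 column boundaries gives 55 × 153 = 8,415. The sketch below enumerates those rectangles and builds a legality mask with 2D prefix sums. The action ordering, the function names, and the way the mask is applied to logits are assumptions for illustration; the project's actual indexing and masking code may differ.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

ROWS, COLS = 10, 17  # grid size from the README; which axis is which is assumed
TARGET = 10

Rect = Tuple[int, int, int, int]  # (top, left, bottom, right), all inclusive


def all_rectangles(rows: int = ROWS, cols: int = COLS) -> List[Rect]:
    """List every axis-aligned rectangle, one per pair of row and column boundaries."""
    rects = []
    for top, bottom in combinations(range(rows + 1), 2):
        for left, right in combinations(range(cols + 1), 2):
            rects.append((top, left, bottom - 1, right - 1))
    return rects


def prefix_sums(board: Sequence[Sequence[int]]) -> List[List[int]]:
    """2D prefix sums: ps[r][c] is the sum of the first r rows and c columns."""
    rows, cols = len(board), len(board[0])
    ps = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            ps[r + 1][c + 1] = board[r][c] + ps[r][c + 1] + ps[r + 1][c] - ps[r][c]
    return ps


def action_mask(board: Sequence[Sequence[int]], rects: Sequence[Rect]) -> List[bool]:
    """mask[i] is True when rectangle i sums to exactly 10 on this board."""
    ps = prefix_sums(board)
    mask = []
    for top, left, bottom, right in rects:
        total = (ps[bottom + 1][right + 1] - ps[top][right + 1]
                 - ps[bottom + 1][left] + ps[top][left])
        mask.append(total == TARGET)
    return mask


def masked_argmax(logits: Sequence[float], mask: Sequence[bool]) -> Optional[int]:
    """Index of the highest logit among legal actions, or None if none is legal."""
    legal = [i for i, ok in enumerate(mask) if ok]
    return max(legal, key=lambda i: logits[i]) if legal else None


rects = all_rectangles()
print(len(rects))  # 55 * 153 rectangles on a 10x17 grid
```

Restricting the argmax to legal rectangles is one simple way to use such a mask at inference time; during training, masking is usually applied to the policy's logits before sampling.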
97
+ ## Repository
98
+
99
+ Source code: https://github.com/your-username/alphaapple
100
+
101
+ ## Citation
102
+
103
+ ```bibtex
104
+ @misc{alphaapple2024,
105
+ title={AlphaApple: AI Agent for FruitBox Puzzle Game},
106
+ author={Your Name},
107
+ year={2024},
108
+ howpublished={\url{https://huggingface.co/AlphaApple}}
109
+ }
110
+ ```