AlphaApple: FruitBox Game AI Agent

Model Description

This model is an AI agent that solves the Korean Apple Game (FruitBox) puzzle. It was trained with the PPO (Proximal Policy Optimization) algorithm to play a game in which the player finds and removes rectangles whose numbers sum to 10 on a 10×17 grid.

Game Rules

  • 10×17 grid; each cell holds a number from 1 to 9
  • Select a rectangular region; if its numbers sum to exactly 10, the region is removed
  • The score increases by the number of cells removed
  • The game ends when no removable region remains
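The rules above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's environment: the names `cells_in`, `try_remove`, and `has_move` are hypothetical, and the sketch assumes that a removed cell is stored as 0 and that a rectangle may span already-emptied cells.

```python
def cells_in(top, left, bottom, right):
    """All (row, col) pairs inside the rectangle; corners are inclusive."""
    return [(r, c) for r in range(top, bottom + 1) for c in range(left, right + 1)]

def try_remove(board, top, left, bottom, right):
    """Remove the rectangle if it sums to exactly 10; return the points gained.

    Assumption: removed cells are stored as 0, so they add nothing to the
    sum and are not counted again in the score.
    """
    cells = cells_in(top, left, bottom, right)
    if sum(board[r][c] for r, c in cells) != 10:
        return 0  # invalid move, board unchanged
    gained = sum(1 for r, c in cells if board[r][c] != 0)
    for r, c in cells:
        board[r][c] = 0
    return gained

def has_move(board):
    """True if at least one rectangle on the board sums to exactly 10."""
    rows, cols = len(board), len(board[0])
    return any(
        sum(board[r][c] for r, c in cells_in(top, left, bottom, right)) == 10
        for top in range(rows) for bottom in range(top, rows)
        for left in range(cols) for right in range(left, cols)
    )

# Tiny 2x3 board for illustration; the real board is 10x17.
board = [[3, 7, 1],
         [9, 5, 5]]
score = try_remove(board, 0, 0, 0, 1)   # 3 + 7 = 10, two cells removed
score += try_remove(board, 1, 1, 1, 2)  # 5 + 5 = 10, two cells removed
print(score, has_move(board))           # 4 True (1 and 9 remain, spanning empty cells)
```

The brute-force scan in `has_move` is chosen for readability; a real environment would more likely use prefix sums so each rectangle sum costs constant time.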

Performance

Agent    Average Score   Improvement vs. Random
Random   71.9            -
Greedy   73.3            +1.9%
PPO      77.0            +7.1%
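The improvement figures appear to be relative gains over the Random agent's average score; recomputing them from the averages reproduces the listed percentages. The helper name `relative_improvement` is illustrative and not part of the repository.

```python
def relative_improvement(score, baseline):
    """Percentage change of score relative to baseline."""
    return (score - baseline) / baseline * 100

random_score, greedy_score, ppo_score = 71.9, 73.3, 77.0

greedy_gain = relative_improvement(greedy_score, random_score)
ppo_gain = relative_improvement(ppo_score, random_score)

print(f"Greedy: {greedy_gain:+.1f}%")  # Greedy: +1.9%
print(f"PPO: {ppo_gain:+.1f}%")        # PPO: +7.1%
```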

Usage

Python (PyTorch)

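from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

# Load the trained model
model = PPO.load("pytorch_model.zip")

# make_env is a placeholder: a function returning the custom FruitBox
# Gymnasium environment from the source repository (not shown here)
env = DummyVecEnv([make_env])

# Run inference on the initial observation
obs = env.reset()
action, _ = model.predict(obs)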

Web/JavaScript (ONNX)

import * as ort from 'onnxruntime-web';

// Load the ONNX model
const session = await ort.InferenceSession.create('./fruitbox_ppo.onnx');

// Predict action logits
// board_data: Float32Array holding the 170 board cells (17 x 10 x 1)
const { action_logits } = await session.run({
    board_input: new ort.Tensor('float32', board_data, [1, 17, 10, 1])
});

// Pick the action with the highest logit
const action = action_logits.data.indexOf(Math.max(...action_logits.data));

Files

  • pytorch_model.zip: Original SB3 PPO model
  • fruitbox_ppo.onnx: ONNX version for web deployment (2.95MB)
  • model_info.json: Model metadata and performance metrics

Training Details

  • Algorithm: PPO with action masking
  • Network: Custom CNN (SmallGridCNN)
  • Training steps: 1,000,000
  • Environment: Custom Gymnasium environment
  • Action space: 8,415 possible rectangles (masked)
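The action-space size follows from counting rectangles: choosing 2 of the 11 horizontal grid lines and 2 of the 18 vertical grid lines gives 55 × 153 = 8,415. The sketch below enumerates those rectangles and builds a validity mask with 2D prefix sums. It is an illustration under assumptions, not the repository's code: the function names, the index ordering of the rectangles, the 10-rows-by-17-columns orientation, and the greedy `masked_argmax` selection are all hypothetical.

```python
from math import comb

ROWS, COLS = 10, 17  # assumed orientation: 10 rows by 17 columns

def enumerate_rectangles(rows, cols):
    """List every rectangle as (top, left, bottom, right), corners inclusive."""
    return [
        (top, left, bottom, right)
        for top in range(rows) for bottom in range(top, rows)
        for left in range(cols) for right in range(left, cols)
    ]

def prefix_sums(board):
    """P[r][c] holds the sum of all cells above row r and left of column c."""
    rows, cols = len(board), len(board[0])
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            P[r + 1][c + 1] = board[r][c] + P[r][c + 1] + P[r + 1][c] - P[r][c]
    return P

def action_mask(board, rectangles):
    """One boolean per rectangle: True if its cells sum to exactly 10."""
    P = prefix_sums(board)
    return [
        P[b + 1][r + 1] - P[t][r + 1] - P[b + 1][l] + P[t][l] == 10
        for t, l, b, r in rectangles
    ]

def masked_argmax(logits, mask):
    """Index of the highest logit among valid actions, or None if none is valid."""
    valid = [i for i, ok in enumerate(mask) if ok]
    return max(valid, key=lambda i: logits[i]) if valid else None

# The full board has C(11, 2) * C(18, 2) = 55 * 153 = 8,415 rectangles.
RECTANGLES = enumerate_rectangles(ROWS, COLS)
assert len(RECTANGLES) == comb(ROWS + 1, 2) * comb(COLS + 1, 2) == 8415

# Small 2x3 example: only two rectangles sum to 10.
board = [[3, 7, 1],
         [9, 5, 5]]
small = enumerate_rectangles(2, 3)
mask = action_mask(board, small)
print([small[i] for i, ok in enumerate(mask) if ok])  # [(0, 0, 0, 1), (1, 1, 1, 2)]
```

With a mask like this, invalid rectangles never receive probability mass during training or inference, which is the usual purpose of action masking in a large discrete action space.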

Repository

Source code: https://github.com/your-username/alphaapple

Citation

@misc{alphaapple2024,
  title={AlphaApple: AI Agent for FruitBox Puzzle Game},
  author={Your Name},
  year={2024},
  howpublished={\url{https://huggingface.co/AlphaApple}}
}