agent-2048-game-qwen-7b-8k-ds
This model is a game-playing agent trained with reinforcement learning to play the 2048 puzzle game. Fine-tuned from Qwen-7B-Instruct, it is trained to plan moves by reasoning about the spatial layout of the board.
Model Description
- Base Model: Qwen-7B-Instruct
- Training Approach: Group Relative Policy Optimization (GRPO)
- Training Dataset: 8,000 carefully curated game states
- Hardware Used: Single RTX 4090 (24GB)
- Training Time: ~10 hours
- Framework: Implemented with the TRL library and accelerated with Unsloth
Training Configuration
- Learning Rate: 4e-5 (optimized after extensive testing)
- LoRA Rank: 16
- Max Sequence Length: 1000 tokens
- Batch Size: 1 per device, with 4 gradient accumulation steps (effective batch size 4)
- Optimizer: paged_adamw_8bit
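For readers reproducing the setup, the settings above can be gathered into one configuration. This is a minimal sketch, assuming the keys map onto `transformers.TrainingArguments` (which TRL's `GRPOConfig` extends) and `peft.LoraConfig`; the dictionaries, the `max_seq_length` variable, and the `effective_batch_size` helper are illustrative, not the author's training script.

```python
# Hypothetical sketch: the hyperparameters listed above gathered in one place.
# Key names follow transformers.TrainingArguments (extended by TRL's GRPOConfig)
# and peft.LoraConfig; this is NOT the author's actual training script.
training_args = {
    "learning_rate": 4e-5,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "optim": "paged_adamw_8bit",  # 8-bit paged AdamW, requires bitsandbytes
}
lora_args = {"r": 16}  # LoRA rank
max_seq_length = 1000  # hypothetical variable: token budget per sequence


def effective_batch_size(args: dict, num_devices: int = 1) -> int:
    """Sequences that contribute to each optimizer step."""
    return (
        args["per_device_train_batch_size"]
        * args["gradient_accumulation_steps"]
        * num_devices
    )


print(effective_batch_size(training_args))  # 4 on a single RTX 4090
```

Gradient accumulation raises the effective batch size without raising peak activation memory, which is what makes a per-device batch of 1 workable on a 24GB card.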
Intended Use
This model is designed to play the 2048 game by:
- Analyzing the current board state
- Planning strategic moves
- Maximizing score and achieving high-value tiles
- Maintaining efficient board organization
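The model chooses a direction; an environment still has to apply it. Below is a minimal sketch of the standard 2048 slide-and-merge rules, assuming a 4x4 grid of integers with 0 for empty cells. The helper names are hypothetical and not part of the released model.

```python
from typing import List, Tuple

Board = List[List[int]]  # 4x4 grid, 0 = empty cell
MOVES = ("up", "down", "left", "right")


def merge_row_left(row: List[int]) -> Tuple[List[int], int]:
    """Slide one row left, merging equal neighbours at most once per tile.

    Returns the new row and the score gained (sum of newly merged tiles).
    """
    tiles = [v for v in row if v != 0]
    merged: List[int] = []
    score = 0
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)
            score += tiles[i] * 2
            i += 2  # a freshly merged tile cannot merge again this move
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged)), score


def transpose(board: Board) -> Board:
    return [list(col) for col in zip(*board)]


def apply_move(board: Board, move: str) -> Tuple[Board, int]:
    """Apply one move and return (new_board, score_gain)."""
    if move not in MOVES:
        raise ValueError(f"unknown move: {move!r}")
    vertical = move in ("up", "down")     # work on columns via transpose
    reverse = move in ("right", "down")   # reuse the left-merge on reversed rows
    rows = transpose(board) if vertical else [row[:] for row in board]
    new_rows, gained = [], 0
    for row in rows:
        out, s = merge_row_left(row[::-1] if reverse else row)
        new_rows.append(out[::-1] if reverse else out)
        gained += s
    return (transpose(new_rows) if vertical else new_rows), gained


def legal_moves(board: Board) -> List[str]:
    """Moves that change the board; the game is over when this is empty."""
    return [m for m in MOVES if apply_move(board, m)[0] != board]
```

`legal_moves` is also a convenient way to reject an illegal suggestion from the model before applying it.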
Training Data
The training data was generated with a multi-stage pipeline:
- Simulated gameplay to produce realistic board states
- Custom difficulty scoring system
- 5-level difficulty classification
- Balanced sampling across difficulty levels
- Parallel processing for efficient generation
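The card does not specify how difficulty was scored. As a hypothetical illustration only, the sketch below rates a board by how full it is and how few merges are available, buckets that score into five levels, and draws an equal number of boards from each level. `difficulty_score`, `difficulty_level`, and `balanced_sample` are invented names, and the scoring formula is an assumption, not the author's.

```python
import random
from collections import defaultdict
from typing import List

Board = List[List[int]]  # square grid, 0 = empty cell


def difficulty_score(board: Board) -> float:
    """Hypothetical score in [0, 1]: fuller boards with fewer adjacent
    equal pairs (available merges) count as harder."""
    n = len(board)
    fill = sum(1 for row in board for v in row if v) / (n * n)
    pairs = 0
    for r in range(n):
        for c in range(n):
            v = board[r][c]
            if v and c + 1 < n and board[r][c + 1] == v:
                pairs += 1
            if v and r + 1 < n and board[r + 1][c] == v:
                pairs += 1
    max_pairs = 2 * n * (n - 1)  # number of horizontally/vertically adjacent pairs
    return fill * (1 - pairs / max_pairs)


def difficulty_level(board: Board) -> int:
    """Bucket the score into levels 1 (easiest) to 5 (hardest)."""
    return min(4, int(difficulty_score(board) * 5)) + 1


def balanced_sample(boards: List[Board], per_level: int, seed: int = 0) -> List[Board]:
    """Draw up to `per_level` boards from each of the five levels."""
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for b in boards:
        by_level[difficulty_level(b)].append(b)
    sample: List[Board] = []
    for level in range(1, 6):
        pool = by_level.get(level, [])
        sample.extend(rng.sample(pool, min(per_level, len(pool))))
    return sample
```

If the 8,000 states were split evenly, that would correspond to `per_level=1600`; scoring can be parallelized with a standard process pool since each board is scored independently.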
Training Approach
Reward System
The model was trained using multiple reward components:
- Density Reward: Encourages efficient tile merging and space utilization
- Highest Tile Reward: Incentivizes creation of high-value tiles
- Survival Reward: Promotes moves that maintain game continuity
- Format Compliance: Ensures proper response structure
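The exact reward formulas are not given here. The sketch below shows one plausible shape for each component, under stated assumptions: the density reward counts cells freed by merges, the highest-tile reward is log2 of the largest tile scaled so that 2048 gives 1.0, the survival reward checks whether any move remains, and the format reward assumes (hypothetically) that the model wraps its move in `<answer>...</answer>` tags. Only `format_reward` follows the TRL convention of receiving a batch of completions as keyword arguments and returning one float per completion; the board-based rewards would additionally need the board state passed in, which is omitted here.

```python
import math
import re
from typing import List

Board = List[List[int]]  # square grid, 0 = empty cell
# Hypothetical answer format; the card does not document the real one.
MOVE_TAG = re.compile(r"<answer>\s*(up|down|left|right)\s*</answer>", re.IGNORECASE)


def _tiles(board: Board) -> List[int]:
    return [v for row in board for v in row if v]


def density_reward(before: Board, after: Board) -> float:
    """Fraction of cells freed by merges in this move (0 if nothing merged)."""
    cells = len(before) * len(before[0])
    return (len(_tiles(before)) - len(_tiles(after))) / cells


def highest_tile_reward(after: Board) -> float:
    """log2 of the largest tile, scaled so a 2048 tile (2**11) gives 1.0."""
    top = max(_tiles(after), default=0)
    return min(1.0, math.log2(top) / 11) if top else 0.0


def survival_reward(after: Board) -> float:
    """+1 if some move is still possible (an empty cell or an adjacent
    equal pair exists), -1 if the game is over."""
    n = len(after)
    for r in range(n):
        for c in range(n):
            v = after[r][c]
            if v == 0:
                return 1.0
            if c + 1 < n and after[r][c + 1] == v:
                return 1.0
            if r + 1 < n and after[r + 1][c] == v:
                return 1.0
    return -1.0


def format_reward(completions: List[str], **kwargs) -> List[float]:
    """TRL-style reward: 1.0 if a completion has exactly one answer tag."""
    return [1.0 if len(MOVE_TAG.findall(c)) == 1 else 0.0 for c in completions]
```

Keeping each component as a separate function makes it easy to weight or ablate them individually, since TRL's `GRPOTrainer` accepts a list of reward functions.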
Optimization
- Used Unsloth for roughly 2x faster fine-tuning
- 4-bit quantization for memory-efficient training
- LoRA adapters (rank 16) for parameter-efficient fine-tuning
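Two back-of-the-envelope formulas help explain why this setup fits on one 24GB card. A LoRA adapter of rank r on a weight of shape d_out x d_in trains r(d_out + d_in) parameters instead of d_out x d_in, and 4-bit weights take half a byte per parameter. The 4096 dimension below is an assumed hidden size chosen for illustration, and the memory estimate covers frozen base weights only.

```python
def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight:
    B has shape (d_out, r) and A has shape (r, d_in)."""
    return rank * (d_out + d_in)


def weight_memory_gb(n_params: float, bits: int) -> float:
    """Approximate memory for the base weights alone (excludes activations,
    optimizer state, LoRA weights, and quantization constants)."""
    return n_params * bits / 8 / 1e9


# Assumed 4096 x 4096 projection, rank 16 as in the configuration above.
print(lora_params(4096, 4096, 16))  # 131072, versus 16777216 for the full matrix
print(weight_memory_gb(7e9, 4))     # 3.5 (GB at 4-bit)
print(weight_memory_gb(7e9, 16))    # 14.0 (GB at 16-bit)
```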
Performance and Limitations
Strengths
- Strong strategic planning capabilities
- Efficient tile merging and space management
- Consistent high-score achievement
- Structured decision-making process
Limitations
- Performance may vary with random seeds
- Success not guaranteed due to game's inherent randomness
- The model expects the board in a specific text format (see Example Usage)
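In standard 2048, after every move a new tile appears on a random empty cell: a 2 with probability 0.9 and a 4 with probability 0.1. That randomness is why outcomes vary by seed. A minimal sketch of seeded spawning, assuming the evaluation environment follows those standard rules (`spawn_tile` is a hypothetical helper):

```python
import random
from typing import List, Optional

Board = List[List[int]]  # 4x4 grid, 0 = empty cell


def spawn_tile(board: Board, rng: random.Random) -> Optional[Board]:
    """Return a copy of `board` with one new tile, or None if the board is full."""
    empty = [(r, c) for r, row in enumerate(board) for c, v in enumerate(row) if v == 0]
    if not empty:
        return None
    r, c = rng.choice(empty)
    new = [row[:] for row in board]           # do not mutate the caller's board
    new[r][c] = 2 if rng.random() < 0.9 else 4  # standard 90% / 10% split
    return new


# The same seed reproduces the same spawn sequence; different seeds generally differ.
start = [[0] * 4 for _ in range(4)]
print(spawn_tile(start, random.Random(42)) == spawn_tile(start, random.Random(42)))
```

Running many games with fixed seeds and reporting the mean and spread of final scores gives results that can be compared across runs.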
Example Usage
# Format your 4x4 game board as a string
board_state = """
2 | 4 | 8 | 16
. | . | 2 | 4
. | . | . | 2
. | . | . | .
"""
# Model will output one of: up, down, left, right
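Around the model call, the board string has to be converted into the prompt the model was trained on, and the reply has to be reduced to a single move. The prompt template is not documented on this card, so the sketch below covers only the two ends: `parse_board` converts the pipe-separated format above into a grid of integers, and `parse_move` (a hypothetical, defensive parser) takes the last direction word in the reply.

```python
import re
from typing import List, Optional


def parse_board(text: str) -> List[List[int]]:
    """Convert the pipe-separated board format into ints (0 = empty '.')."""
    rows = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.split("|")]
        rows.append([0 if c == "." else int(c) for c in cells])
    return rows


def parse_move(output: str) -> Optional[str]:
    """Hypothetical: return the last of up/down/left/right in the reply,
    or None if the model produced no recognizable move."""
    found = re.findall(r"\b(up|down|left|right)\b", output.lower())
    return found[-1] if found else None
```

Taking the last direction word tolerates replies that mention several directions while reasoning before committing to one.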
Citation
@misc{dalal2024agent2048blog,
author = {Dalal, Hrishbh},
title = {Agent 2048: Forging Strategic Gameplay in an AI Through Data, Rewards, and RL},
year = {2024},
month = {March},
url = {https://yourwebsite.com/blog/ai-agent-plays-2048},
note = {[Blog post] Accessed: March 30, 2024}
}
Author
Hrishbh Dalal
Acknowledgments
Special thanks to the research community on Twitter/X for valuable feedback on data generation strategies and training approaches.
License
This model is released under the Apache 2.0 license.