agent-2048-game-qwen-7b-8k-ds

This model is a game-playing agent for the 2048 puzzle game, fine-tuned from Qwen-7B-Instruct with Group Relative Policy Optimization (GRPO), a reinforcement learning method. Given a 4x4 board state, it plans ahead and selects one of four moves: up, down, left, or right.

Model Description

Base Model: Qwen-7B-Instruct
Training Approach: Group Relative Policy Optimization (GRPO)
Training Dataset: 8,000 carefully curated game states
Hardware Used: Single RTX 4090 (24GB)
Training Time: ~10 hours
Framework: trl library, accelerated with Unsloth

Training Configuration

Learning Rate: 4e-5 (optimized after extensive testing)
LoRA Rank: 16
Max Sequence Length: 1000 tokens
Batch Size: 1, with 4 gradient accumulation steps (effective batch size of 4)
Optimizer: paged_adamw_8bit
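
The settings above can be kept in one place so that derived values, such as the effective batch size, are computed rather than repeated by hand. This is a minimal sketch using a plain dataclass. The class name and field names are illustrative and are not taken from the original training script; only the values come from this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSettings:
    """Hyperparameters listed in this card (names are illustrative)."""

    learning_rate: float = 4e-5
    lora_rank: int = 16
    max_seq_length: int = 1000
    per_device_batch_size: int = 1
    gradient_accumulation_steps: int = 4
    optimizer: str = "paged_adamw_8bit"

    @property
    def effective_batch_size(self) -> int:
        # Samples that contribute to each optimizer step on a single GPU.
        return self.per_device_batch_size * self.gradient_accumulation_steps


settings = TrainingSettings()
print(settings.effective_batch_size)
```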

Intended Use

This model is designed to play the 2048 game by:

Analyzing the current board state
Planning strategic moves
Maximizing score and achieving high-value tiles
Maintaining efficient board organization

Training Data

The training data was generated through a multi-stage pipeline:

Simulated gameplay for realistic board states
Custom difficulty scoring system
5-level difficulty classification
Balanced sampling across difficulty levels
Parallel processing for efficient generation
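
The scoring, classification, and balanced sampling steps above might look roughly like the following. This is a sketch under stated assumptions: the card does not describe the actual difficulty score, so `difficulty_score` here simply treats fuller boards as harder, and all function names are hypothetical. Simulated gameplay and parallel generation are not shown.

```python
import random
from collections import defaultdict

Board = list[list[int]]  # 0 marks an empty cell


def difficulty_score(board: Board) -> float:
    """Hypothetical score in [0, 1]: the fraction of occupied cells."""
    cells = [v for row in board for v in row]
    filled = sum(1 for v in cells if v != 0)
    return filled / len(cells)


def difficulty_level(score: float, levels: int = 5) -> int:
    """Map a score in [0, 1] to an integer level in 1..levels."""
    return min(levels, int(score * levels) + 1)


def balanced_sample(boards: list[Board], per_level: int,
                    rng: random.Random) -> list[Board]:
    """Draw up to `per_level` boards from each difficulty level."""
    by_level: dict[int, list[Board]] = defaultdict(list)
    for board in boards:
        by_level[difficulty_level(difficulty_score(board))].append(board)
    sample: list[Board] = []
    for level in sorted(by_level):
        pool = by_level[level]
        sample.extend(rng.sample(pool, min(per_level, len(pool))))
    return sample
```

Passing an explicit `random.Random` instance keeps the sampling reproducible across runs.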

Training Approach

Reward System

The model was trained using multiple reward components:

Density Reward: Encourages efficient tile merging and space utilization
Highest Tile Reward: Incentivizes creation of high-value tiles
Survival Reward: Promotes moves that maintain game continuity
Format Compliance: Ensures proper response structure
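
To make the four components concrete, here is one way they could be written. This is an illustrative sketch only: the card does not give the actual reward formulas or weights, so every definition below (empty-cell fraction for density, scaled log2 for the highest tile, a bare move word for format compliance, equal weights) is an assumption, and all function names are hypothetical.

```python
import math

Board = list[list[int]]  # 0 marks an empty cell

VALID_MOVES = {"up", "down", "left", "right"}


def density_reward(board: Board) -> float:
    """Assumed form: fraction of empty cells left after the move."""
    cells = [v for row in board for v in row]
    return cells.count(0) / len(cells)


def highest_tile_reward(board: Board) -> float:
    """Assumed form: log2 of the largest tile, scaled so 2048 gives 1.0."""
    top = max(v for row in board for v in row)
    return math.log2(top) / 11 if top > 0 else 0.0


def survival_reward(board: Board) -> float:
    """1.0 if a move is still possible: an empty cell or an equal adjacent pair."""
    n = len(board)
    for r in range(n):
        for c in range(n):
            if board[r][c] == 0:
                return 1.0
            if c + 1 < n and board[r][c] == board[r][c + 1]:
                return 1.0
            if r + 1 < n and board[r][c] == board[r + 1][c]:
                return 1.0
    return 0.0


def format_reward(response: str) -> float:
    """Assumed form: 1.0 if the response is exactly one valid move word."""
    return 1.0 if response.strip().lower() in VALID_MOVES else 0.0


def total_reward(board_after_move: Board, response: str,
                 weights: tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the four components (equal weights are a placeholder)."""
    parts = (
        density_reward(board_after_move),
        highest_tile_reward(board_after_move),
        survival_reward(board_after_move),
        format_reward(response),
    )
    return sum(w * p for w, p in zip(weights, parts))
```

Keeping each component as a separate function makes it easier to log them individually during training and to see which one drives the policy.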

Optimization

Unsloth for roughly 2x faster fine-tuning
4-bit quantization for memory-efficient training
LoRA adaptation (rank 16) instead of full fine-tuning

Performance and Limitations

Strengths

Strong strategic planning capabilities
Efficient tile merging and space management
Consistent high-score achievement
Structured decision-making process

Limitations

Performance may vary with the random seed
Success is not guaranteed because new tiles spawn at random
Input must follow the board format shown in Example Usage

Example Usage

# Format your 4x4 game board as a string
board_state = """
2 | 4 | 8 | 16
. | . | 2 | 4
. | . | . | 2
. | . | . | .
"""

# Model will output one of: up, down, left, right
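
To check a predicted move outside the model, the board string can be parsed and the move applied with standard 2048 rules. The helpers below are a minimal sketch; `parse_board`, `slide_left`, and `apply_move` are hypothetical names that are not part of this model's code, and the random tile that the game spawns after each move is deliberately left out.

```python
board_state = """
2 | 4 | 8 | 16
. | . | 2 | 4
. | . | . | 2
. | . | . | .
"""


def parse_board(text: str) -> list[list[int]]:
    """Parse the pipe-separated format; '.' marks an empty cell (stored as 0)."""
    rows = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.split("|")]
        rows.append([0 if c == "." else int(c) for c in cells])
    return rows


def slide_left(row: list[int]) -> list[int]:
    """Slide one row left, merging each pair of equal neighbours at most once."""
    tiles = [v for v in row if v != 0]
    merged = []
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged))


def apply_move(board: list[list[int]], move: str) -> list[list[int]]:
    """Return the board after `move`, without spawning a new tile."""
    if move == "left":
        return [slide_left(r) for r in board]
    if move == "right":
        return [slide_left(r[::-1])[::-1] for r in board]
    # Transpose so columns become rows, then reuse the row logic.
    cols = [list(c) for c in zip(*board)]
    if move == "up":
        moved = [slide_left(c) for c in cols]
    elif move == "down":
        moved = [slide_left(c[::-1])[::-1] for c in cols]
    else:
        raise ValueError(f"unknown move: {move!r}")
    return [list(r) for r in zip(*moved)]


board = parse_board(board_state)
for row in apply_move(board, "left"):
    print(row)
```

A move that leaves the board unchanged is illegal in 2048, so comparing `apply_move(board, move)` with `board` is a simple way to reject invalid predictions.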

Citation

@misc{dalal2024agent2048blog,
    author = {Dalal, Hrishbh},
    title = {Agent 2048: Forging Strategic Gameplay in an AI Through Data, Rewards, and RL},
    year = {2024},
    month = {March},
    url = {https://yourwebsite.com/blog/ai-agent-plays-2048},
    note = {[Blog post] Accessed: March 30, 2024}
}

Author

Hrishbh Dalal

Acknowledgments

Special thanks to the research community on Twitter/X for valuable feedback on data generation strategies and training approaches.

License

This model is released under the Apache 2.0 license.