agent-2048-game-qwen-7b-8k-ds

This model is fine-tuned to play the 2048 puzzle game using reinforcement learning. Built on Qwen-7B-Instruct, it reads a board state and chooses the next move; its training rewards target strategic planning and spatial reasoning.

Model Description

  • Base Model: Qwen-7B-Instruct
  • Training Approach: Group Relative Policy Optimization (GRPO)
  • Training Dataset: 8,000 carefully curated game states
  • Hardware Used: Single RTX 4090 (24GB)
  • Training Time: ~10 hours
  • Framework: Implemented with the trl library and accelerated with Unsloth
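
GRPO scores each sampled completion relative to the other completions drawn for the same prompt, instead of relying on a learned value function. The snippet below is a minimal sketch of that group normalization, assuming one scalar reward per completion; the function name and the epsilon value are illustrative and are not taken from the training code.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps keeps the division finite when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four hypothetical completions sampled for one board state
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions that score above the group mean get a positive advantage and are reinforced; those below the mean get a negative one.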

Training Configuration

  • Learning Rate: 4e-5 (optimized after extensive testing)
  • LoRA Rank: 16
  • Max Sequence Length: 1000 tokens
  • Batch Size: 1 (with gradient accumulation steps of 4)
  • Optimizer: paged_adamw_8bit
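
For reference, the settings above can be gathered into one structure. This is only a sketch: the class and field names are hypothetical, and only the values come from the list. It also makes the effective batch size explicit, which is the batch size multiplied by the gradient accumulation steps.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingSettings:
    # Values copied from the list above; the field names are illustrative.
    learning_rate: float = 4e-5
    lora_rank: int = 16
    max_seq_length: int = 1000
    per_device_batch_size: int = 1
    gradient_accumulation_steps: int = 4
    optimizer: str = "paged_adamw_8bit"

    @property
    def effective_batch_size(self) -> int:
        # Gradients from several micro-batches are accumulated before each update.
        return self.per_device_batch_size * self.gradient_accumulation_steps

settings = TrainingSettings()
print(settings.effective_batch_size)  # 4 on a single GPU
```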

Intended Use

This model is designed to play the 2048 game by:

  1. Analyzing the current board state
  2. Planning strategic moves
  3. Maximizing score and achieving high-value tiles
  4. Maintaining efficient board organization

Training Data

The training data was generated by a pipeline with the following components:

  • Simulated gameplay for realistic board states
  • Custom difficulty scoring system
  • 5-level difficulty classification
  • Balanced sampling across difficulty levels
  • Parallel processing for efficient generation
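
The difficulty scoring and balanced sampling steps could look roughly like the sketch below. The scoring rule here (fuller boards count as harder) is an assumption made for illustration; the actual scoring system is not described in this card, and all function names are hypothetical.

```python
import random
from collections import defaultdict

def difficulty_score(board):
    """Hypothetical score in [0, 1]: fuller boards count as harder."""
    cells = [v for row in board for v in row]
    empty = sum(1 for v in cells if v == 0)
    return 1.0 - empty / len(cells)

def difficulty_level(score, levels=5):
    """Map a score in [0, 1] to an integer level from 1 to `levels`."""
    return min(levels, int(score * levels) + 1)

def balanced_sample(boards, per_level, seed=0):
    """Draw up to `per_level` boards from each difficulty level."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for board in boards:
        buckets[difficulty_level(difficulty_score(board))].append(board)
    sample = []
    for level in sorted(buckets):
        pool = buckets[level]
        sample.extend(rng.sample(pool, min(per_level, len(pool))))
    return sample
```

Sampling a fixed number of boards per level keeps easy, nearly empty boards from dominating the dataset.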

Training Approach

Reward System

The model was trained using multiple reward components:

  1. Density Reward: Encourages efficient tile merging and space utilization
  2. Highest Tile Reward: Incentivizes creation of high-value tiles
  3. Survival Reward: Promotes moves that maintain game continuity
  4. Format Compliance: Ensures proper response structure
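
One way the four components might be written as functions is sketched below. The formulas, the equal weights, and the function names are all assumptions made for illustration; the card does not specify how each reward was computed. Boards are lists of integer rows with 0 for an empty cell.

```python
MOVES = {"up", "down", "left", "right"}

def density_reward(board):
    """Fraction of empty cells: more free space after a move scores higher."""
    cells = [v for row in board for v in row]
    return sum(1 for v in cells if v == 0) / len(cells)

def highest_tile_reward(board):
    """log2 of the largest tile, scaled so that a 2048 tile gives 1.0."""
    top = max(v for row in board for v in row)
    # Tiles are powers of two, so bit_length() - 1 equals log2(top).
    return 0.0 if top == 0 else (top.bit_length() - 1) / 11

def survival_reward(board):
    """1.0 if any move is still possible, else 0.0."""
    n = len(board)
    for r in range(n):
        for c in range(n):
            if board[r][c] == 0:
                return 1.0
            if c + 1 < n and board[r][c] == board[r][c + 1]:
                return 1.0
            if r + 1 < n and board[r][c] == board[r + 1][c]:
                return 1.0
    return 0.0

def format_reward(response):
    """1.0 if the response is exactly one of the four move names."""
    return 1.0 if response.strip().lower() in MOVES else 0.0

def total_reward(board_after, response, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four components (equal weights are a placeholder)."""
    parts = (
        density_reward(board_after),
        highest_tile_reward(board_after),
        survival_reward(board_after),
        format_reward(response),
    )
    return sum(w * p for w, p in zip(weights, parts))
```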

Optimization

  • Unsloth for roughly 2x faster fine-tuning
  • 4-bit quantization to reduce memory use during training
  • LoRA adapters (rank 16) instead of full fine-tuning

Performance and Limitations

Strengths

  • Strong strategic planning capabilities
  • Efficient tile merging and space management
  • Consistent high-score achievement
  • Structured decision-making process

Limitations

  • Performance may vary with the random seed
  • Success is not guaranteed because of the game's inherent randomness
  • The model requires the specific input format shown in Example Usage

Example Usage

# Format your 4x4 game board as a string
board_state = """
2 | 4 | 8 | 16
. | . | 2 | 4
. | . | . | 2
. | . | . | .
"""

# Model will output one of: up, down, left, right
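
To check a move returned by the model, the board string can be parsed and the move applied locally. The following is a minimal sketch, assuming `.` marks an empty cell and `|` separates columns as in the example above. The helper names are illustrative, and spawning of the new random tile is omitted.

```python
def parse_board(text):
    """Parse the pipe-separated board string into a list of integer rows."""
    rows = []
    for line in text.strip().splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        rows.append([0 if cell == "." else int(cell) for cell in cells])
    return rows

def slide_row_left(row):
    """Slide and merge one row to the left, merging each tile at most once."""
    tiles = [v for v in row if v != 0]
    merged = []
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged))

def apply_move(board, move):
    """Return the board after a move, without spawning a new tile."""
    if move == "left":
        return [slide_row_left(row) for row in board]
    if move == "right":
        return [slide_row_left(row[::-1])[::-1] for row in board]
    # Transpose so that columns become rows, then reuse the row logic.
    columns = [list(col) for col in zip(*board)]
    if move == "up":
        moved = [slide_row_left(col) for col in columns]
    elif move == "down":
        moved = [slide_row_left(col[::-1])[::-1] for col in columns]
    else:
        raise ValueError(f"unknown move: {move!r}")
    return [list(row) for row in zip(*moved)]

def legal_moves(board):
    """Moves that actually change the board."""
    return [m for m in ("up", "down", "left", "right") if apply_move(board, m) != board]

board = parse_board("""
2 | 4 | 8 | 16
. | . | 2 | 4
. | . | . | 2
. | . | . | .
""")
print(legal_moves(board))  # ['down', 'left']
```

Filtering the model's answer against `legal_moves` is a cheap guard against outputs that are well-formed but would leave the board unchanged.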

Citation

@misc{dalal2024agent2048blog,
    author = {Dalal, Hrishbh},
    title = {Agent 2048: Forging Strategic Gameplay in an AI Through Data, Rewards, and RL},
    year = {2024},
    month = {March},
    url = {https://yourwebsite.com/blog/ai-agent-plays-2048},
    note = {[Blog post] Accessed: March 30, 2024}
}

Author

Hrishbh Dalal

Acknowledgments

Special thanks to the research community on Twitter/X for valuable feedback on data generation strategies and training approaches.

License

This model is released under the Apache 2.0 license.

Model size: 7.62B params (Safetensors, tensor type BF16)