System Prompt Learning: Teaching LLMs to Learn Problem-Solving Strategies from Experience

Community Article Published June 2, 2025

We're excited to announce System Prompt Learning (SPL), a new paradigm that enables Large Language Models to learn and improve their problem-solving capabilities through experience. This approach has been implemented as an open-source plugin in optillm, showing significant performance improvements across multiple benchmarks.

The Motivation: Bridging the System Prompt Gap

If you've ever wondered why ChatGPT, Claude, and other popular AI assistants seem so capable, part of the secret lies in their sophisticated system prompts. These prompts contain elaborate problem-solving strategies, reasoning frameworks, and detailed instructions that guide the models to better performance. However, most developers and researchers work with basic or empty system prompts, missing out on these benefits entirely.

This disparity inspired us to explore Andrej Karpathy's proposed "third paradigm" for LLM learning:

  1. Pretraining: Learning facts and patterns from massive text corpora
  2. Finetuning: Learning behaviors through supervised/reinforcement learning
  3. System Prompt Learning: Learning explicit problem-solving strategies through experience ← NEW

What is System Prompt Learning?

System Prompt Learning represents a fundamental shift in how LLMs approach problem-solving. Instead of treating each query as an isolated challenge, SPL enables models to:

  • Learn from Experience: Build a knowledge base of effective problem-solving strategies
  • Classify Problems: Automatically categorize queries into specific problem types
  • Apply Relevant Strategies: Select and apply the most effective strategies for each problem type
  • Improve Over Time: Refine strategies based on success rates and new examples
  • Maintain Transparency: Generate human-readable strategies that can be inspected and understood

Impressive Results

We evaluated SPL using gemini-2.0-flash-lite across multiple benchmarks, with the learning phase using 400 training instances and evaluation on separate test sets:

| Benchmark | Baseline | With SPL | Improvement |
|-----------|----------|----------|-------------|
| OptILLMBench | 61% | 65% | +4% |
| MATH-500 | 85% | 85.6% | +0.6% |
| Arena Auto Hard | 29% | 37.6% | +8.6% |
| AIME24 | 23.33% | 30% | +6.67% |

The improvements are particularly notable for challenging benchmarks like Arena Auto Hard and AIME24, where strategic problem-solving approaches make the biggest difference.

How It Works

The SPL system maintains a dynamic database of problem-solving strategies that evolves over time:

1. Problem Classification

Every query is automatically classified into one of 16 problem types (arithmetic, word problems, logical reasoning, coding, etc.).
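
Such a classification step can be sketched as a single LLM call. The prompt wording, the `classify_problem` helper, and the abbreviated type list below are illustrative assumptions, not the plugin's actual code:

```python
# Hypothetical sketch of LLM-based problem classification.
# The type list and prompt are illustrative; the real plugin uses 16 categories.

PROBLEM_TYPES = [
    "arithmetic", "word_problem", "logical_reasoning", "coding",
    # ... the remaining categories would be listed here
]

def classify_problem(query: str, llm_call) -> str:
    """Ask the LLM to pick exactly one category for the query."""
    prompt = (
        "Classify the following problem into exactly one of these types:\n"
        + ", ".join(PROBLEM_TYPES)
        + f"\n\nProblem: {query}\n\nAnswer with the type name only."
    )
    answer = llm_call(prompt).strip().lower()
    # Fall back to a catch-all bucket if the model answers off-list
    return answer if answer in PROBLEM_TYPES else "general"
```

Routing off-list answers to a catch-all bucket keeps the downstream strategy lookup from failing on an unexpected label.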

2. Strategy Management

  • Creation: Generate new strategies for unfamiliar problem types
  • Selection: Choose the most relevant strategies (up to 3) for inference
  • Evaluation: Assess strategy effectiveness after each use
  • Refinement: Improve strategies every 10 applications
  • Maintenance: Merge similar strategies and prune poor performers
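
The lifecycle above can be sketched with a small data model. The top-3 selection and the refine-every-10-uses check follow the bullets above; the field names and everything else are illustrative assumptions, not optillm's real implementation:

```python
# Minimal sketch of the strategy lifecycle: selection by success rate
# and a periodic refinement trigger. Illustrative, not the real data model.
from dataclasses import dataclass

@dataclass
class Strategy:
    problem_type: str
    text: str
    uses: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        return self.successes / self.uses if self.uses else 0.0

def select_strategies(db: list, problem_type: str, k: int = 3) -> list:
    """Pick up to k strategies for this problem type, best first."""
    matching = [s for s in db if s.problem_type == problem_type]
    return sorted(matching, key=lambda s: s.success_rate, reverse=True)[:k]

def record_use(strategy: Strategy, succeeded: bool) -> bool:
    """Update counters; return True when the strategy is due for refinement."""
    strategy.uses += 1
    strategy.successes += int(succeeded)
    return strategy.uses % 10 == 0  # refine every 10 applications
```

Tracking a per-strategy success rate is also what makes merging and pruning possible: consistently weak strategies can be dropped, and near-duplicates with similar rates can be merged.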

3. System Prompt Augmentation

Selected strategies are integrated into the system prompt, providing the model with explicit guidance on how to approach the problem.
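
As a rough sketch, the augmentation step might look like the following; the wrapper text and the `augment_system_prompt` helper are assumptions for illustration, not the plugin's exact prompt format:

```python
# Illustrative sketch: prepend selected strategies to the system prompt.
def augment_system_prompt(base_prompt: str, strategies: list) -> str:
    if not strategies:
        return base_prompt  # nothing learned yet for this problem type
    numbered = "\n\n".join(
        f"Strategy {i}:\n{s}" for i, s in enumerate(strategies, start=1)
    )
    return (
        f"{base_prompt}\n\n"
        "When solving the problem below, consider these proven strategies:\n\n"
        f"{numbered}"
    )
```

Because the strategies are plain text, the augmented prompt stays fully human-readable, which is what gives SPL its transparency.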

Example Strategy

Here's a refined strategy the system learned for word problems:

**Strategy for Solving Word Problems:**

1. **Understand:**
   * Read the problem carefully (multiple times)
   * Identify the question (what are you trying to find?)
   * List all given information (facts, numbers, units)

2. **Plan and Translate:**
   * Define all variables with units
   * Identify relationships between knowns and unknowns
   * Write equations or expressions
   * Ensure units are consistent throughout

3. **Solve:**
   * Show work step-by-step
   * Track units throughout calculations
   * Calculate accurately

4. **Verify:**
   * Check if the answer is reasonable
   * State the final answer with units

After 500 training queries, the system's strategy database reflected:

  • 129 strategies created
  • 97 strategies refined
  • 28 strategies merged
  • 346 successful resolutions

Getting Started

SPL is implemented as a plugin in optillm, making it easy to integrate with existing workflows:

Installation

```bash
pip install optillm
```

Basic Usage (Inference Mode)

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="http://localhost:8000/v1"  # optillm proxy
)

response = client.chat.completions.create(
    model="spl-gpt-4o",  # the "spl-" prefix enables the plugin
    messages=[
        {"role": "user", "content": "Your challenging problem here"}
    ]
)
```

Learning Mode (Strategy Creation/Refinement)

```python
response = client.chat.completions.create(
    model="spl-gpt-4o",
    messages=[
        {"role": "user", "content": "Your problem here"}
    ],
    extra_body={"spl_learning": True}  # Enable learning mode
)
```

Combining with Other Techniques

```python
# Combine SPL with other optillm techniques
response = client.chat.completions.create(
    model="spl&memory-gpt-4o",  # SPL + memory plugin
    messages=[...]
)
```

Key Benefits

🧠 Cumulative Learning: The LLM improves on specific problem types over time

📖 Transparent Knowledge: Strategies are human-readable and provide insight into reasoning

⚡ Efficiency: Reuses successful approaches rather than solving each problem from scratch

🎯 Adaptability: Different strategies for different problem types

🔍 Inspectable: Learning process and outcomes can be examined and understood

Implementation Details

The complete implementation is available in the optillm repository. Key components include:

  • Strategy Database: JSON-based persistent storage
  • Problem Classifier: Automatic query categorization
  • Strategy Generator: LLM-powered strategy creation
  • Effectiveness Evaluator: Post-completion strategy assessment
  • Strategy Refiner: Continuous improvement of existing strategies
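
The JSON-backed store might look roughly like this; the file layout and helper names are assumptions based on the description above, not the plugin's actual schema:

```python
# Sketch of JSON-based persistent storage for the strategy database.
# The {"strategies": [...]} layout is an illustrative assumption.
import json
from pathlib import Path

def save_strategies(strategies: list, path: str) -> None:
    Path(path).write_text(json.dumps({"strategies": strategies}, indent=2))

def load_strategies(path: str) -> list:
    p = Path(path)
    if not p.exists():
        return []  # a fresh deployment starts with an empty database
    return json.loads(p.read_text())["strategies"]
```

A plain JSON file keeps the learned strategies diffable and easy to inspect, share, or hand-edit, in line with the transparency goals above.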

Future Implications

System Prompt Learning opens exciting possibilities for AI development:

  • Domain-Specific Expertise: Models that develop specialized knowledge in particular fields
  • Collaborative Learning: Sharing strategy databases across different deployments
  • Human-AI Collaboration: Allowing human experts to contribute and refine strategies
  • Multimodal Strategies: Extending the approach beyond text to include visual and other modalities

Try It Today

Ready to give your LLM the ability to learn from experience?

🔗 GitHub Repository: https://github.com/codelion/optillm
📁 SPL Plugin: https://github.com/codelion/optillm/tree/main/optillm/plugins/spl
📄 Documentation: Complete setup and usage guide in the repository

We believe System Prompt Learning represents a fundamental step toward more intelligent, adaptive AI systems. By enabling models to learn from their experiences in a transparent, interpretable way, we're moving closer to AI that truly improves over time.

What strategies will your LLM learn? Try SPL today and find out!


System Prompt Learning is implemented in optillm, an open-source project focused on optimizing LLM inference through state-of-the-art techniques. Join our community and help shape the future of adaptive AI systems.

Tags: #MachineLearning #AI #LLM #ProblemSolving #OpenSource #InferenceOptimization
