System Prompt Learning: Teaching LLMs to Learn Problem-Solving Strategies from Experience

Community Article Published June 2, 2025

We're excited to announce System Prompt Learning (SPL), a new paradigm that enables Large Language Models to learn and improve their problem-solving capabilities through experience. This approach has been implemented as an open-source plugin in optillm, showing significant performance improvements across multiple benchmarks.

The Motivation: Bridging the System Prompt Gap

If you've ever wondered why ChatGPT, Claude, and other popular AI assistants seem so capable, part of the secret lies in their sophisticated system prompts. These prompts contain elaborate problem-solving strategies, reasoning frameworks, and detailed instructions that guide the models to better performance. However, most developers and researchers work with basic or empty system prompts, missing out on these benefits entirely.

This disparity inspired us to explore Andrej Karpathy's proposed "third paradigm" for LLM learning:

  1. Pretraining: Learning facts and patterns from massive text corpora
  2. Finetuning: Learning behaviors through supervised/reinforcement learning
  3. System Prompt Learning: Learning explicit problem-solving strategies through experience ← NEW

What is System Prompt Learning?

System Prompt Learning represents a fundamental shift in how LLMs approach problem-solving. Instead of treating each query as an isolated challenge, SPL enables models to:

  • Learn from Experience: Build a knowledge base of effective problem-solving strategies
  • Classify Problems: Automatically categorize queries into specific problem types
  • Apply Relevant Strategies: Select and apply the most effective strategies for each problem type
  • Improve Over Time: Refine strategies based on success rates and new examples
  • Maintain Transparency: Generate human-readable strategies that can be inspected and understood

Impressive Results

We evaluated SPL using gemini-2.0-flash-lite across multiple benchmarks, with the learning phase using 400 training instances and evaluation on separate test sets:

| Benchmark | Baseline | With SPL | Improvement |
|-----------|----------|----------|-------------|
| OptILLMBench | 61% | 65% | +4% |
| MATH-500 | 85% | 85.6% | +0.6% |
| Arena Auto Hard | 29% | 37.6% | +8.6% |
| AIME24 | 23.33% | 30% | +6.67% |

The improvements are particularly notable for challenging benchmarks like Arena Auto Hard and AIME24, where strategic problem-solving approaches make the biggest difference.

How It Works

The SPL system maintains a dynamic database of problem-solving strategies that evolves over time:

1. Problem Classification

Every query is automatically classified into one of 16 problem types (arithmetic, word problems, logical reasoning, coding, etc.).
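
Such a classification step can be sketched as a single LLM call. The prompt wording, the `classify_problem` helper, and the abbreviated type list below are illustrative assumptions, not the plugin's actual code:

```python
# Hypothetical sketch of LLM-based problem classification.
# The type list and prompt are illustrative; the real plugin uses 16 categories.

PROBLEM_TYPES = [
    "arithmetic", "word_problem", "logical_reasoning", "coding",
    # ... the remaining categories would be listed here
]

def classify_problem(query: str, llm_call) -> str:
    """Ask the LLM to pick exactly one category for the query."""
    prompt = (
        "Classify the following problem into exactly one of these types:\n"
        + ", ".join(PROBLEM_TYPES)
        + f"\n\nProblem: {query}\n\nAnswer with the type name only."
    )
    answer = llm_call(prompt).strip().lower()
    # Fall back to a catch-all bucket if the model answers off-list
    return answer if answer in PROBLEM_TYPES else "general"
```

Routing off-list answers to a catch-all bucket keeps the downstream strategy lookup from failing on an unexpected label.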

2. Strategy Management

  • Creation: Generate new strategies for unfamiliar problem types
  • Selection: Choose the most relevant strategies (up to 3) for inference
  • Evaluation: Assess strategy effectiveness after each use
  • Refinement: Improve strategies every 10 applications
  • Maintenance: Merge similar strategies and prune poor performers
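
The lifecycle above can be sketched with a small data model. The top-3 selection and the refine-every-10-uses check follow the bullets above; the field names and everything else are illustrative assumptions, not optillm's real implementation:

```python
# Minimal sketch of the strategy lifecycle: selection by success rate
# and a periodic refinement trigger. Illustrative, not the real data model.
from dataclasses import dataclass

@dataclass
class Strategy:
    problem_type: str
    text: str
    uses: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        return self.successes / self.uses if self.uses else 0.0

def select_strategies(db: list, problem_type: str, k: int = 3) -> list:
    """Pick up to k strategies for this problem type, best first."""
    matching = [s for s in db if s.problem_type == problem_type]
    return sorted(matching, key=lambda s: s.success_rate, reverse=True)[:k]

def record_use(strategy: Strategy, succeeded: bool) -> bool:
    """Update counters; return True when the strategy is due for refinement."""
    strategy.uses += 1
    strategy.successes += int(succeeded)
    return strategy.uses % 10 == 0  # refine every 10 applications
```

Tracking a per-strategy success rate is also what makes merging and pruning possible: consistently weak strategies can be dropped, and near-duplicates with similar rates can be merged.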

3. System Prompt Augmentation

Selected strategies are integrated into the system prompt, providing the model with explicit guidance on how to approach the problem.
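
As a rough sketch, the augmentation step might look like the following; the wrapper text and the `augment_system_prompt` helper are assumptions for illustration, not the plugin's exact prompt format:

```python
# Illustrative sketch: prepend selected strategies to the system prompt.
def augment_system_prompt(base_prompt: str, strategies: list) -> str:
    if not strategies:
        return base_prompt  # nothing learned yet for this problem type
    numbered = "\n\n".join(
        f"Strategy {i}:\n{s}" for i, s in enumerate(strategies, start=1)
    )
    return (
        f"{base_prompt}\n\n"
        "When solving the problem below, consider these proven strategies:\n\n"
        f"{numbered}"
    )
```

Because the strategies are plain text, the augmented prompt stays fully human-readable, which is what gives SPL its transparency.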

Example Strategy

Here's a refined strategy the system learned for word problems:

**Strategy for Solving Word Problems:**

1. **Understand:**
   * Read the problem carefully (multiple times)
   * Identify the question (what are you trying to find?)
   * List all given information (facts, numbers, units)

2. **Plan and Translate:**
   * Define all variables with units
   * Identify relationships between knowns and unknowns
   * Write equations or expressions
   * Ensure units are consistent throughout

3. **Solve:**
   * Show work step-by-step
   * Track units throughout calculations
   * Calculate accurately

4. **Verify:**
   * Check if the answer is reasonable
   * State the final answer with units

After 500 training queries, the system's strategy database reflected:

  • 129 strategies created
  • 97 strategies refined
  • 28 strategies merged
  • 346 successful resolutions

Getting Started

SPL is implemented as a plugin in optillm, making it easy to integrate with existing workflows:

Installation

```bash
pip install optillm
```

Basic Usage (Inference Mode)

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="http://localhost:8000/v1"  # optillm proxy
)

response = client.chat.completions.create(
    model="spl-gpt-4o",  # the "spl-" prefix enables the plugin
    messages=[
        {"role": "user", "content": "Your challenging problem here"}
    ]
)
```

Learning Mode (Strategy Creation/Refinement)

```python
response = client.chat.completions.create(
    model="spl-gpt-4o",
    messages=[
        {"role": "user", "content": "Your problem here"}
    ],
    extra_body={"spl_learning": True}  # Enable learning mode
)
```

Combining with Other Techniques

```python
# Combine SPL with other optillm techniques
response = client.chat.completions.create(
    model="spl&memory-gpt-4o",  # SPL + memory plugin
    messages=[...]
)
```

Key Benefits

🧠 Cumulative Learning: The LLM improves on specific problem types over time

📖 Transparent Knowledge: Strategies are human-readable and provide insight into reasoning

⚡ Efficiency: Reuses successful approaches rather than solving each problem from scratch

🎯 Adaptability: Different strategies for different problem types

🔍 Inspectable: Learning process and outcomes can be examined and understood

Implementation Details

The complete implementation is available in the optillm repository. Key components include:

  • Strategy Database: JSON-based persistent storage
  • Problem Classifier: Automatic query categorization
  • Strategy Generator: LLM-powered strategy creation
  • Effectiveness Evaluator: Post-completion strategy assessment
  • Strategy Refiner: Continuous improvement of existing strategies
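
The JSON-backed store might look roughly like this; the file layout and helper names are assumptions based on the description above, not the plugin's actual schema:

```python
# Sketch of JSON-based persistent storage for the strategy database.
# The {"strategies": [...]} layout is an illustrative assumption.
import json
from pathlib import Path

def save_strategies(strategies: list, path: str) -> None:
    Path(path).write_text(json.dumps({"strategies": strategies}, indent=2))

def load_strategies(path: str) -> list:
    p = Path(path)
    if not p.exists():
        return []  # a fresh deployment starts with an empty database
    return json.loads(p.read_text())["strategies"]
```

A plain JSON file keeps the learned strategies diffable and easy to inspect, share, or hand-edit, in line with the transparency goals above.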

Future Implications

System Prompt Learning opens exciting possibilities for AI development:

  • Domain-Specific Expertise: Models that develop specialized knowledge in particular fields
  • Collaborative Learning: Sharing strategy databases across different deployments
  • Human-AI Collaboration: Allowing human experts to contribute and refine strategies
  • Multimodal Strategies: Extending the approach beyond text to include visual and other modalities

Try It Today

Ready to give your LLM the ability to learn from experience?

🔗 GitHub Repository: https://github.com/codelion/optillm
📁 SPL Plugin: https://github.com/codelion/optillm/tree/main/optillm/plugins/spl
📄 Documentation: Complete setup and usage guide in the repository

We believe System Prompt Learning represents a fundamental step toward more intelligent, adaptive AI systems. By enabling models to learn from their experiences in a transparent, interpretable way, we're moving closer to AI that truly improves over time.

What strategies will your LLM learn? Try SPL today and find out!


System Prompt Learning is implemented in optillm, an open-source project focused on optimizing LLM inference through state-of-the-art techniques. Join our community and help shape the future of adaptive AI systems.

Tags: #MachineLearning #AI #LLM #ProblemSolving #OpenSource #InferenceOptimization
