# Model Performance Testing Methodology

This document outlines the methodology used for testing various LLM models through Ollama on a GPU Poor setup.

## Hardware Specifications

- GPU: AMD Radeon RX 7600 XT 16GB
- Testing Environment: Ollama with ROCm backend

## Testing Methodology

Each model is tested with a consistent creative writing prompt designed to evaluate both its performance and its creative capabilities. The testing process includes the following steps (a minimal harness sketch is given at the end of this document):

1. Model Loading: each model is loaded fresh before testing
2. Initial Warmup: a small test prompt is run to ensure the model is properly loaded
3. Main Test: the comprehensive creative writing prompt is processed
4. Performance Metrics Collection: metrics are gathered during generation

### Test Prompt

The following creative writing prompt is used to test all models:

```
You are a creative writing assistant. Write a short story about a futuristic city where:
1. The city is powered by a mysterious energy source
2. The inhabitants have developed unique abilities
3. There's a hidden conflict between different factions
4. The protagonist discovers a shocking truth about the city's origins

Make the story engaging and include vivid descriptions of the city's architecture and technology.
```

This prompt was chosen because it:

- Requires creative thinking and complex reasoning
- Generates substantial output (typically 500-1000 tokens)
- Tests both context understanding and generation capabilities
- Produces outputs of comparable length, allowing fair comparison across models

## Metrics Collected

For each model, we collect and analyze:

1. Performance Metrics:
   - Tokens per second (overall)
   - Generation tokens per second
   - Total response time
   - Total tokens generated
2. Resource Usage:
   - VRAM usage
   - Model size
   - Parameter count
3. Model Information:
   - Quantization level
   - Model format
   - Model family

## Testing Parameters

All tests are run with consistent generation parameters:

- Temperature: 0.7
- Top P: 0.9
- Top K: 40
- Max Tokens: 1000
- Repetition Penalty: 1.0
- Seed: 42 (for reproducibility)

## Notes

- Tests are run sequentially to avoid resource contention
- A 3-second cooldown period is maintained between tests
- Models are unloaded after each test to ensure a clean state
- Results are saved in both detailed and summary formats
- The testing script automatically handles model pulling and cleanup
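
## Example Harness (Sketch)

The following is a minimal sketch of the per-model test loop described above, not the actual testing script. It assumes the default Ollama REST endpoint at `http://localhost:11434`, the `requests` package, and placeholder model names; the helper names (`generate`, `test_model`) are illustrative.

```python
"""Sketch of the per-model test loop: pull, warmup, main test, cooldown."""
import subprocess
import time

import requests

OLLAMA_URL = "http://localhost:11434"          # assumed default Ollama endpoint
MODELS = ["llama3.2:3b", "qwen2.5:7b"]         # hypothetical model list

TEST_PROMPT = """You are a creative writing assistant. Write a short story about a futuristic city where:
1. The city is powered by a mysterious energy source
2. The inhabitants have developed unique abilities
3. There's a hidden conflict between different factions
4. The protagonist discovers a shocking truth about the city's origins

Make the story engaging and include vivid descriptions of the city's architecture and technology."""

# Consistent generation parameters from the "Testing Parameters" section.
OPTIONS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "num_predict": 1000,      # max tokens
    "repeat_penalty": 1.0,
    "seed": 42,
}


def generate(model: str, prompt: str, keep_alive=None) -> dict:
    """Send a non-streaming /api/generate request and return the JSON reply."""
    payload = {"model": model, "prompt": prompt, "stream": False, "options": OPTIONS}
    if keep_alive is not None:
        payload["keep_alive"] = keep_alive
    resp = requests.post(f"{OLLAMA_URL}/api/generate", json=payload, timeout=600)
    resp.raise_for_status()
    return resp.json()


def test_model(model: str) -> dict:
    # 1. Model loading: pull the model so it starts from a clean, known state.
    subprocess.run(["ollama", "pull", model], check=True)

    # 2. Warmup: a tiny prompt to make sure the model is resident in VRAM.
    generate(model, "Say hello in one word.")

    # 3. Main test: run the creative writing prompt.
    #    keep_alive=0 asks Ollama to unload the model once the response is done.
    return generate(model, TEST_PROMPT, keep_alive=0)


if __name__ == "__main__":
    for model in MODELS:
        result = test_model(model)
        print(model, result.get("eval_count"), "tokens generated")
        time.sleep(3)  # 3-second cooldown between tests, per the Notes section
```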
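Ollama's non-streaming `/api/generate` response includes timing fields (`total_duration`, `eval_duration`, `eval_count`, durations in nanoseconds) that are sufficient to derive the throughput numbers listed under Performance Metrics. The snippet below shows one plausible derivation; the exact definitions used in the published results may differ.

```python
def summarize(result: dict) -> dict:
    """Derive throughput metrics from Ollama's timing fields (nanoseconds)."""
    total_s = result["total_duration"] / 1e9   # whole request, including prompt eval
    eval_s = result["eval_duration"] / 1e9     # generation phase only
    generated = result["eval_count"]           # tokens produced
    return {
        "total_response_time_s": round(total_s, 2),
        "total_tokens_generated": generated,
        "generation_tokens_per_s": round(generated / eval_s, 2),
        "overall_tokens_per_s": round(generated / total_s, 2),
    }
```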
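Static model information (quantization level, format, family, parameter count) can be read from Ollama's `/api/show` endpoint. The sketch below assumes that endpoint's `details` fields supply the Model Information metrics; it is an assumption about how such data could be gathered, not a description of the actual script.

```python
def model_info(model: str) -> dict:
    """Query /api/show for static model metadata."""
    resp = requests.post(f"{OLLAMA_URL}/api/show", json={"model": model}, timeout=60)
    resp.raise_for_status()
    details = resp.json().get("details", {})
    return {
        "quantization_level": details.get("quantization_level"),
        "model_format": details.get("format"),
        "model_family": details.get("family"),
        "parameter_count": details.get("parameter_size"),  # human-readable, e.g. "7.6B"
    }
```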