# Model Performance Testing Methodology
This document outlines the methodology used to test various LLMs through Ollama on a GPU-poor setup.
## Hardware Specifications
### GPU
- Model: AMD Radeon RX 7600 XT 16GB
- Note: Currently the most affordable (GPU-poorest) graphics card with 16GB of VRAM on the market, making it an excellent choice for budget-conscious AI enthusiasts
### System Specifications
- CPU: AMD Ryzen 7 5700X (16 threads) @ 4.66 GHz
- Motherboard: B550 Pro4
- RAM: 64GB
- OS: Debian 12 Bookworm
- Kernel: Linux 6.8.12-8
- Testing Environment: Ollama with ROCm backend (see the VRAM check sketch below)
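
To confirm that a model actually fits in the card's 16GB of VRAM rather than spilling into system RAM, Ollama's `/api/ps` endpoint reports each loaded model's total size and the portion held in VRAM. The sketch below assumes a local Ollama server on its default port; the helper name is illustrative.

```python
import requests

def print_loaded_models(base: str = "http://localhost:11434") -> None:
    """List loaded models and how much of each resides in VRAM."""
    for m in requests.get(f"{base}/api/ps", timeout=10).json().get("models", []):
        total_gb = m["size"] / 1024**3
        vram_gb = m["size_vram"] / 1024**3
        print(f'{m["name"]}: {total_gb:.1f} GB total, {vram_gb:.1f} GB in VRAM')
```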
## Testing Methodology
Each model is tested with a consistent creative writing prompt designed to evaluate both its performance and its creative capabilities. The testing process, sketched in code after this list, includes:
1. Model Loading: Each model is loaded fresh before testing
2. Initial Warmup: A small test prompt is run to ensure the model is fully loaded
3. Main Test: A comprehensive creative writing prompt is processed
4. Performance Metrics Collection: Various metrics are gathered during generation
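
A minimal sketch of this per-model loop, assuming a local Ollama server on its default port (11434); the model name, warmup prompt, and function names are placeholders rather than the exact ones used by the testing script.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def run_once(model: str, prompt: str, options: dict | None = None) -> dict:
    """Send a single non-streaming generation request and return the raw JSON response."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if options:
        payload["options"] = options
    resp = requests.post(OLLAMA_URL, json=payload, timeout=600)
    resp.raise_for_status()
    return resp.json()

def test_model(model: str, main_prompt: str, options: dict | None = None) -> dict:
    # Steps 1-2: a tiny warmup prompt forces Ollama to load the model into VRAM.
    run_once(model, "Say OK.", options)
    # Step 3: the main creative writing prompt.
    result = run_once(model, main_prompt, options)
    # Step 4: Ollama reports durations in nanoseconds alongside token counts.
    return {
        "total_seconds": result["total_duration"] / 1e9,
        "generated_tokens": result["eval_count"],
        "generation_tokens_per_s": result["eval_count"] / result["eval_duration"] * 1e9,
    }
```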
### Test Prompt
The following creative writing prompt is used to test all models:
```
You are a creative writing assistant. Write a short story about a futuristic city where:
1. The city is powered by a mysterious energy source
2. The inhabitants have developed unique abilities
3. There's a hidden conflict between different factions
4. The protagonist discovers a shocking truth about the city's origins
Make the story engaging and include vivid descriptions of the city's architecture and technology.
```
This prompt was chosen because it:
- Requires creative thinking and complex reasoning
- Generates substantial output (typically 500-1000 tokens)
- Tests both context understanding and generation capabilities
- Produces consistent length outputs for fair comparison
## Metrics Collected
For each model, we collect and analyze the following (a sketch of deriving these values from Ollama's API responses follows the list):
1. Performance Metrics:
   - Tokens per second (overall)
   - Generation tokens per second
   - Total response time
   - Total tokens generated
2. Resource Usage:
   - VRAM usage
   - Model size
   - Parameter count
3. Model Information:
   - Quantization level
   - Model format
   - Model family
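
A sketch of where these values come from, assuming Ollama's default endpoints: the performance numbers are derived from the non-streaming `/api/generate` response (durations are in nanoseconds), while quantization, format, family, and parameter count come from `/api/show`. VRAM usage is read separately (for example from `/api/ps`, as in the earlier sketch); the function names here are illustrative.

```python
import requests

BASE = "http://localhost:11434"

def performance_metrics(gen: dict) -> dict:
    """Derive the performance metrics from a non-streaming /api/generate response."""
    total_tokens = gen.get("prompt_eval_count", 0) + gen["eval_count"]
    return {
        "tokens_per_second_overall": total_tokens / gen["total_duration"] * 1e9,
        "generation_tokens_per_second": gen["eval_count"] / gen["eval_duration"] * 1e9,
        "total_response_time_s": gen["total_duration"] / 1e9,
        "total_tokens_generated": gen["eval_count"],
    }

def model_information(model: str) -> dict:
    """Quantization level, format, family, and parameter count from /api/show."""
    # Note: older Ollama versions expect the request key "name" instead of "model".
    details = requests.post(f"{BASE}/api/show", json={"model": model}, timeout=30).json()["details"]
    return {
        "quantization_level": details["quantization_level"],
        "model_format": details["format"],
        "model_family": details["family"],
        "parameter_count": details["parameter_size"],
    }
```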
## Testing Parameters
All tests are run with consistent generation parameters, passed to Ollama as request options (see the sketch below):
- Temperature: 0.7
- Top P: 0.9
- Top K: 40
- Max Tokens: 1000
- Repetition Penalty: 1.0
- Seed: 42 (for reproducibility)
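
Expressed as an Ollama `options` object, these map onto the parameter names the API accepts; `num_predict` is Ollama's name for the maximum number of tokens to generate. The constant name is illustrative.

```python
# Fixed generation parameters, passed as "options" with every /api/generate request.
GENERATION_OPTIONS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "num_predict": 1000,   # max tokens to generate
    "repeat_penalty": 1.0,
    "seed": 42,            # fixed seed for reproducibility
}

# Used with the earlier test loop sketch, e.g.:
# test_model("<model name>", MAIN_PROMPT, options=GENERATION_OPTIONS)
```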
## Notes
- Tests are run sequentially to ensure no resource contention
- A 3-second cooldown period is maintained between tests
- Models are unloaded after each test to ensure a clean state (sketched below)
- Results are saved in both detailed and summary formats
- The testing script automatically handles model pulling and cleanup
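
A sketch of the per-test cleanup, relying on Ollama's documented behaviour that a generate request with an empty prompt and `keep_alive: 0` unloads the model; the function name and the use of `time.sleep` for the cooldown are illustrative.

```python
import time
import requests

def unload_and_cooldown(model: str, cooldown_s: float = 3.0) -> None:
    """Unload the model from VRAM, then wait out the cooldown before the next test."""
    requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": "", "keep_alive": 0},
        timeout=60,
    )
    time.sleep(cooldown_s)
```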