# Model Performance Testing Methodology

This document outlines the methodology used for testing various LLM models through Ollama on a GPU Poor setup.

## Hardware Specifications

- GPU: AMD Radeon RX 7600 XT 16GB
- Testing Environment: Ollama with ROCm backend

## Testing Methodology

Each model is tested with a consistent creative writing prompt designed to evaluate both its performance and its creative capabilities. The testing process includes the following steps (a minimal harness sketch is given at the end of this document):

1. Model Loading: each model is loaded fresh before testing
2. Initial Warmup: a small test prompt is run to ensure the model is properly loaded
3. Main Test: the comprehensive creative writing prompt is processed
4. Performance Metrics Collection: metrics are gathered during generation

### Test Prompt

The following creative writing prompt is used to test all models:

```
You are a creative writing assistant. Write a short story about a futuristic city where:
1. The city is powered by a mysterious energy source
2. The inhabitants have developed unique abilities
3. There's a hidden conflict between different factions
4. The protagonist discovers a shocking truth about the city's origins

Make the story engaging and include vivid descriptions of the city's architecture and technology.
```

This prompt was chosen because it:

- Requires creative thinking and complex reasoning
- Generates substantial output (typically 500-1000 tokens)
- Tests both context understanding and generation capabilities
- Produces outputs of comparable length, allowing fair comparison across models

## Metrics Collected

For each model, we collect and analyze:

1. Performance Metrics:
   - Tokens per second (overall)
   - Generation tokens per second
   - Total response time
   - Total tokens generated
2. Resource Usage:
   - VRAM usage
   - Model size
   - Parameter count
3. Model Information:
   - Quantization level
   - Model format
   - Model family

## Testing Parameters

All tests are run with consistent generation parameters:

- Temperature: 0.7
- Top P: 0.9
- Top K: 40
- Max Tokens: 1000
- Repetition Penalty: 1.0
- Seed: 42 (for reproducibility)

## Notes

- Tests are run sequentially to avoid resource contention
- A 3-second cooldown period is maintained between tests
- Models are unloaded after each test to ensure a clean state
- Results are saved in both detailed and summary formats
- The testing script automatically handles model pulling and cleanup
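
## Example Harness (Sketch)

The following is a minimal sketch of the per-model test loop described above, not the actual testing script. It assumes the default Ollama REST endpoint at `http://localhost:11434`, the `requests` package, and placeholder model names; the helper names (`generate`, `test_model`) are illustrative.

```python
"""Sketch of the per-model test loop: pull, warmup, main test, cooldown."""
import subprocess
import time

import requests

OLLAMA_URL = "http://localhost:11434"          # assumed default Ollama endpoint
MODELS = ["llama3.2:3b", "qwen2.5:7b"]         # hypothetical model list

TEST_PROMPT = """You are a creative writing assistant. Write a short story about a futuristic city where:
1. The city is powered by a mysterious energy source
2. The inhabitants have developed unique abilities
3. There's a hidden conflict between different factions
4. The protagonist discovers a shocking truth about the city's origins

Make the story engaging and include vivid descriptions of the city's architecture and technology."""

# Consistent generation parameters from the "Testing Parameters" section.
OPTIONS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "num_predict": 1000,      # max tokens
    "repeat_penalty": 1.0,
    "seed": 42,
}


def generate(model: str, prompt: str, keep_alive=None) -> dict:
    """Send a non-streaming /api/generate request and return the JSON reply."""
    payload = {"model": model, "prompt": prompt, "stream": False, "options": OPTIONS}
    if keep_alive is not None:
        payload["keep_alive"] = keep_alive
    resp = requests.post(f"{OLLAMA_URL}/api/generate", json=payload, timeout=600)
    resp.raise_for_status()
    return resp.json()


def test_model(model: str) -> dict:
    # 1. Model loading: pull the model so it starts from a clean, known state.
    subprocess.run(["ollama", "pull", model], check=True)

    # 2. Warmup: a tiny prompt to make sure the model is resident in VRAM.
    generate(model, "Say hello in one word.")

    # 3. Main test: run the creative writing prompt.
    #    keep_alive=0 asks Ollama to unload the model once the response is done.
    return generate(model, TEST_PROMPT, keep_alive=0)


if __name__ == "__main__":
    for model in MODELS:
        result = test_model(model)
        print(model, result.get("eval_count"), "tokens generated")
        time.sleep(3)  # 3-second cooldown between tests, per the Notes section
```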
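Ollama's non-streaming `/api/generate` response includes timing fields (`total_duration`, `eval_duration`, `eval_count`, durations in nanoseconds) that are sufficient to derive the throughput numbers listed under Performance Metrics. The snippet below shows one plausible derivation; the exact definitions used in the published results may differ.

```python
def summarize(result: dict) -> dict:
    """Derive throughput metrics from Ollama's timing fields (nanoseconds)."""
    total_s = result["total_duration"] / 1e9   # whole request, including prompt eval
    eval_s = result["eval_duration"] / 1e9     # generation phase only
    generated = result["eval_count"]           # tokens produced
    return {
        "total_response_time_s": round(total_s, 2),
        "total_tokens_generated": generated,
        "generation_tokens_per_s": round(generated / eval_s, 2),
        "overall_tokens_per_s": round(generated / total_s, 2),
    }
```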
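Static model information (quantization level, format, family, parameter count) can be read from Ollama's `/api/show` endpoint. The sketch below assumes that endpoint's `details` fields supply the Model Information metrics; it is an assumption about how such data could be gathered, not a description of the actual script.

```python
def model_info(model: str) -> dict:
    """Query /api/show for static model metadata."""
    resp = requests.post(f"{OLLAMA_URL}/api/show", json={"model": model}, timeout=60)
    resp.raise_for_status()
    details = resp.json().get("details", {})
    return {
        "quantization_level": details.get("quantization_level"),
        "model_format": details.get("format"),
        "model_family": details.get("family"),
        "parameter_count": details.get("parameter_size"),  # human-readable, e.g. "7.6B"
    }
```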