# Model Performance Testing Methodology

This document outlines the methodology used to test various LLMs through Ollama on a "GPU poor" setup.

## Hardware Specifications

- GPU: AMD Radeon RX 7600 XT (16 GB)
- Testing environment: Ollama with the ROCm backend

## Testing Methodology

Each model is tested with the same creative writing prompt, designed to evaluate both the model's performance and its creative capabilities. The testing process, sketched in code below, includes:

1. Model loading: each model is loaded fresh before testing
2. Initial warmup: a small test prompt is run to ensure the model is properly loaded
3. Main test: the full creative writing prompt is processed
4. Performance metrics collection: various metrics are gathered during generation
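
A minimal sketch of this loop, assuming Ollama's HTTP API at its default address (`localhost:11434`); the helper names here are illustrative rather than taken from the actual testing script:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def generate(model: str, prompt: str, options: dict | None = None) -> dict:
    """Send a non-streaming generate request and return Ollama's response JSON."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if options:
        payload["options"] = options
    resp = requests.post(OLLAMA_URL, json=payload, timeout=600)
    resp.raise_for_status()
    return resp.json()

def test_model(model: str, main_prompt: str, options: dict) -> dict:
    # Steps 1-2: loading and warmup; a tiny prompt forces a fresh load.
    generate(model, "Hello", options)
    # Step 3: run the full creative writing prompt.
    # Step 4: the returned JSON carries Ollama's timing and token-count
    # fields, from which the metrics below are derived.
    return generate(model, main_prompt, options)
```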

### Test Prompt

The following creative writing prompt is used to test all models:

```
You are a creative writing assistant. Write a short story about a futuristic city where:
1. The city is powered by a mysterious energy source
2. The inhabitants have developed unique abilities
3. There's a hidden conflict between different factions
4. The protagonist discovers a shocking truth about the city's origins

Make the story engaging and include vivid descriptions of the city's architecture and technology.
```

This prompt was chosen because it:

- Requires creative thinking and complex reasoning
- Generates substantial output (typically 500-1000 tokens)
- Tests both context understanding and generation capabilities
- Produces outputs of consistent length, allowing fair comparison across models

## Metrics Collected

For each model, we collect and analyze the following (see the sketch after this list for how the throughput figures can be derived):

1. Performance metrics:
   - Tokens per second (overall)
   - Generation tokens per second
   - Total response time
   - Total tokens generated
2. Resource usage:
   - VRAM usage
   - Model size
   - Parameter count
3. Model information:
   - Quantization level
   - Model format
   - Model family
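
Ollama's non-streaming responses report token counts and durations (in nanoseconds), so the throughput numbers above can be computed directly from the response JSON. A sketch, using the field names from Ollama's generate endpoint; treating "overall" tokens per second as prompt plus generated tokens over total time is an assumption about the script, not something the script itself documents:

```python
def compute_metrics(result: dict) -> dict:
    """Derive throughput metrics from an Ollama generate response.

    Ollama reports all durations in nanoseconds.
    """
    gen_tokens = result["eval_count"]                   # tokens generated
    prompt_tokens = result.get("prompt_eval_count", 0)  # absent if prompt was cached
    total_s = result["total_duration"] / 1e9
    gen_s = result["eval_duration"] / 1e9
    return {
        "total_tokens": gen_tokens,
        "total_response_time_s": round(total_s, 2),
        "generation_tokens_per_s": round(gen_tokens / gen_s, 2),
        "overall_tokens_per_s": round((prompt_tokens + gen_tokens) / total_s, 2),
    }
```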

## Testing Parameters

All tests are run with consistent generation parameters:

- Temperature: 0.7
- Top P: 0.9
- Top K: 40
- Max tokens: 1000
- Repetition penalty: 1.0
- Seed: 42 (for reproducibility)
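
In Ollama's API these map onto the `options` object of a generate request; the option names below are Ollama's own, while bundling them into a module-level constant is just one way to keep them consistent across tests:

```python
# Generation parameters from the list above, expressed as Ollama "options".
GENERATION_OPTIONS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "num_predict": 1000,   # Ollama's name for the max-tokens limit
    "repeat_penalty": 1.0,
    "seed": 42,            # fixed seed for reproducibility
}
```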

## Notes

- Tests are run sequentially to avoid resource contention
- A 3-second cooldown period is maintained between tests
- Models are unloaded after each test to ensure a clean state
- Results are saved in both detailed and summary formats
- The testing script automatically handles model pulling and cleanup
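
The pulling, unloading, and cooldown behavior described above could look roughly like this; `ollama pull` and the `keep_alive: 0` unload mechanism are standard Ollama features, while the helper functions themselves are illustrative:

```python
import subprocess
import time

import requests

def pull_model(model: str) -> None:
    # Fetch the model via the Ollama CLI before testing.
    subprocess.run(["ollama", "pull", model], check=True)

def unload_model(model: str) -> None:
    # A generate request with keep_alive=0 and no prompt tells Ollama to
    # unload the model immediately, leaving a clean state for the next test.
    requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "keep_alive": 0},
        timeout=60,
    )

def cooldown(seconds: float = 3.0) -> None:
    # The 3-second pause between tests noted above.
    time.sleep(seconds)
```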