# Model Performance Testing Methodology

This document outlines the methodology used to test various LLMs through Ollama on a "GPU poor" setup.

## Hardware Specifications

- GPU: AMD Radeon RX 7600 XT (16 GB)
- Testing environment: Ollama with the ROCm backend

## Testing Methodology

Each model is tested with the same creative writing prompt, designed to evaluate both the model's performance and its creative capabilities. The testing process, sketched in code below, includes:

1. Model loading: each model is loaded fresh before testing
2. Initial warmup: a small test prompt is run to ensure the model is properly loaded
3. Main test: the full creative writing prompt is processed
4. Performance metrics collection: various metrics are gathered during generation
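
A minimal sketch of this loop, assuming Ollama's HTTP API at its default address (`localhost:11434`); the helper names here are illustrative rather than taken from the actual testing script:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def generate(model: str, prompt: str, options: dict | None = None) -> dict:
    """Send a non-streaming generate request and return Ollama's response JSON."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if options:
        payload["options"] = options
    resp = requests.post(OLLAMA_URL, json=payload, timeout=600)
    resp.raise_for_status()
    return resp.json()

def test_model(model: str, main_prompt: str, options: dict) -> dict:
    # Steps 1-2: loading and warmup; a tiny prompt forces a fresh load.
    generate(model, "Hello", options)
    # Step 3: run the full creative writing prompt.
    # Step 4: the returned JSON carries Ollama's timing and token-count
    # fields, from which the metrics below are derived.
    return generate(model, main_prompt, options)
```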

### Test Prompt

The following creative writing prompt is used to test all models:

```
You are a creative writing assistant. Write a short story about a futuristic city where:
1. The city is powered by a mysterious energy source
2. The inhabitants have developed unique abilities
3. There's a hidden conflict between different factions
4. The protagonist discovers a shocking truth about the city's origins

Make the story engaging and include vivid descriptions of the city's architecture and technology.
```

This prompt was chosen because it:

- Requires creative thinking and complex reasoning
- Generates substantial output (typically 500-1000 tokens)
- Tests both context understanding and generation capabilities
- Produces outputs of consistent length, allowing fair comparison across models

## Metrics Collected

For each model, we collect and analyze the following (see the sketch after this list for how the throughput figures can be derived):

1. Performance metrics:
   - Tokens per second (overall)
   - Generation tokens per second
   - Total response time
   - Total tokens generated
2. Resource usage:
   - VRAM usage
   - Model size
   - Parameter count
3. Model information:
   - Quantization level
   - Model format
   - Model family
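
Ollama's non-streaming responses report token counts and durations (in nanoseconds), so the throughput numbers above can be computed directly from the response JSON. A sketch, using the field names from Ollama's generate endpoint; treating "overall" tokens per second as prompt plus generated tokens over total time is an assumption about the script, not something the script itself documents:

```python
def compute_metrics(result: dict) -> dict:
    """Derive throughput metrics from an Ollama generate response.

    Ollama reports all durations in nanoseconds.
    """
    gen_tokens = result["eval_count"]                   # tokens generated
    prompt_tokens = result.get("prompt_eval_count", 0)  # absent if prompt was cached
    total_s = result["total_duration"] / 1e9
    gen_s = result["eval_duration"] / 1e9
    return {
        "total_tokens": gen_tokens,
        "total_response_time_s": round(total_s, 2),
        "generation_tokens_per_s": round(gen_tokens / gen_s, 2),
        "overall_tokens_per_s": round((prompt_tokens + gen_tokens) / total_s, 2),
    }
```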

## Testing Parameters

All tests are run with consistent generation parameters:

- Temperature: 0.7
- Top P: 0.9
- Top K: 40
- Max tokens: 1000
- Repetition penalty: 1.0
- Seed: 42 (for reproducibility)
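
In Ollama's API these map onto the `options` object of a generate request; the option names below are Ollama's own, while bundling them into a module-level constant is just one way to keep them consistent across tests:

```python
# Generation parameters from the list above, expressed as Ollama "options".
GENERATION_OPTIONS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "num_predict": 1000,   # Ollama's name for the max-tokens limit
    "repeat_penalty": 1.0,
    "seed": 42,            # fixed seed for reproducibility
}
```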

## Notes

- Tests are run sequentially to avoid resource contention
- A 3-second cooldown period is maintained between tests
- Models are unloaded after each test to ensure a clean state
- Results are saved in both detailed and summary formats
- The testing script automatically handles model pulling and cleanup
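
The pulling, unloading, and cooldown behavior described above could look roughly like this; `ollama pull` and the `keep_alive: 0` unload mechanism are standard Ollama features, while the helper functions themselves are illustrative:

```python
import subprocess
import time

import requests

def pull_model(model: str) -> None:
    # Fetch the model via the Ollama CLI before testing.
    subprocess.run(["ollama", "pull", model], check=True)

def unload_model(model: str) -> None:
    # A generate request with keep_alive=0 and no prompt tells Ollama to
    # unload the model immediately, leaving a clean state for the next test.
    requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "keep_alive": 0},
        timeout=60,
    )

def cooldown(seconds: float = 3.0) -> None:
    # The 3-second pause between tests noted above.
    time.sleep(seconds)
```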