k-mktr committed
Commit 75a996e · verified · 1 Parent(s): 5e4594f

Upload README_model_stats.md

Files changed (1): README_model_stats.md (added, +74 lines)
# Model Performance Testing Methodology

This document outlines the methodology used to test various LLMs through Ollama on a "GPU Poor" setup.

## Hardware Specifications

- GPU: AMD Radeon RX 7600 XT (16 GB VRAM)
- Testing Environment: Ollama with the ROCm backend
## Testing Methodology

Each model is tested with the same creative writing prompt, designed to evaluate both raw performance and creative capability. The testing process includes the following steps (a sketch of the procedure appears after this list):

1. Model Loading: Each model is loaded fresh before testing
2. Initial Warmup: A short test prompt is run to ensure the model is fully loaded
3. Main Test: The full creative writing prompt is processed
4. Performance Metrics Collection: Metrics are gathered during generation
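The test script itself is not included in this README; as a rough illustration, steps 1-3 might look like the following against Ollama's HTTP generation endpoint (`/api/generate`). The helper names `run_generation` and `test_model` are illustrative, not part of the actual script.

```python
# Minimal sketch of the per-model procedure (assumes a local Ollama
# server at the default http://localhost:11434 endpoint).
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def run_generation(model: str, prompt: str, options: dict | None = None) -> dict:
    """Send a non-streaming generation request and return the JSON response."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if options:
        payload["options"] = options
    resp = requests.post(OLLAMA_URL, json=payload, timeout=600)
    resp.raise_for_status()
    return resp.json()

def test_model(model: str, main_prompt: str) -> dict:
    # Step 2: warm up with a trivial prompt so that load time does not
    # contaminate the main measurement (loading happens on first request).
    run_generation(model, "Hello")
    # Step 3: run the main creative writing test.
    return run_generation(model, main_prompt)
```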
### Test Prompt

The following creative writing prompt is used to test all models:

```
You are a creative writing assistant. Write a short story about a futuristic city where:
1. The city is powered by a mysterious energy source
2. The inhabitants have developed unique abilities
3. There's a hidden conflict between different factions
4. The protagonist discovers a shocking truth about the city's origins

Make the story engaging and include vivid descriptions of the city's architecture and technology.
```

This prompt was chosen because it:
- Requires creative thinking and complex reasoning
- Generates substantial output (typically 500-1000 tokens)
- Tests both context understanding and generation capability
- Produces outputs of consistent length, allowing fair comparison
## Metrics Collected

For each model, we collect and analyze the following (see the extraction sketch after this list):

1. Performance Metrics:
   - Tokens per second (overall)
   - Generation tokens per second
   - Total response time
   - Total tokens generated

2. Resource Usage:
   - VRAM usage
   - Model size
   - Parameter count

3. Model Information:
   - Quantization level
   - Model format
   - Model family
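The performance metrics can be read directly from the response of a non-streaming `/api/generate` call, which reports token counts and nanosecond durations (`eval_count`, `eval_duration`, `prompt_eval_count`, `total_duration`). A sketch of the derivation:

```python
# Minimal sketch of extracting the performance metrics from an Ollama
# /api/generate response (durations are reported in nanoseconds).
def extract_metrics(resp: dict) -> dict:
    ns = 1e9  # nanoseconds per second
    gen_tokens = resp.get("eval_count", 0)
    prompt_tokens = resp.get("prompt_eval_count", 0)
    total_s = resp.get("total_duration", 0) / ns
    gen_s = resp.get("eval_duration", 0) / ns
    return {
        "total_tokens": gen_tokens,
        "total_response_time_s": round(total_s, 2),
        # Overall rate: prompt processing + generation over total wall time.
        "overall_tokens_per_s": round((prompt_tokens + gen_tokens) / total_s, 2) if total_s else 0.0,
        # Generation rate: decoded tokens over decode time only.
        "generation_tokens_per_s": round(gen_tokens / gen_s, 2) if gen_s else 0.0,
    }
```

Model size, family, format, parameter count, and quantization level can be read from Ollama's `/api/show` endpoint, and VRAM usage can be sampled from `/api/ps` while the model is loaded.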
## Testing Parameters

All tests are run with the same generation parameters (the mapping to Ollama's option names is sketched below):

- Temperature: 0.7
- Top P: 0.9
- Top K: 40
- Max Tokens: 1000
- Repetition Penalty: 1.0
- Seed: 42 (for reproducibility)
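In Ollama, these settings are passed per-request via the `options` field; the keys below are Ollama's option names (`num_predict` is Ollama's name for the max-token limit):

```python
# The parameters above, expressed as an Ollama "options" payload.
GENERATION_OPTIONS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "num_predict": 1000,    # maximum tokens to generate
    "repeat_penalty": 1.0,  # 1.0 leaves repetition unpenalized
    "seed": 42,             # fixed seed for reproducibility
}
```

With the earlier sketch, a test run becomes `run_generation(model, prompt, options=GENERATION_OPTIONS)`.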
## Notes

- Tests are run sequentially to avoid resource contention
- A 3-second cooldown period is maintained between tests
- Models are unloaded after each test to ensure a clean state
- Results are saved in both detailed and summary formats
- The testing script automatically handles model pulling and cleanup (sketched below)
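Putting the notes together, the driver loop might look like the following. It reuses the `run_generation`, `test_model`, and `extract_metrics` helpers sketched above; the model list, prompt variable, and output path are placeholders, and `keep_alive: 0` is Ollama's mechanism for evicting a model immediately after a request.

```python
# Sketch of the sequential driver: pull, test, unload, cool down.
import json
import time
import requests

MODELS = ["llama3.2:3b", "qwen2.5:7b"]  # placeholder model tags

def unload(model: str) -> None:
    # A request with keep_alive=0 asks Ollama to evict the model
    # immediately, so the next test starts from a clean state.
    requests.post(OLLAMA_URL, json={"model": model, "keep_alive": 0})

results = {}
for model in MODELS:
    # Pull the model first (a no-op if it is already present locally).
    requests.post("http://localhost:11434/api/pull",
                  json={"model": model, "stream": False}, timeout=None)
    resp = test_model(model, TEST_PROMPT)  # TEST_PROMPT: the prompt above
    results[model] = extract_metrics(resp)
    unload(model)
    time.sleep(3)  # 3-second cooldown between tests

# Write the summary; the detailed per-run responses would be saved similarly.
with open("model_stats_summary.json", "w") as f:  # illustrative path
    json.dump(results, f, indent=2)
```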