# Model Performance Testing Methodology
This document outlines the methodology used to test various LLMs through Ollama on a GPU-poor setup.
## Hardware Specifications
### GPU
- Model: AMD Radeon RX 7600 XT 16GB
- Note: Currently the most affordable (GPU-poorest) graphics card with 16GB of VRAM on the market, making it an excellent choice for budget-conscious AI enthusiasts
### System Specifications
- CPU: AMD Ryzen 7 5700X (16 threads) @ 4.66 GHz
- Motherboard: B550 Pro4
- RAM: 64GB
- OS: Debian 12 Bookworm
- Kernel: Linux 6.8.12-8
- Testing Environment: Ollama with ROCm backend (see the VRAM check sketch below)
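
To confirm that a model actually fits in the card's 16GB of VRAM rather than spilling into system RAM, Ollama's `/api/ps` endpoint reports each loaded model's total size and the portion held in VRAM. The sketch below assumes a local Ollama server on its default port; the helper name is illustrative.

```python
import requests

def print_loaded_models(base: str = "http://localhost:11434") -> None:
    """List loaded models and how much of each resides in VRAM."""
    for m in requests.get(f"{base}/api/ps", timeout=10).json().get("models", []):
        total_gb = m["size"] / 1024**3
        vram_gb = m["size_vram"] / 1024**3
        print(f'{m["name"]}: {total_gb:.1f} GB total, {vram_gb:.1f} GB in VRAM')
```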
## Testing Methodology
Each model is tested with a consistent creative writing prompt designed to evaluate both its performance and its creative capabilities. The testing process, sketched in code after this list, includes:
1. Model Loading: Each model is loaded fresh before testing
2. Initial Warmup: A small test prompt is run to ensure the model is fully loaded
3. Main Test: A comprehensive creative writing prompt is processed
4. Performance Metrics Collection: Various metrics are gathered during generation
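
A minimal sketch of this per-model loop, assuming a local Ollama server on its default port (11434); the model name, warmup prompt, and function names are placeholders rather than the exact ones used by the testing script.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def run_once(model: str, prompt: str, options: dict | None = None) -> dict:
    """Send a single non-streaming generation request and return the raw JSON response."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if options:
        payload["options"] = options
    resp = requests.post(OLLAMA_URL, json=payload, timeout=600)
    resp.raise_for_status()
    return resp.json()

def test_model(model: str, main_prompt: str, options: dict | None = None) -> dict:
    # Steps 1-2: a tiny warmup prompt forces Ollama to load the model into VRAM.
    run_once(model, "Say OK.", options)
    # Step 3: the main creative writing prompt.
    result = run_once(model, main_prompt, options)
    # Step 4: Ollama reports durations in nanoseconds alongside token counts.
    return {
        "total_seconds": result["total_duration"] / 1e9,
        "generated_tokens": result["eval_count"],
        "generation_tokens_per_s": result["eval_count"] / result["eval_duration"] * 1e9,
    }
```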
### Test Prompt
The following creative writing prompt is used to test all models:
```
You are a creative writing assistant. Write a short story about a futuristic city where:
1. The city is powered by a mysterious energy source
2. The inhabitants have developed unique abilities
3. There's a hidden conflict between different factions
4. The protagonist discovers a shocking truth about the city's origins
Make the story engaging and include vivid descriptions of the city's architecture and technology.
```
This prompt was chosen because it:
- Requires creative thinking and complex reasoning
- Generates substantial output (typically 500-1000 tokens)
- Tests both context understanding and generation capabilities
- Produces consistent length outputs for fair comparison
## Metrics Collected
For each model, we collect and analyze the following (a sketch of deriving these values from Ollama's API responses follows the list):
1. Performance Metrics:
   - Tokens per second (overall)
   - Generation tokens per second
   - Total response time
   - Total tokens generated
2. Resource Usage:
   - VRAM usage
   - Model size
   - Parameter count
3. Model Information:
   - Quantization level
   - Model format
   - Model family
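
A sketch of where these values come from, assuming Ollama's default endpoints: the performance numbers are derived from the non-streaming `/api/generate` response (durations are in nanoseconds), while quantization, format, family, and parameter count come from `/api/show`. VRAM usage is read separately (for example from `/api/ps`, as in the earlier sketch); the function names here are illustrative.

```python
import requests

BASE = "http://localhost:11434"

def performance_metrics(gen: dict) -> dict:
    """Derive the performance metrics from a non-streaming /api/generate response."""
    total_tokens = gen.get("prompt_eval_count", 0) + gen["eval_count"]
    return {
        "tokens_per_second_overall": total_tokens / gen["total_duration"] * 1e9,
        "generation_tokens_per_second": gen["eval_count"] / gen["eval_duration"] * 1e9,
        "total_response_time_s": gen["total_duration"] / 1e9,
        "total_tokens_generated": gen["eval_count"],
    }

def model_information(model: str) -> dict:
    """Quantization level, format, family, and parameter count from /api/show."""
    # Note: older Ollama versions expect the request key "name" instead of "model".
    details = requests.post(f"{BASE}/api/show", json={"model": model}, timeout=30).json()["details"]
    return {
        "quantization_level": details["quantization_level"],
        "model_format": details["format"],
        "model_family": details["family"],
        "parameter_count": details["parameter_size"],
    }
```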
## Testing Parameters
All tests are run with consistent generation parameters, passed to Ollama as request options (see the sketch below):
- Temperature: 0.7
- Top P: 0.9
- Top K: 40
- Max Tokens: 1000
- Repetition Penalty: 1.0
- Seed: 42 (for reproducibility)
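
Expressed as an Ollama `options` object, these map onto the parameter names the API accepts; `num_predict` is Ollama's name for the maximum number of tokens to generate. The constant name is illustrative.

```python
# Fixed generation parameters, passed as "options" with every /api/generate request.
GENERATION_OPTIONS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "num_predict": 1000,   # max tokens to generate
    "repeat_penalty": 1.0,
    "seed": 42,            # fixed seed for reproducibility
}

# Used with the earlier test loop sketch, e.g.:
# test_model("<model name>", MAIN_PROMPT, options=GENERATION_OPTIONS)
```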
## Notes
- Tests are run sequentially to ensure no resource contention
- A 3-second cooldown period is maintained between tests
- Models are unloaded after each test to ensure a clean state (sketched below)
- Results are saved in both detailed and summary formats
- The testing script automatically handles model pulling and cleanup
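
A sketch of the per-test cleanup, relying on Ollama's documented behaviour that a generate request with an empty prompt and `keep_alive: 0` unloads the model; the function name and the use of `time.sleep` for the cooldown are illustrative.

```python
import time
import requests

def unload_and_cooldown(model: str, cooldown_s: float = 3.0) -> None:
    """Unload the model from VRAM, then wait out the cooldown before the next test."""
    requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": "", "keep_alive": 0},
        timeout=60,
    )
    time.sleep(cooldown_s)
```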