
Models for an RTX 3060 12GB GPU - General Use
Only models that run entirely on the GPU are listed here.
mradermacher/Qwen3-8B-192k-Context-6X-Josiefied-Uncensored-i1-GGUF
8B • Updated • 1.13k • 3
Note:
- Context length: 20 000 tokens (maximum supported: 196 608)
- GPU offload: fully enabled (36/36 layers)
- CPU thread pool size: 6 threads
- Evaluation batch size: 32
- RoPE frequency base and scale: disabled (auto mode)
- KV cache offloaded to GPU memory: enabled
- Model kept resident in system RAM: enabled
- Memory-mapped I/O (mmap): enabled
- Random seed: not fixed (random)
- Flash Attention: enabled
- K-cache quantization: disabled
- V-cache quantization: disabled
- Quantization: Q6_K, ~30.49 tok/sec
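Since the collection only admits models that fit entirely in 12 GB of VRAM, a back-of-envelope fit check can be sketched. This is a rough estimate, not a measurement: the 36 layers come from the note above, while 8 KV heads, a head dimension of 128, and the ~6.6 GB Q6_K file size are assumed typical values for Qwen3-8B, not figures from this list.

```python
# Rough check that a GGUF model plus its f16 KV cache fits in 12 GB of VRAM.
# Architecture numbers are assumptions (36 layers per the note above; 8 KV
# heads and head_dim 128 are typical GQA values for an 8B model); the Q6_K
# file size of ~6.6 GB is approximate.

def kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """K and V tensors (hence the factor 2) for every layer at full context."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

def fits_in_vram(model_file_bytes: int, kv_bytes: int,
                 vram_bytes: int = 12 * 10**9,
                 overhead_bytes: int = 1 * 10**9) -> bool:
    """Leave ~1 GB headroom for compute buffers and the desktop."""
    return model_file_bytes + kv_bytes + overhead_bytes <= vram_bytes

kv = kv_cache_bytes(n_layers=36, n_ctx=20_000, n_kv_heads=8, head_dim=128)
print(f"KV cache: {kv / 1e9:.2f} GB")  # ~2.95 GB at 20k context
print(fits_in_vram(model_file_bytes=6_600_000_000, kv_bytes=kv))  # True
```

Under these assumptions, ~6.6 GB of weights plus ~2.95 GB of KV cache leaves comfortable headroom on a 12 GB card, which is consistent with the 20 000-token context chosen in the note rather than the full 196 608.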
reedmayhew/Grok-3-reasoning-gemma3-12B-distilled-HF
Text Generation • Updated • 13 • 5
Note:
- Context length: 12 000 tokens (supports up to 131 072)
- GPU offload: fully enabled (48/48 layers)
- CPU thread pool size: 4 threads
- Evaluation batch size: 32
- RoPE frequency base and scale: disabled
- KV cache offloaded to GPU memory: enabled
- Model kept resident in system RAM: enabled
- Memory-mapped I/O (mmap): disabled
- Random seed: not fixed
- Flash Attention (experimental): enabled
- K-cache and V-cache quantization (experimental): disabled
- Quantization: Q4_K_M, ~20.61 tok/sec
Menlo/Jan-nano-gguf
Text Generation • 4B • Updated • 8.38k • 140
mradermacher/SOLAR-10.7B-Instruct-v1.0-uncensored-GGUF
11B • Updated • 128 • 3
Note:
- Context length: 4 096 tokens (model supports up to 4 996)
- GPU offload: fully enabled (48/48 layers)
- CPU thread pool size: 6 threads
- Evaluation batch size: 512
- RoPE frequency base and scale: disabled (auto)
- KV cache offloaded to GPU memory: enabled
- Model kept resident in system RAM: enabled
- Memory-mapped I/O (mmap): enabled
- Random seed: not fixed
- Flash Attention (experimental): enabled
- K-cache quantization: disabled
- V-cache quantization: disabled
- Quantization: Q6_K, ~30 tok/sec
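The settings notes in this list read like local-runner load options, and they map fairly directly onto llama-cpp-python's `Llama` constructor. A minimal sketch of the SOLAR entry's settings, assuming llama-cpp-python (`n_ctx`, `n_gpu_layers`, `offload_kqv`, `flash_attn`, etc. are its real parameter names); the model path is hypothetical:

```python
# Sketch: the SOLAR-10.7B settings above expressed as llama-cpp-python loader
# arguments. Parameter names follow llama_cpp.Llama; the .gguf path is a
# hypothetical local file, and seed=-1 requests a random (unfixed) seed.
llama_kwargs = {
    "model_path": "solar-10.7b-instruct.Q6_K.gguf",  # hypothetical path
    "n_ctx": 4096,           # context length set to 4 096 tokens
    "n_gpu_layers": -1,      # full GPU offload (all 48 layers)
    "n_threads": 6,          # CPU thread pool size
    "n_batch": 512,          # evaluation batch size
    "rope_freq_base": 0.0,   # 0.0 = auto (no manual override)
    "rope_freq_scale": 0.0,  # 0.0 = auto (no manual override)
    "offload_kqv": True,     # KV cache in GPU memory
    "use_mlock": True,       # keep model resident in system RAM
    "use_mmap": True,        # memory-mapped I/O enabled
    "seed": -1,              # random seed, not fixed
    "flash_attn": True,      # Flash Attention enabled
}

# Actually loading requires llama-cpp-python and the GGUF file on disk:
# from llama_cpp import Llama
# llm = Llama(**llama_kwargs)
```

K/V-cache quantization (disabled in every note here) has no stable keyword in this constructor sketch, so it is simply left at its default.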
Mungert/Mistral-7B-Instruct-v0.3-GGUF
7B • Updated • 637 • 6
janhq/Jan-v1-4B-GGUF
Text Generation • 4B • Updated • 81.9k • 119