
flan-t5-xxl-gguf

This is a quantized (GGUF) version of google/flan-t5-xxl, for use with llama.cpp.

Original model (Google): google/flan-t5-xxl, a T5 encoder-decoder architecture.

Usage/Examples

```shell
./llama-cli -m /path/to/file.gguf --prompt "your prompt" --n-gpu-layers nn
```

where `nn` is the number of layers to offload to the GPU.
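For example (the quant filename and layer count below are placeholders, not values from this card; `llama-cli` comes from a local llama.cpp build with T5 support):

```shell
# Placeholder filename and layer count -- pick the quant you actually downloaded.
./llama-cli -m flan-t5-xxl-Q4_K_M.gguf \
  --prompt "Translate English to German: How old are you?" \
  --n-gpu-layers 20
```

Setting `--n-gpu-layers 0` (or omitting the flag) runs entirely on the CPU.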

Quants

| Bits | Types |
|------|-------|
| 2 | Q2_K |
| 3 | Q3_K, Q3_K_L, Q3_K_M, Q3_K_S |
| 4 | Q4_0, Q4_1, Q4_K, Q4_K_M, Q4_K_S |
| 5 | Q5_0, Q5_1, Q5_K, Q5_K_M, Q5_K_S |
| 6 | Q6_K |
| 8 | Q8_0 |

Additional:

| Bits | Float type |
|------|------------|
| 16 | f16 |
| 32 | f32 |
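As a rough rule of thumb, on-disk size is parameter count × bits per weight. A small sketch (the bits-per-weight figures are approximations that include block-scale overhead; only the 11.1B parameter count comes from this card):

```python
# Rough GGUF file-size estimate: params x bits-per-weight / 8 bytes.
# Bits-per-weight values below are approximations, not values from this card.
PARAMS = 11.1e9  # 11.1B params, from the model card

BPW = {
    "Q4_0": 4.5,   # 4-bit weights plus one scale per 32-weight block
    "Q5_0": 5.5,
    "Q8_0": 8.5,
    "f16": 16.0,
    "f32": 32.0,
}

def est_size_gb(quant: str) -> float:
    """Approximate file size in GB (1 GB = 1e9 bytes)."""
    return PARAMS * BPW[quant] / 8 / 1e9

for q in BPW:
    print(f"{q}: ~{est_size_gb(q):.1f} GB")  # e.g. f16: ~22.2 GB
```

This is only a back-of-the-envelope guide for picking a quant that fits your RAM/VRAM; actual file sizes vary slightly with metadata and per-tensor type choices.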

Disclaimer

I do not claim any rights to this model; all rights belong to Google.

Acknowledgements

Model size: 11.1B params
Architecture: t5

