Adding `safetensors` variant of this model
#19 opened 12 months ago by SFconvertbot

Adding Evaluation Results
#18 opened about 1 year ago by leaderboard-pr-bot

Any plans for Mixtral 128k?
#17 opened about 1 year ago by sirus

Transformers fix to mixed precision at long context lengths
1 comment · #16 opened about 1 year ago by nbroad

How much computation power (GPUs and GPU hours) did you need to finetune this?
1 comment · #15 opened over 1 year ago by zohadev

Yarn-StableLM-Epoch?
#14 opened over 1 year ago by KnutJaegersberg

Instruction finetuning and train script, QLoRA, etc.
#13 opened over 1 year ago by aamir1122a

Add widget examples
#11 opened over 1 year ago by mishig

Using this model with vLLM
1 comment · #10 opened over 1 year ago by haltux

Can't deploy an inference endpoint to any provider
2 comments · #9 opened over 1 year ago by ejkkan

Pretraining from scratch?
#8 opened over 1 year ago by MengboZhou

Fine-tuned with all parameters?
1 comment · #6 opened over 1 year ago by MengboZhou

VRAM usage for full 128k tokens
7 comments · #5 opened over 1 year ago by Hypersniper

sliding_window = 131072? Sliding window attention doesn't work for 128?
1 comment · #4 opened over 1 year ago by keyishen

Smaller shards, please
#2 opened over 1 year ago by lskywalker

Instruct Version?
8 comments · #1 opened over 1 year ago by mrfakename
