Mistral-7B with continued pretraining using Quiet-STaR (https://arxiv.org/abs/2403.09629), trained to generate 8 thought tokens before each output token.

Downloads last month: 99
Model size: 7.29B params (Safetensors)
Tensor type: BF16

Model tree for ezelikman/quietstar-8-ahead

- Merges: 3 models
- Quantizations: 1 model

Dataset used to train ezelikman/quietstar-8-ahead

Spaces using ezelikman/quietstar-8-ahead: 6