Llama 2 13b is a pretty decent language model. You know what's probably better? Two Llama 2 13b models. In a trenchcoat.

Produced by bakllama.py with this config file:

layer_slices:
  - model: TheBloke/Llama-2-13B-fp16
    start: 0
    end: 40
  - model: TheBloke/Llama-2-13B-fp16
    start: 0
    end: 40
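
For anyone wondering what that config amounts to, here is a minimal sketch of the slice expansion. It is not bakllama.py itself: `expand_slices` is a hypothetical helper, and it assumes `end` is exclusive, which fits the 40 decoder layers of Llama 2 13B.

```python
# Hypothetical re-implementation of the slice expansion; not bakllama.py itself.
# Assumption: `end` is exclusive, so start=0, end=40 covers decoder layers 0..39.
layer_slices = [
    {"model": "TheBloke/Llama-2-13B-fp16", "start": 0, "end": 40},
    {"model": "TheBloke/Llama-2-13B-fp16", "start": 0, "end": 40},
]


def expand_slices(slices):
    """Return one (source_model, source_layer_index) pair per stacked layer."""
    plan = []
    for s in slices:
        for idx in range(s["start"], s["end"]):
            plan.append((s["model"], idx))
    return plan


plan = expand_slices(layer_slices)
print(len(plan))           # total decoder layers in the stacked model
print(plan[39], plan[40])  # last layer of copy one, then first layer of copy two
```

Under that assumption the stack has 80 decoder layers, and layer 40 of the result is just layer 0 of the same model again, reading the hidden state that layer 39 produced.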

No fine-tuning was done on this model. Yes, it's still coherent somehow.

Benchmark results:

| Benchmark | Llama2-13b | Llama2-26b-tcs | Percent Change |
|---|---|---|---|
| ARC | 59.3 | 55.03 | -7.2% |
| HellaSwag | 82.15 | 79.9 | -2.74% |
| MMLU | 55.67 | 53.73 | -3.48% |
| TruthfulQA | 37.39 | 40.48 | +8.26% |
| Average | 58.63 | 57.29 | -2.29% |
| Average Minus TQA | 65.71 | 62.89 | -4.29% |
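
The Percent Change column is the relative change against the single 13B model, and the two average rows are plain means over the per-task scores. A small sketch for recomputing them from the four task rows, in case you want to check the arithmetic; `percent_change` and `mean_pair` are names made up for this example.

```python
# Per-task scores copied from the table: (Llama2-13b, Llama2-26b-tcs).
scores = {
    "ARC": (59.3, 55.03),
    "HellaSwag": (82.15, 79.9),
    "MMLU": (55.67, 53.73),
    "TruthfulQA": (37.39, 40.48),
}


def percent_change(base, stacked):
    """Relative change of the stacked model against the base model, in percent."""
    return (stacked - base) / base * 100.0


def mean_pair(names):
    """Mean base score and mean stacked score over the given benchmarks."""
    base = sum(scores[n][0] for n in names) / len(names)
    stacked = sum(scores[n][1] for n in names) / len(names)
    return base, stacked


rows = dict(scores)
rows["Average"] = mean_pair(list(scores))
rows["Average Minus TQA"] = mean_pair([n for n in scores if n != "TruthfulQA"])

for name, (base, stacked) in rows.items():
    change = percent_change(base, stacked)
    print(f"{name:<18} {base:6.2f} {stacked:6.2f} {change:+6.2f}%")
```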

This tells us two very important things:

  1. TruthfulQA is a perfect benchmark in every way.
  2. Llama models are amazingly robust to being fed their own output.