ryandt/MusingCaterpillar


Fine-tune of CultriX/MistralTrix-v1 on Symbolic Logic content from Lewis Carroll. I used a very low learning rate because the dataset is very small. I'm just experimenting and have no idea whether this was effective at changing the model's output.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	73.33
AI2 Reasoning Challenge (25-Shot)	72.53
HellaSwag (10-Shot)	88.34
MMLU (5-Shot)	65.26
TruthfulQA (0-shot)	70.93
Winogrande (5-shot)	80.66
GSM8k (5-shot)	62.24
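
As a quick sanity check, the Avg. row appears to be the plain arithmetic mean of the six benchmark scores, rounded to two decimals. The sketch below assumes that is how the leaderboard computes it; the values are copied from the table, and the variable names are only illustrative.

```python
from statistics import mean

# Scores copied from the table above (Open LLM Leaderboard results).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 72.53,
    "HellaSwag (10-Shot)": 88.34,
    "MMLU (5-Shot)": 65.26,
    "TruthfulQA (0-shot)": 70.93,
    "Winogrande (5-shot)": 80.66,
    "GSM8k (5-shot)": 62.24,
}

# Assumption: the reported average is an unweighted mean of the six scores.
average = round(mean(scores.values()), 2)
print(f"Avg. {average:.2f}")
```

Under that assumption the result matches the reported 73.33.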

Model size: 8.99B params (Safetensors)
Tensor type: FP16

Evaluation results

All results are from the Open LLM Leaderboard.

Benchmark	Metric	Split	Value
AI2 Reasoning Challenge (25-Shot)	normalized accuracy	test	72.530
HellaSwag (10-Shot)	normalized accuracy	validation	88.340
MMLU (5-Shot)	accuracy	test	65.260
TruthfulQA (0-shot)	mc2	validation	70.930
Winogrande (5-shot)	accuracy	validation	80.660
GSM8k (5-shot)	accuracy	test	62.240
