'Make knowledge free for everyone'

Model

Unsloth 4-bit quantized Llama 3 8B, fine-tuned for analytical reasoning.
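
The card does not include usage code, so the following is a minimal, unofficial sketch of loading the checkpoint with Hugging Face transformers. Only the repo id comes from this page; the sample prompt and loading options are illustrative assumptions, and loading the stored 4-bit weights is expected to require bitsandbytes to be installed.

```python
# Minimal inference sketch (not an official example from the model card).
# Assumes the repo ships a 4-bit checkpoint that transformers can load
# directly; adjust device/dtype settings for your hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DevQuasar/analytical_reasoning_Llama-3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Hypothetical analytical-reasoning style prompt, for illustration only.
prompt = "Five people sit in a row. Alice is left of Bob, and Bob is left of Carol. Who cannot be in the middle seat?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```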

Eval

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| hellaswag | 1 | none | 0 | acc | 0.5704 | ± 0.0049 |
| | | none | 0 | acc_norm | 0.7526 | ± 0.0043 |
| leaderboard_bbh | N/A | | | | | |
| - leaderboard_bbh_boolean_expressions | 1 | none | 3 | acc_norm | 0.7480 | ± 0.0275 |
| - leaderboard_bbh_causal_judgement | 1 | none | 3 | acc_norm | 0.6524 | ± 0.0349 |
| - leaderboard_bbh_date_understanding | 1 | none | 3 | acc_norm | 0.4400 | ± 0.0315 |
| - leaderboard_bbh_disambiguation_qa | 1 | none | 3 | acc_norm | 0.6240 | ± 0.0307 |
| - leaderboard_bbh_formal_fallacies | 1 | none | 3 | acc_norm | 0.5440 | ± 0.0316 |
| - leaderboard_bbh_geometric_shapes | 1 | none | 3 | acc_norm | 0.3160 | ± 0.0295 |
| - leaderboard_bbh_hyperbaton | 1 | none | 3 | acc_norm | 0.4920 | ± 0.0317 |
| - leaderboard_bbh_logical_deduction_five_objects | 1 | none | 3 | acc_norm | 0.4040 | ± 0.0311 |
| - leaderboard_bbh_logical_deduction_seven_objects | 1 | none | 3 | acc_norm | 0.4200 | ± 0.0313 |
| - leaderboard_bbh_logical_deduction_three_objects | 1 | none | 3 | acc_norm | 0.5360 | ± 0.0316 |
| - leaderboard_bbh_movie_recommendation | 1 | none | 3 | acc_norm | 0.5920 | ± 0.0311 |
| - leaderboard_bbh_navigate | 1 | none | 3 | acc_norm | 0.5000 | ± 0.0317 |
| - leaderboard_bbh_object_counting | 1 | none | 3 | acc_norm | 0.4720 | ± 0.0316 |
| - leaderboard_bbh_penguins_in_a_table | 1 | none | 3 | acc_norm | 0.5274 | ± 0.0415 |
| - leaderboard_bbh_reasoning_about_colored_objects | 1 | none | 3 | acc_norm | 0.5480 | ± 0.0315 |
| - leaderboard_bbh_ruin_names | 1 | none | 3 | acc_norm | 0.5640 | ± 0.0314 |
| - leaderboard_bbh_salient_translation_error_detection | 1 | none | 3 | acc_norm | 0.4640 | ± 0.0316 |
| - leaderboard_bbh_snarks | 1 | none | 3 | acc_norm | 0.4831 | ± 0.0376 |
| - leaderboard_bbh_sports_understanding | 1 | none | 3 | acc_norm | 0.6200 | ± 0.0308 |
| - leaderboard_bbh_temporal_sequences | 1 | none | 3 | acc_norm | 0.1720 | ± 0.0239 |
| - leaderboard_bbh_tracking_shuffled_objects_five_objects | 1 | none | 3 | acc_norm | 0.2000 | ± 0.0253 |
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects | 1 | none | 3 | acc_norm | 0.1840 | ± 0.0246 |
| - leaderboard_bbh_tracking_shuffled_objects_three_objects | 1 | none | 3 | acc_norm | 0.3440 | ± 0.0301 |
| - leaderboard_bbh_web_of_lies | 1 | none | 3 | acc_norm | 0.5280 | ± 0.0316 |
| leaderboard_gpqa | N/A | | | | | |
| - leaderboard_gpqa_diamond | 1 | none | 0 | acc_norm | 0.2727 | ± 0.0317 |
| - leaderboard_gpqa_extended | 1 | none | 0 | acc_norm | 0.3095 | ± 0.0198 |
| - leaderboard_gpqa_main | 1 | none | 0 | acc_norm | 0.3058 | ± 0.0218 |
| leaderboard_ifeval | 3 | none | 0 | inst_level_loose_acc | 0.6031 | N/A |
| | | none | 0 | inst_level_strict_acc | 0.5132 | N/A |
| | | none | 0 | prompt_level_loose_acc | 0.4677 | ± 0.0215 |
| | | none | 0 | prompt_level_strict_acc | 0.3660 | ± 0.0207 |
| leaderboard_math_hard | N/A | | | | | |
| - leaderboard_math_algebra_hard | 2 | none | 4 | exact_match | 0.1303 | ± 0.0192 |
| - leaderboard_math_counting_and_prob_hard | 2 | none | 4 | exact_match | 0.0244 | ± 0.0140 |
| - leaderboard_math_geometry_hard | 2 | none | 4 | exact_match | 0.0303 | ± 0.0150 |
| - leaderboard_math_intermediate_algebra_hard | 2 | none | 4 | exact_match | 0.0179 | ± 0.0079 |
| - leaderboard_math_num_theory_hard | 2 | none | 4 | exact_match | 0.0714 | ± 0.0208 |
| - leaderboard_math_prealgebra_hard | 2 | none | 4 | exact_match | 0.1451 | ± 0.0254 |
| - leaderboard_math_precalculus_hard | 2 | none | 4 | exact_match | 0.0296 | ± 0.0146 |
| leaderboard_mmlu_pro | 0.1 | none | 5 | acc | 0.3481 | ± 0.0043 |
| leaderboard_musr | N/A | | | | | |
| - leaderboard_musr_murder_mysteries | 1 | none | 0 | acc_norm | 0.5480 | ± 0.0315 |
| - leaderboard_musr_object_placements | 1 | none | 0 | acc_norm | 0.2617 | ± 0.0275 |
| - leaderboard_musr_team_allocation | 1 | none | 0 | acc_norm | 0.3640 | ± 0.0305 |
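
The table follows the output layout of EleutherAI's lm-evaluation-harness (the Open LLM Leaderboard v2 task set plus hellaswag). The exact command is not recorded in the card, so the snippet below is only a reproduction sketch under that assumption; the task names come from the table, while the batch size and other options are illustrative.

```python
# Reproduction sketch, assuming lm-evaluation-harness (pip install lm-eval)
# produced the table above; the leaderboard task configs pin the n-shot
# settings shown there (e.g. 3-shot BBH, 5-shot MMLU-Pro).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=DevQuasar/analytical_reasoning_Llama-3-8B",
    tasks=["hellaswag", "leaderboard"],  # "leaderboard" groups bbh, gpqa, ifeval, math_hard, mmlu_pro, musr
    batch_size=8,                        # assumed; tune for your hardware
)

# Print per-task metrics in the same Tasks/Metric/Value shape as the table.
for task, metrics in results["results"].items():
    print(task, metrics)
```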

Made with

Buy Me a Coffee at ko-fi.com
