add blog link
ZeroEval-main/result_dirs/zebra-grid.summary.json
CHANGED
@@ -372,16 +372,5 @@
  "Hard Puzzle Acc": "0.00",
  "Total Puzzles": 1000,
  "Reason Lens": "1592.60"
- },
- {
- "Model": "gemma-2-27b-it@vllm",
- "Mode": "greedy",
- "Puzzle Acc": "0.47",
- "Cell Acc": "0.31",
- "No answer": "96.23",
- "Easy Puzzle Acc": "2.08",
- "Hard Puzzle Acc": "0.00",
- "Total Puzzles": 212,
- "Reason Lens": "1280.62"
  }
  ]
_header.md
CHANGED
@@ -2,5 +2,5 @@
 
  # 🦓 ZebraLogic: Benchmarking the Logical Reasoning Ability of Language Models
  <!-- [📑 FnF Paper](https://arxiv.org/abs/2305.18654) | -->
- [📰 Blog]() [💻 GitHub](https://github.com/yuchenlin/ZeroEval) | [🤗 HuggingFace](https://huggingface.co/collections/allenai/zebra-logic-bench-6697137cbaad0b91e635e7b0) | [🐦 X](https://twitter.com/billyuchenlin/) | [💬 Discussion](https://huggingface.co/spaces/allenai/ZebraLogicBench-Leaderboard/discussions) | Updated: **{LAST_UPDATED}**
+ [📰 Blog](https://huggingface.co/blog/yuchenlin/zebra-logic) [💻 GitHub](https://github.com/yuchenlin/ZeroEval) | [🤗 HuggingFace](https://huggingface.co/collections/allenai/zebra-logic-bench-6697137cbaad0b91e635e7b0) | [🐦 X](https://twitter.com/billyuchenlin/) | [💬 Discussion](https://huggingface.co/spaces/allenai/ZebraLogicBench-Leaderboard/discussions) | Updated: **{LAST_UPDATED}**
 