Play2Prompt (P2P) StableToolBench Evaluation Pipeline

Replicates the Play2Prompt paper conditions on StableToolBench using Llama-3.1-8B-Instruct.

Designed for extensibility: the four conditions are controlled by two pluggable components, tool descriptions and in-context examples. To test your own description types, drop replacement files into p2p_data/descriptions/ and p2p_data/examples/.
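As a rough illustration of how two pluggable components can yield four conditions, here is a minimal sketch. It assumes the conditions form a 2x2 grid (one description variant crossed with one example variant), which this README does not confirm. The variant names and the `enumerate_conditions` and `build_prompt` helpers are hypothetical and are not taken from the pipeline/ source.

```python
from itertools import product

# Hypothetical variant labels; the repository may name or define them differently.
DESCRIPTION_VARIANTS = ("original", "replacement")
EXAMPLE_VARIANTS = ("none", "replacement")


def enumerate_conditions():
    """Cross the two components to get every (descriptions, examples) pairing.

    Assumption: the four conditions are a 2x2 grid over the two components.
    """
    return [
        {"descriptions": d, "examples": e}
        for d, e in product(DESCRIPTION_VARIANTS, EXAMPLE_VARIANTS)
    ]


def build_prompt(tool_name, description, examples=()):
    """Assemble an illustrative prompt fragment for one tool.

    The layout is invented for this sketch; the real prompt format lives in
    the pipeline source and may look nothing like this.
    """
    lines = [f"Tool: {tool_name}", f"Description: {description}"]
    for i, example in enumerate(examples, start=1):
        lines.append(f"Example {i}: {example}")
    return "\n".join(lines)


if __name__ == "__main__":
    for condition in enumerate_conditions():
        print(condition)
    print(build_prompt("weather_lookup", "Returns the forecast for a city.",
                       ["weather_lookup(city='Paris')"]))
```

Under this assumption, swapping in your own files changes only what is fed in as `description` and `examples`; the rest of the evaluation stays fixed, which is what makes the conditions comparable.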

See the pipeline/ directory for all source code.

Paper for Dwootton/p2p-stabletoolbench:

StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models (arXiv:2403.07714, published Mar 12, 2024)