Play2Prompt (P2P) StableToolBench Evaluation Pipeline

Replicates the Play2Prompt paper conditions on StableToolBench using Llama-3.1-8B-Instruct.

Designed for extensibility: the 4 conditions are controlled by two pluggable components, tool descriptions and in-context examples. To test your own description types, drop replacement files into p2p_data/descriptions/ and p2p_data/examples/.
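
As a rough illustration of how two pluggable components can yield four conditions, the sketch below enumerates every combination of a description variant and an example variant. It is not taken from the code in pipeline/. The variant names (`original`, `p2p`, `none`), the `.json` file names, and the `build_conditions` helper are all hypothetical; only the two directory paths come from this README.

```python
from itertools import product
from pathlib import Path

# Directory layout named in this README; the file names below are assumptions.
DATA_ROOT = Path("p2p_data")


def build_conditions(description_variants=("original", "p2p"),
                     example_variants=("none", "p2p")):
    """Cross every description variant with every example variant.

    Hypothetical helper: the variant names are placeholders, not the
    pipeline's actual identifiers.
    """
    conditions = []
    for desc, ex in product(description_variants, example_variants):
        conditions.append({
            "name": f"desc-{desc}__ex-{ex}",
            # Assumed naming scheme: one file per variant.
            "descriptions_path": DATA_ROOT / "descriptions" / f"{desc}.json",
            # "none" means the prompt carries no in-context examples.
            "examples_path": (None if ex == "none"
                              else DATA_ROOT / "examples" / f"{ex}.json"),
        })
    return conditions


if __name__ == "__main__":
    for condition in build_conditions():
        print(condition["name"], condition["descriptions_path"],
              condition["examples_path"])
```

Under these assumptions, adding a custom description type would amount to placing a new file in p2p_data/descriptions/ and passing its name in `description_variants`; check pipeline/ for the file format and naming the real code expects.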

See the pipeline/ directory for all source code.
