| Method | Model | Commonsense (Micro) | Commonsense (Macro) | Hard (Micro) | Hard (Macro) | Final Pass Rate |
|---|---|---|---|---|---|---|
| Direct Prompting | Llama3.1-8B | 60.1 | 0.0 | 7.9 | 2.8 | 0.0 |
| Direct Prompting | Qwen2-7B | 49.9 | 1.1 | 2.1 | 0.0 | 0.0 |
| Fine-tuning | Llama3.1-8B | 78.3 | 17.8 | 19.3 | 6.1 | 3.8 |
| Fine-tuning | Qwen2-7B | 59.0 | 0.6 | 0.2 | 0.0 | 0.0 |
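
The Micro, Macro, and Final Pass Rate columns report constraint satisfaction at different granularities. As a rough illustration only, the sketch below assumes the usual TravelPlanner-style conventions (Micro averages over individual constraints, Macro counts a plan only if every constraint in that category passes, and Final Pass Rate requires all constraints in all categories to hold); the data schema and constraint names are hypothetical, not the evaluator's actual format.

```python
# Minimal sketch of micro / macro / final pass-rate aggregation.
# Each plan is a dict of categories, each mapping constraint names to pass/fail booleans.
# This layout is illustrative, not the official evaluation schema.

def micro_rate(results, category):
    """Constraint-level pass rate: fraction of individual constraints passed."""
    flags = [passed for plan in results for passed in plan[category].values()]
    return 100.0 * sum(flags) / len(flags)

def macro_rate(results, category):
    """Plan-level pass rate: a plan counts only if every constraint in the category passes."""
    hits = [all(plan[category].values()) for plan in results]
    return 100.0 * sum(hits) / len(hits)

def final_pass_rate(results, categories=("commonsense", "hard")):
    """Fraction of plans satisfying all constraints across all categories."""
    hits = [all(all(plan[c].values()) for c in categories) for plan in results]
    return 100.0 * sum(hits) / len(hits)

# Toy example with two plans (hypothetical constraint names):
results = [
    {"commonsense": {"within_sandbox": True, "reasonable_route": True},
     "hard": {"budget": False}},
    {"commonsense": {"within_sandbox": True, "reasonable_route": False},
     "hard": {"budget": True}},
]
print(micro_rate(results, "commonsense"))  # 75.0
print(macro_rate(results, "commonsense"))  # 50.0
print(final_pass_rate(results))            # 0.0
```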

If you find our related resources valuable for your research, we kindly ask that you cite our work:

@article{xie2024revealing,
  title={Revealing the Barriers of Language Agents in Planning},
  author={Xie, Jian and Zhang, Kexun and Chen, Jiangjie and Yuan, Siyu and Zhang, Kai and Zhang, Yikai and Li, Lei and Xiao, Yanghua},
  journal={arXiv preprint arXiv:2410.12409},
  year={2024}
}