Yoai's Collections

Eval Agents

updated Aug 8, 2024

  • NATURAL PLAN: Benchmarking LLMs on Natural Language Planning

    Paper • 2406.04520 • Published Jun 6, 2024 • 14

  • GenAI Arena: An Open Evaluation Platform for Generative Models

    Paper • 2406.04485 • Published Jun 6, 2024 • 23

  • CoverBench: A Challenging Benchmark for Complex Claim Verification

    Paper • 2408.03325 • Published Aug 6, 2024 • 15