Temus's Collections

LLM-evaluation

updated Sep 5, 2024

Papers on the evaluation of LLM agents

  • MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries

    Paper • 2401.15391 • Published Jan 27, 2024 • 6

  • Long-form factuality in large language models

    Paper • 2403.18802 • Published Mar 27, 2024 • 26

  • JudgeLM: Fine-tuned Large Language Models are Scalable Judges

    Paper • 2310.17631 • Published Oct 26, 2023 • 35

    Note: A local evaluator LLM fine-tuned to mimic GPT-4's judgments. Techniques: 1. Swap augmentation 2. Additional scenario augmentation (Rational Prompt)


  • Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

    Paper • 2310.08491 • Published Oct 12, 2023 • 55

  • Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

    Paper • 2405.01535 • Published May 2, 2024 • 123

  • On Speeding Up Language Model Evaluation

    Paper • 2407.06172 • Published Jul 8, 2024 • 1

  • Generative Verifiers: Reward Modeling as Next-Token Prediction

    Paper • 2408.15240 • Published Aug 27, 2024 • 13