@rajistics on Hugging Face: "Having some fun with long context benchmarks (watch the video!!) NoLiMA:…"

Join the conversation

Join the community of Machine Learners and AI enthusiasts.

rajistics

posted an update Apr 13

Post

3530

Having some fun with long context benchmarks (watch the video!!)

NoLiMA: NoLiMa: Long-Context Evaluation Beyond Literal Matching (2502.05167)
Fiction LiveBench: https://fiction.live/stories/Fiction-liveBench-Mar-25-2025/oQdzQvKHw8JyXbN87
Michalenglo: https://deepmind.google/research/publications/117639/
LongGenBench: Spinning the Golden Thread: Benchmarking Long-Form Generation in Language Models (2409.02076)
NeedleBench: NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? (2407.11963)
RULER: RULER: What's the Real Context Size of Your Long-Context Language Models? (2404.06654)

For more: https://www.reddit.com/r/rajistics/comments/1jxwk29/long_context_llm_benchmarks_video/

let me know if you like these posts

In this post

rajistics Rajiv Shah