AI & ML interests

Evaluating open LLMs

Recent Activity

open-llm-leaderboard's activity

AdinaYΒ 
posted an update 1 day ago
view post
Post
1036
QvQ-72B-PreviewπŸŽ„ an open weight model for visual reasoning just released by Alibaba_Qwen team
Qwen/qvq-676448c820912236342b9888
✨ Combines visual understanding & language reasoning.
✨ Scores 70.3 on MMMU
✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving
freddyaboultonΒ 
posted an update 7 days ago
freddyaboultonΒ 
posted an update 8 days ago
lewtunΒ 
posted an update 9 days ago
view post
Post
6430
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute πŸ”₯

How? By combining step-wise reward models with tree search algorithms :)

We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"

We're open sourcing the full recipe and sharing a detailed blog post.

In our blog post we cover:

πŸ“ˆ Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test-time.

πŸŽ„ Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.

🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs and built for speed with vLLM

Here's the links:

- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute

- Code: https://github.com/huggingface/search-and-learn

Enjoy!
  • 2 replies
Β·
AdinaYΒ 
posted an update 9 days ago
view post
Post
491
Megrez-3B-Omni πŸ”₯ an on-device multimodal LLM by Infinigence AI, another startup emerging from the Tsinghua University ecosystem.
Model: Infinigence/Megrez-3B-Omni
Demo: Infinigence/Megrez-3B-Omni
✨Supports analysis of image, text, and audio modalities
✨Leads in bilingual speech ( English & Chinese ) input, multi-turn conversations, and voice-based queries
✨Outperforms in scene understanding and OCR across major benchmarks
freddyaboultonΒ 
posted an update 13 days ago
view post
Post
1825
Version 0.0.21 of gradio-pdf now properly loads chinese characters!
lhoestqΒ 
posted an update 13 days ago
view post
Post
1606
Made a HF Dataset editor a la gg sheets here: lhoestq/dataset-spreadsheets

With Dataset Spreadsheets:
✏️ Edit datasets in the UI
πŸ”— Share link with collaborators
🐍 Use locally in DuckDB or Python

Available for the 100,000+ parquet datasets on HF :)
freddyaboultonΒ 
posted an update 13 days ago
view post
Post
1511
Hello Llama 3.2! πŸ—£οΈπŸ¦™

Build a Siri-like coding assistant that responds to "Hello Llama" in 100 lines of python! All with Gradio, webRTC 😎

freddyaboulton/hey-llama-code-editor
freddyaboultonΒ 
posted an update 15 days ago