
AlphaSue

AI & ML interests

None yet

Recent Activity

upvoted a collection about 2 months ago

Whisper

upvoted an article 2 months ago

Vision Language Models (Better, Faster, Stronger)

upvoted a collection 3 months ago

ProX Refining Models


Organizations

None yet

upvoted a collection about 2 months ago

Whisper

Collection

OpenAI Whisper speech recognition models in MLX format • 48 items • Updated Oct 1, 2024 • 51

upvoted an article 2 months ago

Article

Vision Language Models (Better, Faster, Stronger)

and 4 others • May 12 • 488

upvoted a collection 3 months ago

ProX Refining Models

Collection

Adapted small language models used to generate data refining programs • 5 items • Updated Oct 10, 2024 • 4

New activity in gair-prox/web-chunk-refining-lm 3 months ago

what is the chat template?

#1 opened 3 months ago by AlphaSue

upvoted 3 papers 3 months ago

How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients

Paper • 2504.10766 • Published Apr 14 • 40

xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14 • 84

Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26 • 55

liked 3 models 4 months ago

upvoted an article 4 months ago

Article

Open R1: Update #3

and 9 others • Mar 11 • 295

upvoted 2 papers 4 months ago

Modifying Large Language Model Post-Training for Diverse Creative Writing

Paper • 2503.17126 • Published Mar 21 • 37

Organize the Web: Constructing Domains Enhances Pre-Training Data Curation

Paper • 2502.10341 • Published Feb 14 • 3

liked a Space 5 months ago

116

TxT360: Trillion Extracted Text

📖

Create a large-scale deduplicated text dataset for LLM training

liked a model 5 months ago

jinaai/ReaderLM-v2

Text Generation • 2B • Updated Mar 4 • 13.6k • 677

liked a Space 5 months ago

2.84k

The Ultra-Scale Playbook

🌌

The ultimate guide to training LLM on large GPU Clusters

upvoted an article 6 months ago

Article

Mixture of Experts Explained

and 5 others • Dec 11, 2023 • 768

upvoted a collection 7 months ago

Papers I've read

Collection

16 items • Updated Jan 12 • 6

liked a dataset 7 months ago

microsoft/RedStone

Updated Dec 5, 2024 • 13 • 34

liked a model 7 months ago

open-web-math/filtering-models

Updated Nov 2, 2023 • 9
