Dynamic Chunking for End-to-End Hierarchical Sequence Modeling • Paper • arXiv:2507.07955 • Published 5 days ago • 15 upvotes
SmolLM3: smol, multilingual, long-context reasoner • Article by loubnabnl and 22 others • 7 days ago • 515 upvotes
Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders • Article by thomwolf and 1 other • 6 days ago • 543 upvotes
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents • Paper • arXiv:2507.04009 • Published 10 days ago • 28 upvotes
Should We Still Pretrain Encoders with Masked Language Modeling? • Paper • arXiv:2507.00994 • Published 14 days ago • 73 upvotes
Energy-Based Transformers are Scalable Learners and Thinkers • Paper • arXiv:2507.02092 • Published 13 days ago • 50 upvotes
Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs • Paper • arXiv:2507.02778 • Published 12 days ago • 9 upvotes
Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 • Article by tomaarsen and 1 other • 14 days ago • 89 upvotes
Is There a Case for Conversation Optimized Tokenizers in Large Language Models? • Paper • arXiv:2506.18674 • Published 22 days ago • 8 upvotes
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models • Paper • arXiv:2506.19697 • Published 21 days ago • 44 upvotes
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test • Paper • arXiv:2506.21551 • Published 19 days ago • 28 upvotes
Gemma 3 QAT • Collection of Quantization-Aware Trained (QAT) Gemma 3 checkpoints; the models preserve quality close to half precision while using 3x less memory • 15 items • Updated 5 days ago • 204 upvotes
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning • Paper • arXiv:2506.10521 • Published Jun 12 • 71 upvotes
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention • Paper • arXiv:2506.13585 • Published 29 days ago • 253 upvotes
Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training • Paper • arXiv:2506.10952 • Published Jun 12 • 23 upvotes
Through the Valley: Path to Effective Long CoT Training for Small Language Models • Paper • arXiv:2506.07712 • Published Jun 9 • 18 upvotes