ℏεsam PRO

hesamation

AI & ML interests

post-training / reasoning models / RAG

Recent Activity

upvoted a paper 6 days ago

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

new activity 25 days ago

hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled: question: which version of Opus-Reasoning-Distilled?

new activity 25 days ago

hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF: Model frequently enters repetitive output loops

upvoted a paper 6 days ago

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

Paper • 2605.05242 • Published 14 days ago • 106

upvoted a collection 28 days ago

Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled

Collection • 2 items • Updated 28 days ago • 4

upvoted a paper about 2 months ago

AI Can Learn Scientific Taste

Paper • 2603.14473 • Published Mar 15 • 426

upvoted 3 papers 4 months ago

PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning

Paper • 2601.05593 • Published Jan 9 • 86

BabyVision: Visual Reasoning Beyond Language

Paper • 2601.06521 • Published Jan 10 • 201

Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

Paper • 2601.06943 • Published Jan 11 • 214

upvoted an article 4 months ago

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

Article • NormalUhr • Feb 11, 2025 • 120

upvoted an article 5 months ago

We Got Claude to Fine-Tune an Open Source LLM

Article • burtenshaw, evalstate • Dec 4, 2025 • 624

upvoted a paper 5 months ago

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published Dec 2, 2025 • 267

upvoted 2 papers 6 months ago

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published Nov 23, 2025 • 304

upvoted 2 papers 7 months ago

Scaling Latent Reasoning via Looped Language Models

Paper • 2510.25741 • Published Oct 29, 2025 • 229

Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9, 2025 • 276

upvoted a paper 8 months ago

Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers

Paper • 2509.03059 • Published Sep 3, 2025 • 25

upvoted a paper 9 months ago

AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22, 2025 • 162

upvoted a paper 10 months ago

A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17, 2025 • 264

upvoted a paper 11 months ago

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Paper • 2506.14965 • Published Jun 17, 2025 • 50

upvoted 3 papers about 1 year ago

CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges

Paper • 2504.19093 • Published Apr 27, 2025 • 18

Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning

Paper • 2504.16656 • Published Apr 23, 2025 • 58

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18, 2025 • 141