toread - a eva0071 Collection

updated about 13 hours ago

Why Fine-Tuning Encourages Hallucinations and How to Fix It

Paper • 2604.15574 • Published Apr 16 • 25
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation

Paper • 2604.24763 • Published Apr 27 • 71
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora

Paper • 2604.24819 • Published Apr 27 • 89
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

Paper • 2604.26752 • Published Apr 29 • 108
Large Language Models Explore by Latent Distilling

Paper • 2604.24927 • Published Apr 27 • 74
Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding

Paper • 2604.26779 • Published Apr 29 • 13
Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Paper • 2605.22791 • Published 16 days ago • 31
Unsupervised Process Reward Models

Paper • 2605.10158 • Published 26 days ago • 26
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

Paper • 2605.16928 • Published 21 days ago • 93
From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models

Paper • 2605.20177 • Published 18 days ago • 10
Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

Paper • 2605.19282 • Published 18 days ago • 8
Channel-wise Vector Quantization

Paper • 2605.26089 • Published 12 days ago • 15
Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models

Paper • 2605.26895 • Published 11 days ago • 20
Task-Focused Memorization for Multimodal Agents

Paper • 2605.31075 • Published 8 days ago • 33
LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

Paper • 2605.31584 • Published 8 days ago • 41
Not All Disagreement Is Learnable: Token Teachability in On-Policy Distillation

Paper • 2605.26844 • Published 11 days ago • 25
ESPO: Early-Stopping Proximal Policy Optimization

Paper • 2605.29860 • Published 9 days ago • 18
NITP: Next Implicit Token Prediction for LLM Pre-training

Paper • 2605.24956 • Published 13 days ago • 33
Self-Distilled Policy Gradient

Paper • 2606.04036 • Published 4 days ago • 18
MemTrain: Self-Supervised Context Memory Training

Paper • 2606.03197 • Published 4 days ago • 16
Latent Reasoning with Normalizing Flows

Paper • 2606.06447 • Published 1 day ago • 2
ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning

Paper • 2606.03503 • Published 3 days ago • 24
Unified Neural Scaling Laws

Paper • 2605.26248 • Published 12 days ago • 7