Vigneshwaran (Vignesh)
AI & ML interests: None yet
Recent Activity
- updated the RLHF collection about 2 months ago
- updated the RLHF collection 2 months ago
- updated the RLHF collection 3 months ago
training
- A Loss Curvature Perspective on Training Instability in Deep Learning
  Paper • 2110.04369 • Published
- Why Do We Need Weight Decay in Modern Deep Learning?
  Paper • 2310.04415 • Published
- Small-scale proxies for large-scale Transformer training instabilities
  Paper • 2309.14322 • Published • 21
- Transformers Can Navigate Mazes With Multi-Step Prediction
  Paper • 2412.05117 • Published • 5
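As a concrete pointer for the instability papers in this collection, here is a minimal sketch of one mitigation studied at small scale in 2309.14322: an auxiliary z-loss that penalizes the log of the softmax normalizer so output logits cannot drift. The function name and coefficient value are illustrative assumptions, not code from the paper.

```python
import torch
import torch.nn.functional as F

def cross_entropy_with_z_loss(logits, targets, z_coeff=1e-4):
    # log_z is the log of the softmax normalizer for each example;
    # penalizing log_z**2 discourages output-logit divergence, one of
    # the instabilities reproduced at small scale in 2309.14322.
    log_z = torch.logsumexp(logits, dim=-1)
    ce = F.cross_entropy(logits, targets)
    return ce + z_coeff * (log_z ** 2).mean()
```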
Synthetic data
RL
RLHF
- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 67
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 42
- Teaching Large Language Models to Reason with Reinforcement Learning
  Paper • 2403.04642 • Published • 51
- Best Practices and Lessons Learned on Synthetic Data for Language Models
  Paper • 2404.07503 • Published • 32
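Since ORPO is the headline method in this collection, a minimal sketch of its objective as described in 2403.07691: a plain NLL loss on the chosen response plus an odds-ratio penalty, with no reference model. The tensor names and the lambda value below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def orpo_loss(logp_chosen, logp_rejected, nll_chosen, lam=0.1):
    # logp_* are length-normalized sequence log-probabilities under the
    # policy being trained; log-odds(y) = log p - log(1 - p).
    log_odds_c = logp_chosen - torch.log1p(-torch.exp(logp_chosen))
    log_odds_r = logp_rejected - torch.log1p(-torch.exp(logp_rejected))
    # log-sigmoid of the log-odds ratio rewards putting higher odds on
    # the chosen response than on the rejected one.
    or_term = F.logsigmoid(log_odds_c - log_odds_r)
    return nll_chosen - lam * or_term.mean()
```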
evaluation
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
  Paper • 2407.10457 • Published • 25
- Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations
  Paper • 2411.00640 • Published • 3
- Law of the Weakest Link: Cross Capabilities of Large Language Models
  Paper • 2409.19951 • Published • 55
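To make the error-bars recommendation concrete, a minimal sketch assuming per-question 0/1 scores and the plain normal-approximation interval that 2411.00640 takes as its baseline (the paper also covers clustered and paired variants not shown here):

```python
import math

def mean_with_ci(scores, z=1.96):
    # Mean eval score with a normal-approximation 95% CI:
    # standard error = sample standard deviation / sqrt(n).
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    sem = math.sqrt(var / n)
    return mean, (mean - z * sem, mean + z * sem)

acc, (low, high) = mean_with_ci([1, 0, 1, 1, 0, 1, 1, 1])
print(f"accuracy = {acc:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```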