Article StackLLaMA: A hands-on guide to train LLaMA with RLHF By edbeeching and 6 others • Apr 5, 2023 • 42
Reward Models Collection Nemotron reward models. For use in RLHF pipelines and LLM-as-a-Judge • 8 items • Updated 21 days ago • 19
Models I WIll GGUF Collection MODELS MUST BE <=22B. To add to this open this link: https://huggingface.co/collections/ReallyFloppyPenguin/models2gguflater-68503439edc1aa25cce7c79b • 0 items • Updated Jun 23 • 1
On Path to Multimodal Generalist: General-Level and General-Bench Paper • 2505.04620 • Published May 7 • 83
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6 • 182
ZeroGPU Spaces Collection ZeroGPU Spaces made by the community • 17 items • Updated Jun 6, 2024 • 244