Active filters: dpo
NicholasCorrado/zephyr-7b-uf-rlced-conifer-group-dpo-2e • Text Generation • 7B • Updated • 2
KoNqUeRoR3891/HW2-dpo • Text Generation • 0.1B • Updated • 4
nomadrp/tq-aya101-gt2
nomadrp/tq-llama3.1-gt3 • Updated • 163
NicholasCorrado/zephyr-7b-uf-rlced-conifer-1e2e-group-dpo-2e • Text Generation • 7B • Updated • 2
nomadrp/tq-llama3.1-sent-shlfd-gt3
QuantFactory/Lama-DPOlphin-8B-GGUF • Text Generation • 8B • Updated • 77 • 2
LBK95/Llama-2-7b-hf-DPO-LookAhead5_FullEval_TTree1.4_TLoop0.7_TEval0.2_V1.0
Wenboz/zephyr-7b-wpo-lora
YYYYYYibo/gshf_ours_1_iter_2 • 7B • Updated • 2
Magpie-Align/MagpieLM-4B-Chat-v0.1 • Text Generation • 5B • Updated • 53 • 20
Triangle104/NeuralDaredevil-8B-abliterated-Q4_K_M-GGUF • 8B • Updated • 3
Triangle104/NeuralDaredevil-8B-abliterated-Q4_0-GGUF • 8B • Updated • 5
Triangle104/NeuralDaredevil-8B-abliterated-Q4_K_S-GGUF • 8B • Updated • 9
YYYYYYibo/gshf_ours_1_iter_3 • 7B • Updated • 2
lewtun/dpo-model-lora
CharlesLi/OpenELM-1_1B-DPO-full-max-min-reward • Text Generation • 1B • Updated • 2
CharlesLi/OpenELM-1_1B-DPO-full-max-random-reward • Text Generation • 1B • Updated • 2
CharlesLi/OpenELM-1_1B-DPO-full-least-similar • Text Generation • 1B • Updated • 2
taicheng/zephyr-7b-dpo-qlora
CharlesLi/OpenELM-1_1B-DPO-full-max-reward-least-similar • Text Generation • 1B • Updated • 7
dmariko/SmolLM-360M-Instruct-dpo-15k • 0.4B • Updated • 2
QinLiuNLP/llama3-sudo-dpo-instruct-5epochs-0909
CharlesLi/OpenELM-1_1B-DPO-full-max-reward-most-similar • Text Generation • 1B • Updated • 2
CharlesLi/OpenELM-1_1B-DPO-full-most-similar • Text Generation • 1B • Updated • 2
DUAL-GPO/phi-2-dpo-chatml-lora-i1
CharlesLi/OpenELM-1_1B-DPO-full-max-second-reward • Text Generation • 1B • Updated • 3
CharlesLi/OpenELM-1_1B-DPO-full-random-pair • Text Generation • 1B • Updated • 2
Wenboz/zephyr-7b-dpo-lora
DUAL-GPO/phi-2-dpo-chatml-lora-10k-30k-i1
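
The listing above is the Hub's model index filtered on the dpo tag. A minimal sketch of reproducing such a query programmatically with the huggingface_hub client is shown below; the sort key and result limit are illustrative assumptions, not part of the page.

```python
# Minimal sketch: query the Hugging Face Hub for models tagged "dpo",
# mirroring the "Active filters: dpo" listing above.
# sort="downloads" and limit=20 are assumptions chosen for illustration.
from huggingface_hub import list_models

for model in list_models(filter="dpo", sort="downloads", limit=20):
    # Each ModelInfo carries the fields shown on a model card row:
    # repo id, pipeline tag, download count, and like count.
    print(model.id, model.pipeline_tag, model.downloads, model.likes)
```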