deepakkarkala/llama31-8b-dpo-qlora-test
Text Generation
Transformers
Safetensors
HuggingFaceH4/ultrafeedback_binarized
llama
Generated from Trainer
alignment-handbook
trl
dpo
conversational
text-generation-inference
4-bit precision
bitsandbytes
arxiv:2305.18290
main
llama31-8b-dpo-qlora-test
Commit History
e8508f6  End of training (verified; deepakkarkala committed on Feb 19)
9a248c3  Model save (verified; deepakkarkala committed on Feb 19)
454764f  Training in progress, step 76 (verified; deepakkarkala committed on Feb 19)
6d5044e  initial commit (verified; deepakkarkala committed on Feb 19)