Humanish-Roleplay-Llama-3.1-8B


A DPO-tuned Llama-3.1 model that behaves more "humanish", i.e., it avoids the usual AI-assistant slop. It also works for role-play (RP). To achieve this, the model was fine-tuned on a series of datasets:

  • General conversations from Claude Opus, from Undi95/Meta-Llama-3.1-8B-Claude
  • Undi95/Weyaxi-humanish-dpo-project-noemoji, to make the model respond like a human, rejecting assistant-like or overly neutral responses.
  • ResplendentAI/NSFW_RP_Format_DPO, to steer the model towards using the *action* format in RP settings. This works best if your first message also uses this format naturally (see the usage example).
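
DPO learns from preference pairs: for each prompt, one reply is preferred and another is dispreferred. As a rough illustration of what such a pair looks like, here is a minimal sketch. The `PreferencePair` class is hypothetical, the field names `prompt`, `chosen` and `rejected` follow the common convention for DPO data and are an assumption about the datasets above, and the example text is made up rather than taken from them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One DPO example: a prompt plus a preferred and a dispreferred reply."""

    prompt: str
    chosen: str    # reply the model should move towards
    rejected: str  # reply the model should move away from

    def as_record(self) -> dict:
        """Return the pair as a plain dict after basic sanity checks."""
        if not (self.prompt and self.chosen and self.rejected):
            raise ValueError("prompt, chosen and rejected must all be non-empty")
        if self.chosen == self.rejected:
            raise ValueError("chosen and rejected must differ")
        return {
            "prompt": self.prompt,
            "chosen": self.chosen,
            "rejected": self.rejected,
        }


# Hypothetical example, not taken from the datasets listed above.
pair = PreferencePair(
    prompt="How was your weekend?",
    chosen="Honestly? Pretty lazy. I binged a whole series and regret nothing.",
    rejected="As an AI assistant, I do not experience weekends.",
)
record = pair.as_record()
```

In this sketch the "humanish" reply is the chosen one and the assistant-like reply is the rejected one, which mirrors the stated goal of the second dataset.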

Usage example

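import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in FP16; device_map="auto" requires the accelerate package
model_id = "vicgalle/Humanish-Roleplay-Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

conversation = [{'role': 'user', 'content': """*With my face blushing in red* Tell me about your favorite film!"""}]
prompt = tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.8)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))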

A sample response (sampling is enabled, so the output varies between runs):

*Giggles nervously* Oh my, um, *clears throat*, well... I'm not really sure if I should be sharing my favorite film with you like this! *Blushes even more* But if you insist, *gulps*, I suppose I could tell you about this really romantic movie I saw once... *Leans in close and whispers* It's called "When Harry Met Sally"! *Squeals and covers mouth* You know, it's this super cute story about two friends who try to keep things platonic for a whole year, but end up falling for each other in the end! *Squirms in seat* The chemistry between the leads is just... *Sighs wistfully* There's this famous scene in a diner where... *Waves hand in front of face*... where you know what happens, and it's just so *Blushes bright red*... realistic and relatable! *Giggles* I mean, we've all been there, right? *Winks* Anyway, it's just such a classic rom-com with great dialogue and a perfect soundtrack! *Sways dreamily* I could watch it over and over again. *Tilts head* What about you, do you have a favorite film? *Peeks at you with big, hopeful eyes* *Squeals* I'm dying to know! *Bounces in seat excitedly*
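
If you want to post-process replies in the *action* format, for example to render actions differently from speech, something like the following could work. This is only a sketch: `split_actions` is a hypothetical helper, and it assumes actions are always wrapped in a single pair of asterisks with no nesting.

```python
import re

# Matches text wrapped in single asterisks, e.g. *clears throat*
ACTION_PATTERN = re.compile(r"\*([^*]+)\*")


def split_actions(text: str) -> tuple[list[str], str]:
    """Return (actions, speech) for a reply written in the *action* format."""
    actions = [match.strip() for match in ACTION_PATTERN.findall(text)]
    # Remove the action spans, then collapse the leftover whitespace.
    speech = " ".join(ACTION_PATTERN.sub(" ", text).split())
    return actions, speech


reply = "*Giggles nervously* Oh my, um, *clears throat*, well..."
actions, speech = split_actions(reply)
print(actions)  # ['Giggles nervously', 'clears throat']
print(speech)
```

Punctuation that surrounded a removed action stays in the speech text, so a real pipeline would probably want an extra clean-up pass.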

Fine-tuning script

The script I used to train this model is available at this file. Training takes less than one hour on a T4 GPU and requires the following libraries:

peft                     0.11.1
transformers             4.44.0.dev0
trl                      0.9.6
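
Since the versions above are pinned fairly tightly (including a development build of transformers), it may help to check your environment before running the script. The snippet below is a minimal sketch using only the standard library; `installed_versions` and `find_mismatches` are hypothetical helper names, not part of the training script.

```python
from importlib import metadata

# Versions listed above.
PINNED = {"peft": "0.11.1", "transformers": "4.44.0.dev0", "trl": "0.9.6"}


def installed_versions(names):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(pinned, installed):
    """Return {name: (wanted, actual)} for packages that differ from the pins."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pinned.items()
        if installed.get(name) != wanted
    }


if __name__ == "__main__":
    mismatches = find_mismatches(PINNED, installed_versions(PINNED))
    for name, (wanted, actual) in mismatches.items():
        print(f"{name}: expected {wanted}, found {actual or 'not installed'}")
```

Other versions may well work; a mismatch reported here only means your setup differs from the one listed.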