Model Card for Llama-3.1-Argunaut-1-8B-HIRPO

This model is a fine-tuned version of DebateLabKIT/Llama-3.1-Argunaut-1-8B-SPIN. It has been trained using TRL and vLLM. Checkpoints are tagged.

📘 HF Blog Article


Quick start

from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="DebateLabKIT/Llama-3.1-Argunaut-1-8B-HIRPO", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])

Training procedure

This model was trained with Hindsight Instruction Relabeling Preference Optimization (HIRPO), an Online DPO variant derived from the Hindsight Instruction Relabeling (HIR) method introduced in The Wisdom of Hindsight Makes Language Models Better Instruction Followers.

More details about the training procedure can be found in the blog post.
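
For orientation, the snippet below sketches the standard DPO objective for a single preference pair, which is the loss that Online DPO methods apply to freshly sampled completions. It is a minimal illustration and not the HIRPO implementation: the function name `dpo_loss`, the default `beta=0.1`, and the example log-probabilities are assumptions, and the hindsight-relabeling step that produces the pairs is not shown here.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full completion under
    the trained policy or the frozen reference model. `beta` is an assumed
    default, not a value taken from this model's training run.
    """
    # Implicit rewards: how far the policy has moved away from the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) == softplus(-margin), written in a stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical log-probabilities: the policy favors the chosen completion
# more strongly than the reference does, so the loss falls below log(2).
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log 2; it shrinks as the policy raises the chosen completion relative to the rejected one.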

Framework versions

  • TRL: 0.14.0
  • Transformers: 4.46.3
  • PyTorch: 2.4.1
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3
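
To compare a local environment against the versions listed above, a small check along the following lines may help. This is a sketch: the helper name `find_mismatches` is hypothetical, and mapping "PyTorch" to the `torch` distribution name (and the others to their lowercase names) is an assumption about how the packages were installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "trl": "0.14.0",
    "transformers": "4.46.3",
    "torch": "2.4.1",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}


def find_mismatches(expected=EXPECTED, get_version=version):
    """Return {package: (expected, installed_or_None)} for every deviation."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu121" on torch wheels.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches().items():
        print(f"{name}: card lists {wanted}, found {installed or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; the listed versions are simply the ones used during training.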

Evaluation

Chat Experience

👤 You
Hi, are you familiar with Argdown?

🤖 Assistant
Yes, I'm familiar with Argdown, a markup language for logical and critical thinking, particularly suited for representing and analyzing arguments. It allows for the clear and structured representation of arguments, including claims and arguments, as well as inference trees. If you have an argument you'd like me to analyze or need assistance with a critical thinking problem, feel free to provide the details!

👤 You
Can you please create an Argdown argument map that visualizes the argumentative macro structure of the following text?

:::
School uniforms deter crime and increase student safety. In Long Beach, California, after two years of a district-wide K-8 mandatory uniform policy, reports of assault and battery in the district’s schools decreased by 34%, assault with a deadly weapon dropped by 50%, fighting incidents went down by 51%, sex offenses were cut by 74%, robbery dropped by 65%, possession of weapons (or weapon “look-alikes”) decreased by 52%, possession of drugs went down by 69%, and vandalism was lowered by 18%. [64]

One year after Sparks Middle School in Nevada instituted a uniform policy, school police data showed a 63% drop in police log reports, and decreases were also noted in gang activity, student fights, graffiti, property damage, and battery. A peer-reviewed study found that schools with uniform policies had 12% fewer firearm-related incidents and 15% fewer drug-related incidents than schools without uniforms. [25][69]

School uniforms also prevent students from concealing weapons under baggy clothing, make it easier to keep track of students on field trips, and make intruders on campus more visible. Frank Quatrone, superintendent in the Lodi school district of New Jersey, states, “When you have students dressed alike, you make them safer. If someone were to come into a building, the intruder could easily be recognized.” [6][38]

Further, school uniforms create a level playing field among students, reducing peer pressure and bullying. When all students are dressed alike, competition between students over clothing choices and the teasing of those who are dressed in less expensive or less fashionable outfits can be eliminated. Research by the Schoolwear Association found that 83% of teachers thought “a good school uniform…could prevent bullying based on appearance or economic background.” Arminta Jacobson, founder and director of the Center for Parent Education at the University of North Texas, states that uniforms put “all kids on the same playing field in terms of their appearance. I think it probably gives them a sense of belonging and a feeling of being socially accepted.” [5][91]
:::

🤖 Assistant
The provided text does not contain an argument with clearly defined premises and a conclusion. It presents some claims and supports them with evidence, but it does not explicitly state an argument with a clear structure of premises and a conclusion. Therefore, there is no argument to map.

If you provide the full argument, including the premises and the conclusion, I can assist you in creating an Argdown argument map. Please share the complete argument.

👤 You
That is correct. But I'm asking you to map these support and attack relations using Argdown Syntax in an argdown snippet.

🤖 Assistant
Given the information provided, it appears there's an implicit argument structure. However, without explicit premises and a conclusion, it's challenging to create a clear Argdown map. Nonetheless, let's represent the support and attack relations as described:

[Conclusion~1]: School uniforms deter crime and increase student safety.
    <+ <Subargument 1>: 
        <+ [Conclusion~2]: Implementing school uniforms reduces bullying based on appearance or economic background.
            <+ <Subargument 2>: 
                <+ [Premise]: 83% of teachers think that a good school uniform can prevent bullying based on appearance or economic background.
        <+ [Premise]: Reducing bullying based on appearance or economic background contributes to a sense of belonging and social acceptance among students.
        <+ [Premise]: A sense of belonging and social acceptance among students is a desirable outcome.

👤 You
Please summarize, in your own words, the main argument presented in the text. No more than 2-3 sentences, please.

🤖 Assistant
The argument's key point is: School uniforms deter crime and increase student safety.

👤 You
Let's focus on this argument. Just for the protocol, what's the conclusion?

🤖 Assistant
The text doesn't explicitly state a conclusion. It provides supporting evidence and quotes from various sources. If I had to infer a conclusion based on the text, it would be:

School uniforms deter crime and increase student safety.

👤 You
Now, can you please reconstruct this argument as premise conclusion structure (standard form), using argdown syntax? Please give the argument a succinct title before starting with the premise conclusion structure.

🤖 Assistant
The argument can be reconstructed as follows:

  • The argument does not explicitly state that the conclusion follows from the premises, so I will not add an inference figure.
(1) 83% of teachers think that a good school uniform can prevent bullying based on appearance or economic background.
(2) Reducing bullying based on appearance or economic background contributes to a sense of belonging and social acceptance among students.
(3) A sense of belonging and social acceptance among students is a desirable outcome.
--
School uniforms deter crime and increase student safety.

👤 You
Hm, please explain: How do the three premises collectively justify the conclusion?

🤖 Assistant
The premises collectively justify the conclusion by establishing a connection between school uniforms, bullying, and student well-being. Specifically, the first premise indicates that a significant majority of teachers support the idea that school uniforms can reduce bullying, thereby creating a safer and more harmonious school environment. The second premise suggests that reducing bullying leads to a positive social dynamic among students, where they feel a sense of belonging and acceptance. The third premise reinforces the value of this social dynamic by affirming that it is a desirable outcome. Putting these premises together, one can infer that school uniforms, which can reduce bullying, contribute to creating a more positive and safer school environment, thereby supporting the conclusion that school uniforms deter crime and increase student safety.

👤 You
Can you add a connecting premise that makes the link between a positive environment on the one side and safety / crime deterrence on the other explicit?

🤖 Assistant
Let me adjust the argdown code accordingly.

(1) 83% of teachers think that a good school uniform can prevent bullying based on appearance or economic background.
(2) Reducing bullying based on appearance or economic background contributes to a sense of belonging and social acceptance among students.
(3) A sense of belonging and social acceptance among students is a desirable outcome.
(4) A positive school environment contributes to a safer school environment.
--------------------------
(5) School uniforms deter crime and increase student safety.
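
A multi-turn exchange like the one above could be driven programmatically roughly as follows. This is a sketch under assumptions: the helper names `build_map_request` and `next_turn` are hypothetical, the prompt wording is paraphrased from the transcript, and `max_new_tokens=512` is an arbitrary choice. The generator is called with the same arguments as in the Quick start section.

```python
MODEL_ID = "DebateLabKIT/Llama-3.1-Argunaut-1-8B-HIRPO"


def build_map_request(source_text):
    """Build a user turn asking for an argument map, with the text fenced by ':::'."""
    prompt = (
        "Can you please create an Argdown argument map that visualizes the "
        "argumentative macro structure of the following text?\n\n"
        f":::\n{source_text.strip()}\n:::"
    )
    return {"role": "user", "content": prompt}


def next_turn(generator, messages, max_new_tokens=512):
    """Generate one assistant reply and return the extended chat history."""
    output = generator(messages, max_new_tokens=max_new_tokens,
                       return_full_text=False)[0]
    reply = {"role": "assistant", "content": output["generated_text"]}
    return messages + [reply]


# Usage (needs a GPU and downloads the model, so it is left commented out):
# from transformers import pipeline
# generator = pipeline("text-generation", model=MODEL_ID, device="cuda")
# history = next_turn(generator, [build_map_request("School uniforms deter crime ...")])
# history.append({"role": "user", "content": "Just for the protocol, what's the conclusion?"})
# history = next_turn(generator, history)
```

Passing the full history on every call is what lets follow-up questions refer back to the earlier reconstruction.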

Metrics

coming soon

Citations

Cite HIR as:

```bibtex
@misc{zhang2023wisdomhindsightmakeslanguage,
      title={The Wisdom of Hindsight Makes Language Models Better Instruction Followers},
      author={Tianjun Zhang and Fangchen Liu and Justin Wong and Pieter Abbeel and Joseph E. Gonzalez},
      year={2023},
      eprint={2302.05206},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2302.05206},
}
```

Cite TRL as:
    
```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```