Model Card for Llama3.2-1B-Open-R1-Distill

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the HuggingFaceH4/Bespoke-Stratos-17k dataset. It has been trained using TRL.

Quick start

from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "tyfeng1997/Llama3.2-1B-Open-R1-Distill"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Prepare the messages
messages = [
    {
        "role": "system",
        "content": """Your role as an assistant involves thoroughly exploring questions through a systematic long thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution. In the Thought section, detail your reasoning process using the specified format: <|begin_of_thought|> {thought with steps separated with '\n\n'} <|end_of_thought|> Each step should include detailed considerations such as analisying questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The solution should remain a logical, accurate, concise expression style and detail necessary step needed to reach the conclusion, formatted as follows: <|begin_of_solution|> {final formatted, precise, and clear solution} <|end_of_solution|> Now, try to solve the following question through the above guidelines:"""
    },
    {
        "role": "user",
        "content": """A regular hexagon can be divided into six equilateral triangles. If the perimeter of one of the triangles is 21 inches, what is the perimeter, in inches, of the regular hexagon?"""
    }
]

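# Apply the chat template; add_generation_prompt=True appends the assistant header so the model starts a new reply
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)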

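# Tokenize the prompt; the chat template already inserts the BOS token, so do not add it a second time
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)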

# Generate
outputs = model.generate(
    **inputs,  # pass input_ids together with the attention mask
    max_new_tokens=10000,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id
)

# Decode and print the response
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Example output:

<|begin_of_thought|>

Okay, so I need to figure out the perimeter of a regular hexagon that can be divided into six equilateral triangles. The perimeter of one of these triangles is given as 21 inches. Hmm, let me think.

First, I remember that a regular hexagon can be divided into six equilateral triangles. That's a good starting point. If I can find the perimeter of one of these triangles, I can multiply that by six to get the perimeter of the hexagon. But how do I find the perimeter of an equilateral triangle?

I know that the perimeter of an equilateral triangle is three times the length of one of its sides. Since the perimeter of the triangle is 21 inches, that means each side is 21 divided by three, which is 7 inches. So, each side of the equilateral triangle is 7 inches.

Now, since the hexagon is made up of six such equilateral triangles, if each triangle has a side of 7 inches, then the perimeter of the hexagon would be six times that. So, 6 * 7 = 42 inches. That seems straightforward.

But let me double-check my reasoning to make sure I didn't make any mistakes. If I divide the hexagon into six equilateral triangles, each triangle has a side of 7 inches. The perimeter of the triangle is 21 inches. To find the side length, I divided 21 by 3, which gives 7. Then, the perimeter of the hexagon is 6 times that, which is 42. Yep, that seems correct.

Wait, but just to make sure I didn't miss anything. Let me visualize the hexagon and the six equilateral triangles. If each triangle has a side of 7 inches, then the perimeter of the hexagon would be the sum of the perimeters of all six triangles. Each triangle is an equilateral triangle, so each has three sides. So, the perimeter of one triangle is 21 inches, so the perimeter of one side of the triangle is 21 / 3 = 7 inches. Then, the perimeter of the hexagon is 6 times that, which is 6 * 7 = 42 inches. That seems correct.

I think that's solid. No mistakes here. So, the perimeter of the regular hexagon is 42 inches. That makes sense because if you divide the hexagon into six equilateral triangles, each triangle's perimeter is 21 inches, so the total perimeter of the hexagon is 6 times that, which is 42 inches.

<|end_of_thought|>

<|begin_of_solution|>

The perimeter of a regular hexagon that can be divided into six equilateral triangles is calculated by multiplying the perimeter of one equilateral triangle by six. 

1. **Perimeter of an equilateral triangle**: The perimeter of an equilateral triangle is three times the length of one side. Given the perimeter of the triangle is 21 inches, each side is \(\frac{21}{3} = 7\) inches.
2. **Perimeter of the hexagon**: Since the hexagon is made up of six equilateral triangles, the perimeter of the hexagon is \(6 \times 7 = 42\) inches.

Thus, the perimeter of the regular hexagon is \(\boxed{42}\) inches.

<|end_of_solution|>
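The response format above lends itself to simple post-processing. The following is a minimal sketch, not part of the original card, of how the Thought and Solution sections and the boxed answer might be pulled out of the decoded text. The helper names `split_response` and `extract_boxed` are hypothetical. Because `generate` returns the prompt tokens followed by the new tokens, the decoded text also contains the system prompt, which itself includes the same markers; the sketch therefore takes the last matching pair.

```python
import re

# Markers taken from the system prompt used in the quick start.
THOUGHT_RE = re.compile(r"<\|begin_of_thought\|>(.*?)<\|end_of_thought\|>", re.DOTALL)
SOLUTION_RE = re.compile(r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>", re.DOTALL)
# Matches \boxed{...} without nested braces (an assumption that fits simple numeric answers).
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def _last_match(pattern, text):
    """Return the last captured group for pattern in text, stripped, or None."""
    matches = pattern.findall(text)
    return matches[-1].strip() if matches else None


def split_response(text):
    """Hypothetical helper: return (thought, solution); each is None if its markers are missing."""
    return _last_match(THOUGHT_RE, text), _last_match(SOLUTION_RE, text)


def extract_boxed(text):
    """Hypothetical helper: return the contents of the last \\boxed{...}, or None."""
    return _last_match(BOXED_RE, text)
```

With the quick-start variables, usage would look like `thought, solution = split_response(response)` followed by `extract_boxed(solution)`. If generation stops before the closing marker is produced, the last complete pair may be the template text from the system prompt, so it is safer to decode only the newly generated tokens before parsing.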

Training procedure

This model was trained with supervised fine-tuning (SFT).
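The card does not include the training script. As a rough illustration only, a one-epoch SFT run with TRL could be sketched as below. Everything except the base model, the dataset name, and the single epoch is an assumption: the function name `run_sft` and the output directory are hypothetical, all other hyperparameters are left at TRL defaults, and the sketch assumes the dataset exposes a `train` split in a conversational format that `SFTTrainer` can consume directly. The base model is gated, so downloading it requires an authenticated Hugging Face account.

```python
# Identifiers taken from the model card.
BASE_MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"
DATASET_ID = "HuggingFaceH4/Bespoke-Stratos-17k"


def run_sft(output_dir="llama3.2-1b-sft-sketch"):
    """Hypothetical one-epoch SFT run; not the author's actual training script."""
    # Imported lazily so that defining the function does not require the libraries.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    # Assumption: the dataset has a "train" split in a chat format SFTTrainer accepts.
    train_dataset = load_dataset(DATASET_ID, split="train")

    # One epoch, as stated in the card; every other setting is a TRL default.
    config = SFTConfig(output_dir=output_dir, num_train_epochs=1)

    trainer = SFTTrainer(
        model=BASE_MODEL_ID,  # SFTTrainer can load a model from its hub id
        args=config,
        train_dataset=train_dataset,
    )
    trainer.train()
    return trainer
```

Calling `run_sft()` starts the download and the training run, so it needs a GPU with enough memory for a 1B-parameter model and access to the gated checkpoint.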

Framework versions

TRL: 0.15.0.dev0
Transformers: 4.49.0.dev0
PyTorch: 2.5.1
Datasets: 3.2.0
Tokenizers: 0.21.0

Note

This model was trained for only one epoch, so its results on the MATH benchmark are poor.

Citations

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
