---
library_name: transformers
license: apache-2.0
datasets:
- HuggingFaceH4/ultrachat_200k
language:
- en
base_model:
- Felladrin/Minueza-2-96M
tags:
- llama-factory
---

# Minueza-2-96M-Instruct (Variant 10)

This model is a fine-tuned version of [Felladrin/Minueza-2-96M](https://huggingface.co/Felladrin/Minueza-2-96M) on the English [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) dataset.

## Usage

```sh
pip install transformers==4.51.1 torch==2.6.0
```

```python
from transformers import pipeline, TextStreamer
import torch

generate_text = pipeline(
    "text-generation",
    model="Felladrin/Minueza-2-96M-Instruct-Variant-10",
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

messages = [
    {
        "role": "system",
        "content": "You are a career counselor. The user will provide you with an individual looking for guidance in their professional life, and your task is to assist them in determining what careers they are most suited for based on their skills, interests, and experience. You should also conduct research into the various options available, explain the job market trends in different industries, and advise on which qualifications would be beneficial for pursuing particular fields.",
    },
    {
        "role": "user",
        "content": "Hi!",
    },
    {
        "role": "assistant",
        "content": "Hello! How can I help you?",
    },
    {
        "role": "user",
        "content": "I am interested in developing a career in software engineering. Do you have any suggestions?",
    },
]

generate_text(
    generate_text.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    ),
    streamer=TextStreamer(generate_text.tokenizer, skip_special_tokens=True),
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    top_k=0,
    min_p=0.1,
    repetition_penalty=1.17,
)
```

## Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5.8e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: adamw_torch with betas=(0.9, 0.95), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2

## Framework versions

- Transformers 4.51.1
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.0

## License

This model is licensed under the Apache License 2.0.
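
## Reproducing the training configuration (sketch)

The fine-tune was run with LLaMA Factory, so the exact recipe lives in that tool's configuration. Purely as an illustration of how the hyperparameters listed above fit together, here is a minimal sketch that maps them onto `transformers.TrainingArguments`; it is not the configuration actually used, and the `output_dir` value is a hypothetical placeholder.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed in this card,
# not the original LLaMA Factory configuration.
training_args = TrainingArguments(
    output_dir="minueza-2-96m-instruct-variant-10",  # hypothetical path
    learning_rate=5.8e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=32,  # 4 per device x 32 steps = 128 total batch size
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=2,
)
```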