---
library_name: transformers
license: apache-2.0
datasets:
- jtatman/cosmopedia-wikihow-180k-sharegpt
language:
- en
base_model:
- Felladrin/Minueza-2-96M
tags:
- llama-factory
---

# Minueza-2-96M-Instruct (Variant 09)

This model is a fine-tuned version of [Felladrin/Minueza-2-96M](https://huggingface.co/Felladrin/Minueza-2-96M) on the English [jtatman/cosmopedia-wikihow-180k-sharegpt](https://huggingface.co/datasets/jtatman/cosmopedia-wikihow-180k-sharegpt) dataset.

## Usage

```sh
pip install transformers==4.51.1 torch==2.6.0
```

```python
from transformers import pipeline, TextStreamer
import torch

generate_text = pipeline(
    "text-generation",
    model="Felladrin/Minueza-2-96M-Instruct-Variant-09",
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

messages = [
    {
        "role": "system",
        "content": "You are a helpful and knowledgeable assistant that provides an expansive and comprehensive answer for a given query or instruction.",
    },
    {
        "role": "user",
        "content": "Write a tutorial on how to publish a book.",
    },
]

generate_text(
    generate_text.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    ),
    streamer=TextStreamer(generate_text.tokenizer, skip_special_tokens=True),
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    top_k=0,
    min_p=0.1,
    repetition_penalty=1.17,
)
```

## Training hyperparameters

The following hyperparameters were used during training (see the illustrative configuration sketch at the end of this card):

- learning_rate: 5.8e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: adamw_torch with betas=(0.9, 0.95) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2

## Framework versions

- Transformers 4.51.1
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.0

## License

This model is licensed under the Apache License 2.0.
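
## Training configuration sketch

For reference, the hyperparameters listed above map roughly onto a standard Hugging Face `TrainingArguments` setup. This is a minimal sketch, not the actual training recipe: the model was fine-tuned with LLaMA-Factory (see the `llama-factory` tag), so the real configuration file differs, and `output_dir` below is a hypothetical placeholder.

```python
from transformers import TrainingArguments

# Illustrative sketch only: mirrors the hyperparameters listed in this card,
# not the exact LLaMA-Factory configuration used for training.
training_args = TrainingArguments(
    output_dir="minueza-2-96m-instruct-variant-09",  # hypothetical output path
    learning_rate=5.8e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=32,  # effective batch size: 4 * 32 = 128
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    seed=42,
)
```

The total train batch size of 128 reported above is consistent with a per-device batch size of 4 and 32 gradient-accumulation steps on a single device.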