---
library_name: transformers
tags:
- unsloth
- trl
- sft
---

# Model Card: Custom LLaMA-3 Model with 4-bit Quantization

## Model Details

- **Architecture:** LoRA (Low-Rank Adaptation)
- **Quantization:** 4-bit

## Model Description

This is a custom version of the LLaMA-3 language model fine-tuned with 4-bit quantization. The model uses LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning, which reduces memory usage and shortens training time without a significant loss in performance.

## Training Configuration

The model was trained with the following configuration (see the fine-tuning sketch at the end of this card):

- **Learning Rate:** 2e-4
- **Optimizer:** AdamW (8-bit)
- **Weight Decay:** 0.01
- **LR Scheduler:** Linear
- **Mixed Precision:** FP16/BF16 (depending on hardware support)

## LoRA Configuration

The model uses LoRA for efficient parameter adaptation with the following settings:

- **Rank (r):** 16
- **Target Modules:** `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]`
- **LoRA Alpha:** 16

## Training Dataset

- **Dataset:** Custom dataset containing Turkish text data
- **Max Sequence Length:** 1024

## Usage

To use this model, load it with the Hugging Face `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("erythropygia/LLAMA3-8B-Turkish-4bit-Quantized")
model = AutoModelForCausalLM.from_pretrained(
    "erythropygia/LLAMA3-8B-Turkish-4bit-Quantized",
    low_cpu_mem_usage=True,
    load_in_4bit=True,
)

# Alpaca-style prompt template. The Turkish preamble means:
# "Below is an instruction that describes a task, paired with an input that provides
#  further context. Write a response that appropriately completes the request."
prompt_format = """Aşağıda bir görevi tanımlayan bir talimat ve daha fazla bağlam sağlayan bir girdi bulunmaktadır. Talebi uygun şekilde tamamlayan bir yanıt yazın.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

inputs = tokenizer(
    [
        prompt_format.format(
            "",  # instruction
            "",  # input
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors="pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(
    **inputs,
    streamer=text_streamer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.75,
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.1,
)
```

Note: newer `transformers` releases configure 4-bit loading through a `BitsAndBytesConfig` object rather than the bare `load_in_4bit` argument; see the sketch at the end of this card.

## Performance

- **Training Loss:** 1.385300
- **Evaluation Metrics:** To be updated based on evaluation results
- **Limitations and Biases:** This model inherits the biases present in its training data. Evaluate the model thoroughly for your specific use case and consider the ethical implications of deploying it.
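
## Loading with an Explicit Quantization Config

On recent `transformers` releases, 4-bit loading is configured through `BitsAndBytesConfig` instead of passing `load_in_4bit=True` directly to `from_pretrained`. The snippet below is a minimal sketch of the equivalent call; the NF4 quantization type and the compute dtype are assumptions, not settings recorded in this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Explicit 4-bit quantization config (assumed NF4 / float16; adjust to your hardware).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained("erythropygia/LLAMA3-8B-Turkish-4bit-Quantized")
model = AutoModelForCausalLM.from_pretrained(
    "erythropygia/LLAMA3-8B-Turkish-4bit-Quantized",
    quantization_config=bnb_config,
    device_map="auto",
    low_cpu_mem_usage=True,
)
```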
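
## Fine-Tuning Configuration Sketch

The LoRA and training hyperparameters listed above map roughly onto the following Unsloth + TRL setup. This is an illustrative sketch, not the original training script: the base model name, dataset file, batch size, gradient accumulation, and epoch count are assumptions, and depending on your `trl` version some `SFTTrainer` arguments may instead belong in an `SFTConfig`.

```python
import torch
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load a 4-bit LLaMA-3 base model (model name is an assumption, not recorded in this card).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=1024,
    load_in_4bit=True,
)

# Attach LoRA adapters using the settings from the "LoRA Configuration" section.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical Turkish instruction dataset with a pre-formatted "text" column;
# the actual custom dataset is not published with this card.
dataset = load_dataset("json", data_files="turkish_instructions.jsonl", split="train")

# Training hyperparameters from the "Training Configuration" section.
training_args = TrainingArguments(
    output_dir="outputs",
    learning_rate=2e-4,
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),
    per_device_train_batch_size=2,   # assumption
    gradient_accumulation_steps=4,   # assumption
    num_train_epochs=1,              # assumption
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=1024,
    args=training_args,
)
trainer.train()
```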