
---
license: apache-2.0
tags:
- causal-lm
- text-generation
- chatbot
- qwen
- deepseek
- lora
- 4bit
- bitsandbytes
library_name: transformers
pipeline_tag: text-generation
quantized: true
base_model: Qwen/Qwen2.5-1.5B-Instruct
---

# Qwen_1.5B_multilingual_Fine-Tuned_LLM: LoRA 4-bit Fine-Tuned Model

This is a conversational language model based on [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct), fine-tuned with [LoRA adapters](https://github.com/huggingface/peft) for efficient training and inference. The model is loaded using **4-bit quantization (NF4)** through [BitsAndBytes](https://github.com/TimDettmers/bitsandbytes), enabling memory-efficient inference on consumer-grade GPUs.

---

## Model Details

- **Base model**: `Qwen/Qwen2.5-1.5B-Instruct`
- **Fine-tuning technique**: LoRA (Low-Rank Adaptation)
- **Quantization**: 4-bit NF4 via BitsAndBytes
- **Framework**: Hugging Face Transformers + PEFT
- **Pipeline**: `text-generation`
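
If the published weights are a LoRA adapter rather than a merged checkpoint, they can also be attached to the 4-bit base model with PEFT. A minimal sketch, assuming the adapter files live in this repository:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "Qwen/Qwen2.5-1.5B-Instruct"
adapter_id = "lewishamilton21/Qwen_1.5B_multilingual_Fine-Tuned_LLM"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype="float16",
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb_config, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the LoRA adapter weights
```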

---

## Intended Use

This model is designed for **multi-turn chatbot applications**, creative writing, instruction following, and general-purpose text generation tasks within responsible use guidelines.
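
For multi-turn chat, prompts can be built with the tokenizer's chat template. A minimal sketch, assuming the base Qwen2.5 chat template is preserved by this fine-tune (loading mirrors the Example Usage section below):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "lewishamilton21/Qwen_1.5B_multilingual_Fine-Tuned_LLM"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype="float16",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config, device_map="auto")

# Multi-turn conversation rendered through the chat template
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Suggest three names for a weather app."},
    {"role": "assistant", "content": "SkyCast, Drizzle, and Nimbus."},
    {"role": "user", "content": "Write a one-line tagline for the first one."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```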

---

## Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "lewishamilton21/Qwen_1.5B_multilingual_Fine-Tuned_LLM"

# 4-bit NF4 quantization with nested (double) quantization and fp16 compute dtype
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16"
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"
)

inputs = tokenizer("Hello, how are you today?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Evaluation Metrics

| Metric | Value (example) |
|---|---|
| Quantization type | 4-bit NF4 |
| LoRA rank | 8 or 16 |
| Max length tested | 2048 tokens |
| VRAM (A100 40 GB) | ~3.5 GB |

Custom benchmarks coming soon.
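
The VRAM figure is a rough estimate and can be re-checked on your own GPU with PyTorch's memory statistics. A minimal sketch, assuming a CUDA device and the `model`/`tokenizer` loaded as in the Example Usage section:

```python
import torch

# Measure peak GPU memory during a short generation
torch.cuda.reset_peak_memory_stats()
inputs = tokenizer("Hello!", return_tensors="pt").to(model.device)
model.generate(**inputs, max_new_tokens=50)
print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
```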


---

## Training & Fine-Tuning

Fine-tuned via LoRA adapters using PEFT. To reproduce:

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Load the base model in 4-bit
# (base_model_id, bnb_config, and dataset are assumed to be defined, e.g. as in Example Usage)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto"
)

# Prepare the quantized model for training and attach LoRA adapters
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(...))

# Trainer setup
trainer = Trainer(
    model=model,
    args=TrainingArguments(...),
    train_dataset=dataset
)
trainer.train()
```
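
The `LoraConfig(...)` and `TrainingArguments(...)` above are intentionally left elided; the exact hyperparameters used for this checkpoint are not documented here. As an illustration only (plausible values for a 1.5B Qwen model, not the recipe actually used):

```python
# Illustrative values only; not the configuration used for this checkpoint
lora_config = LoraConfig(
    r=16,                          # LoRA rank (table above lists 8 or 16)
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

training_args = TrainingArguments(
    output_dir="qwen-lora-out",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    num_train_epochs=3,
    fp16=True,
    logging_steps=10,
)
```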

---

## License

Apache 2.0, free for research and commercial use within the license terms.


---

## Acknowledgements

- DeepSeek AI
- Hugging Face Transformers
- BitsAndBytes by Tim Dettmers
- Hugging Face PEFT

