---
license: apache-2.0
tags:
- causal-lm
- text-generation
- chatbot
- qwen
- deepseek
- lora
- 4bit
- bitsandbytes
library_name: transformers
pipeline_tag: text-generation
quantized: true
base_model: Qwen/Qwen2.5-1.5B-Instruct
---
# Qwen_1.5B_multilingual_Fine-Tuned_LLM – LoRA 4-bit Fine-Tuned Model
This is a conversational language model based on [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Gensyn/Qwen2.5-1.5B-Instruct), fine-tuned with [LoRA adapters](https://github.com/huggingface/peft) for efficient training and inference. The model is loaded using **4-bit quantization (NF4)** through [BitsAndBytes](https://github.com/TimDettmers/bitsandbytes), enabling memory-efficient inference on consumer-grade GPUs.
---
## Model Details
- **Base model**: `Qwen/Qwen2.5-1.5B-Instruct`
- **Fine-tuning technique**: LoRA (Low-Rank Adaptation)
- **Quantization**: 4-bit NF4 via BitsAndBytes
- **Framework**: Hugging Face Transformers + PEFT
- **Pipeline**: `text-generation`
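
Since the card declares `pipeline_tag: text-generation`, the model can also be driven through the high-level `pipeline` API. The sketch below is illustrative only: it assumes the same NF4 quantization settings used in the Example Usage section further down, passed via `model_kwargs`.

```python
from transformers import BitsAndBytesConfig, pipeline

# Minimal sketch: NF4 4-bit loading routed through model_kwargs
pipe = pipeline(
    "text-generation",
    model="lewishamilton21/Qwen_1.5B_multilingual_Fine-Tuned_LLM",
    model_kwargs={
        "quantization_config": BitsAndBytesConfig(
            load_in_4bit=True, bnb_4bit_quant_type="nf4"
        ),
        "device_map": "auto",
    },
)
print(pipe("Hello, how are you today?", max_new_tokens=50)[0]["generated_text"])
```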
---
## Intended Use
This model is designed for **multi-turn chatbot applications**, creative writing, instruction following, and general-purpose text generation tasks within responsible use guidelines.
---
## Example Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "lewishamilton21/Qwen_1.5B_multilingual_Fine-Tuned_LLM"

# 4-bit NF4 quantization with double quantization and fp16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

inputs = tokenizer("Hello, how are you today?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
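
Because the model targets multi-turn chat, conversations can be formatted with the tokenizer's chat template (Qwen2.5 instruct models ship one). The snippet below reuses `model` and `tokenizer` from the example above; the conversation content and sampling parameters are illustrative, not values recommended by the card.

```python
# Reuses `model` and `tokenizer` from the snippet above; messages are illustrative.
messages = [
    {"role": "user", "content": "Recommend a book on reinforcement learning."},
    {"role": "assistant", "content": "Sutton and Barto's 'Reinforcement Learning: An Introduction' is the standard starting point."},
    {"role": "user", "content": "Is it suitable for a complete beginner?"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=150, do_sample=True, temperature=0.7, top_p=0.9
)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```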
---
## Evaluation Metrics
| Metric | Value (example) |
|---|---|
| Quantization Type | 4-bit NF4 |
| LoRA Rank | 8 or 16 |
| Max Length Tested | 2048 tokens |
| VRAM (A100 40GB) | ~3.5 GB |

Custom benchmarks coming soon.
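
The VRAM figure above is the card's own estimate. To check peak memory on your own hardware after a generation call, PyTorch's CUDA memory stats can be used; a minimal sketch, assuming a single CUDA device and the `model`/`inputs` from the Example Usage section:

```python
import torch

# Reset the peak-memory counter, run one generation, then read back the peak
torch.cuda.reset_peak_memory_stats()
_ = model.generate(**inputs, max_new_tokens=100)
print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")
```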
---
## Training & Fine-Tuning
Fine-tuned via LoRA adapters using PEFT. To reproduce:
```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Load the base model in 4-bit (reuse the BitsAndBytesConfig shown above)
base_model_id = "Qwen/Qwen2.5-1.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the quantized model for k-bit training and attach LoRA adapters
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(...))

# Trainer setup
trainer = Trainer(
    model=model,
    args=TrainingArguments(...),
    train_dataset=dataset,
)
trainer.train()
```
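
The card leaves the exact `LoraConfig` and `TrainingArguments` unspecified. As a point of reference only, a configuration consistent with the LoRA rank reported above might look like the following; the target modules and training hyperparameters are assumptions, not the card's actual settings.

```python
# Illustrative values only; the card does not publish its actual hyperparameters.
lora_config = LoraConfig(
    r=16,                        # the card reports a LoRA rank of 8 or 16
    lora_alpha=32,               # assumed scaling factor
    lora_dropout=0.05,           # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
training_args = TrainingArguments(
    output_dir="qwen-lora-out",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    num_train_epochs=3,
    fp16=True,
    logging_steps=10,
)
```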
---
## License
Apache 2.0 – free for research and commercial use within the license terms.
---
## Acknowledgements
- DeepSeek AI
- Hugging Face Transformers
- BitsAndBytes by Tim Dettmers
- Hugging Face PEFT