# 🧠 Llama-3.1-8B-Hinglish-General-sft
Llama-3.1-8b-Hinglish-General-sft is a lightweight, domain-specific fine-tuned model built for conversational reasoning in Hinglish, with a focus on general knowledge. It builds on Meta-Llama-3.1-8B and uses LoRA adapters for efficient fine-tuning with Unsloth.
> ⚠️ This model is a demonstration of supervised fine-tuning and is intended solely for educational and informational purposes. It is not validated for critical applications and should not be used for real-life decision-making.
## 📋 Model Summary

- **Base model:** unsloth/Meta-Llama-3.1-8B
- **LoRA adapter:** Subh775/Llama-3.1-8b-Hinglish-General-sft
- **Fine-tuning dataset:** fhai50032/Hinglish-CoT-General
- **Language:** Hinglish (Hindi-English mix)
- **Training time:** 49.24 minutes (1 epoch)
- **Framework:** Unsloth
- **Quantization:** 4-bit, for efficient inference (an alternative transformers + peft load path is sketched below)
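
If you prefer not to depend on Unsloth, the pieces listed above can in principle be combined with plain transformers + peft. This is a minimal sketch under the assumption that this repository ships LoRA adapter weights on top of the base model; the supported path remains the Unsloth script in the Inference Instructions section.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit quantized load of the base model, then attach the LoRA adapter
bnb_config = BitsAndBytesConfig(load_in_4bit=True,
                                bnb_4bit_compute_dtype=torch.float16)
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/Meta-Llama-3.1-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/Meta-Llama-3.1-8B")
model = PeftModel.from_pretrained(base, "Subh775/Llama-3.1-8b-Hinglish-General-sft")
```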
## 💡 Key Features

- 🗣️ **Hinglish-CoT reasoning:** trained on ~2K question-answer pairs with step-by-step reasoning in Hinglish.
- ⚙️ **Efficient inference:** enabled by LoRA + Unsloth + 4-bit quantization.
- 🚀 **Fast and lightweight:** optimized for quick inference even on limited hardware.
## 🛠️ Inference Instructions

### 🔧 Installation

```bash
pip install unsloth
```
```python
from unsloth import FastLanguageModel
import torch
import re

# Alpaca-style prompt template used during fine-tuning
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{question}

### Input:
{thoughts}

### Response:
{answer}"""

# Load the fine-tuned model in 4-bit and switch it to inference mode
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Subh775/Llama-3.1-8b-Hinglish-General-sft",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

def clean_response(text):
    # Keep only the text after the last "### Response:" marker and drop
    # any template lines the model may echo back.
    if "### Response:" in text:
        text = text.split("### Response:")[-1]
    lines = text.strip().splitlines()
    clean_lines = [
        line.strip()
        for line in lines
        if not re.match(r"^(#|input:|response:)", line.strip(), re.IGNORECASE)
    ]
    return " ".join(clean_lines).strip()
```
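
For clarity, here is `clean_response` applied to a toy raw generation (the strings are illustrative, not actual model output):

```python
raw = "### Instruction:\nAap kaise ho?\n\n### Response:\nMain theek hoon, shukriya!"
print(clean_response(raw))
# -> Main theek hoon, shukriya!
```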
```python
def chat():
    print("🩺 Chat with Llama-3.1-8b-Hinglish-General-sft! Type '\\q' or 'quit' to stop.\n")
    chat_history = ""
    while True:
        user_input = input("➤ ")
        if user_input.lower() in ("\\q", "quit"):
            print("\nExiting the chat. Goodbye 🧠✨!")
            print("✨" + "=" * 30 + "✨\n")
            break

        # Fill the template; the "thoughts" slot mirrors the CoT field the
        # model saw during training, and the answer slot is left blank.
        question = user_input
        thoughts = "User is asking a genuine question. Thinking step-by-step in Hinglish."
        prompt = alpaca_prompt.format(question=question, thoughts=thoughts, answer="")
        chat_history += prompt + "\n"

        inputs = tokenizer([chat_history], return_tensors="pt").to("cuda")
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.7,
            top_p=0.9,
            num_return_sequences=1,
            do_sample=True,
            no_repeat_ngram_size=2,
        )
        decoded_output = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
        clean_output = clean_response(decoded_output)
        chat_history += f"{clean_output}\n"
        print(f"\n⚕️: {clean_output}\n")

chat()
```
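
If you only need a single-turn answer, a minimal non-interactive variant can stream tokens as they are generated using transformers' `TextStreamer`. This sketch reuses `model`, `tokenizer`, and `alpaca_prompt` from the script above; the sample question is illustrative:

```python
from transformers import TextStreamer

prompt = alpaca_prompt.format(
    question="Bharat ki rajdhani kya hai?",  # illustrative sample question
    thoughts="User is asking a genuine question. Thinking step-by-step in Hinglish.",
    answer="",
)
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
streamer = TextStreamer(tokenizer, skip_prompt=True)  # print tokens as they arrive
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=256,
                   temperature=0.7, top_p=0.9, do_sample=True)
```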
## 📊 Training Details

- **Dataset:** fhai50032/Hinglish-CoT-General
- **Total samples:** 2,015
- **Training time:** ~49 minutes (1 epoch)
- **Final step:** 60
- **Final training loss:** 0.776 (an illustrative training setup is sketched below)
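
The card does not include the training script; the following is an illustrative reconstruction of a typical Unsloth + TRL SFT setup, using the trl API commonly paired with Unsloth. The LoRA rank, learning rate, batch size, and dataset column names are assumptions, chosen so that one epoch over 2,015 examples lands near the reported 60 optimizer steps; it reuses the `alpaca_prompt` template from the inference section.

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Load the 4-bit base model and attach LoRA adapters
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # assumed LoRA rank; the card does not state the actual value
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Format each row into the Alpaca-style prompt shown above.
# The column names here are assumptions about the dataset schema.
dataset = load_dataset("fhai50032/Hinglish-CoT-General", split="train")
def format_row(ex):
    return {"text": alpaca_prompt.format(question=ex["question"],
                                         thoughts=ex["thoughts"],
                                         answer=ex["answer"]) + tokenizer.eos_token}
dataset = dataset.map(format_row)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=4,  # effective batch 4 * 8 = 32, so one
        gradient_accumulation_steps=8,  # epoch over 2,015 rows is ~63 steps,
        num_train_epochs=1,             # close to the reported 60
        learning_rate=2e-4,             # assumed; typical for LoRA SFT
        fp16=True,
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()
```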
## ⚠️ Limitations

- 🧠 Generalized understanding only; responses may not reflect recent developments.
- The fine-tuning dataset is small (~2,000 examples), so model responses can be inaccurate or shallow.
## 📜 License

This model is licensed under the Apache 2.0 License, the same as its base model.
## 📖 Citation

```bibtex
@misc{llama3_8b_hinglish_general_2025,
  author       = {Subh775},
  title        = {Llama-3.1 8B Hinglish General SFT},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Subh775/Llama-3.1-8b-Hinglish-General-sft}},
  note         = {Hugging Face repository}
}
```