🧠 Llama-3.1-8B-Hinglish-General-sft

Llama-3.1-8b-Hinglish-General-sft is a lightweight fine-tune of Meta-Llama-3.1-8B for conversational, step-by-step reasoning in Hinglish, focused on general knowledge. It was trained efficiently with LoRA adapters using Unsloth.

⚠️ This model is a demonstration of supervised fine-tuning and is intended solely for educational and informational purposes. It is not validated for critical applications and should not be used for real-life decision-making.


📋 Model Summary

  • Base model: Meta-Llama-3.1-8B
  • Fine-tuning: supervised fine-tuning (SFT) with LoRA adapters via Unsloth
  • Dataset: Hinglish-CoT-General (2,015 examples)
  • Quantization: 4-bit
  • License: Apache 2.0

💡 Key Features

  • 🗣️ Hinglish-CoT Reasoning: Trained on ~2K question-answer pairs with step-by-step reasoning in Hinglish.
  • ⚙️ Efficient Inference: Enabled by LoRA + Unsloth + 4-bit quantization (see the loading sketch after this list).
  • 🚀 Fast and Lightweight: Optimized for quick inference even on limited hardware.
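
Because the released weights are LoRA adapters, they can in principle also be attached to the base model with the standard transformers + peft stack instead of Unsloth. A minimal sketch, assuming the repo follows the usual PEFT adapter layout (the BitsAndBytesConfig values below are illustrative defaults, not settings taken from this repo):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization config (illustrative defaults)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the base model in 4-bit, then attach the LoRA adapter on top
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "Subh775/Llama-3.1-8b-Hinglish-General-sft")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")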

πŸ› οΈ Inference Instructions

πŸ”§ Installation

pip install unsloth

💬 Usage

from unsloth import FastLanguageModel
import torch
import re

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{question}

### Input:
{thoughts}

### Response:
{answer}"""

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Subh775/Llama-3.1-8b-Hinglish-General-sft",
    max_seq_length=2048,
    load_in_4bit=True
)

# Switch the model into Unsloth's optimized inference mode
FastLanguageModel.for_inference(model)

def clean_response(text):
    if "### Response:" in text:
        text = text.split("### Response:")[-1]
    lines = text.strip().splitlines()
    clean_lines = [line.strip() for line in lines if not re.match(r"^(#|input:|response:|Input:|Response:)", line, re.IGNORECASE)]
    return " ".join(clean_lines).strip()

def chat():
    print("🩺 Chat with Llama-3.1-8b-Hinglish-General-sft! Type '\\q' or 'quit' to stop.\n")
    chat_history = ""

    while True:
        user_input = input("➤ ")
        if user_input.lower() in ['\\q', 'quit']:
            print("\nExiting the chat. Goodbye 🧠✨!")
            print("✨" + "=" * 30 + "✨\n")
            break

        question = user_input
        thoughts = "User is asking a genuine question. Thinking step-by-step in Hinglish."
        prompt = alpaca_prompt.format(question=question, thoughts=thoughts, answer="")
        chat_history += prompt + "\n"

        inputs = tokenizer([chat_history], return_tensors="pt").to("cuda")

        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.7,
            top_p=0.9,
            num_return_sequences=1,
            do_sample=True,
            no_repeat_ngram_size=2
        )

        decoded_output = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
        clean_output = clean_response(decoded_output)
        chat_history += f"{clean_output}\n"

        print(f"\n❄️: {clean_output}\n")

chat()
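
For a quick single-turn test without the chat loop, the model can be driven directly with a streamed response. A minimal sketch reusing the objects loaded above (the question text is just an example; TextStreamer is the standard transformers utility):

from transformers import TextStreamer

question = "Bharat ki rajdhani kya hai?"  # example question
thoughts = "User is asking a genuine question. Thinking step-by-step in Hinglish."
prompt = alpaca_prompt.format(question=question, thoughts=thoughts, answer="")

inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
streamer = TextStreamer(tokenizer, skip_prompt=True)  # print tokens as they are generated

_ = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)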

📈 Training Details

  • Dataset Used: Hinglish-CoT-General
  • Total Samples: 2,015 examples
  • Training Time: ~49 minutes (1 epoch)
  • Final Step: 60
  • Final Training Loss: 0.776
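
The exact training script is not published; the sketch below shows how a comparable one-epoch LoRA fine-tune could be set up with Unsloth and TRL's SFTTrainer. All hyperparameters, the LoRA target modules, and the dataset Hub id are illustrative assumptions, not the values used for this model:

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach trainable LoRA adapters (rank/alpha/targets are assumptions)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("Subh775/Hinglish-CoT-General", split="train")  # assumed Hub id

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumes examples are pre-formatted into a "text" column
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()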

⚠️ Limitations

  • 🧠 Generalized understanding: the model may not reflect recent developments.
  • The fine-tuning dataset is small (~2K examples), so responses can be inaccurate or inconsistent.

📜 License

This model is released under the Apache 2.0 License.

📚 Citation

@misc{llama3_8b_hinglish_general_2025,
  author       = {Subh775},
  title        = {Llama-3.1 8B Hinglish General SFT},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Subh775/Llama-3.1-8b-Hinglish-General-sft}},
  note         = {Hugging Face Repository}
}