🧠 Llama-3.1-8B-Hinglish-General-sft

Llama-3.1-8b-Hinglish-General-sft is a lightweight fine-tune of Meta-Llama-3.1-8B for conversational, step-by-step reasoning in Hinglish, focused on general knowledge. It was trained efficiently with LoRA adapters using Unsloth.

⚠️ This model is a demonstration of supervised fine-tuning and is intended solely for educational and informational purposes. It is not validated for critical applications and should not be used for real-life decision-making.


📋 Model Summary

  • Base model: Meta-Llama-3.1-8B
  • Fine-tuning: supervised fine-tuning (SFT) with LoRA adapters via Unsloth
  • Dataset: Hinglish-CoT-General (2,015 examples)
  • Quantization: 4-bit
  • License: Apache 2.0

💡 Key Features

  • 🗣️ Hinglish-CoT Reasoning: Trained on ~2K question-answer pairs with step-by-step reasoning in Hinglish.
  • ⚙️ Efficient Inference: Enabled by LoRA + Unsloth + 4-bit quantization (see the loading sketch after this list).
  • 🚀 Fast and Lightweight: Optimized for quick inference even on limited hardware.
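
Because the released weights are LoRA adapters, they can in principle also be attached to the base model with the standard transformers + peft stack instead of Unsloth. A minimal sketch, assuming the repo follows the usual PEFT adapter layout (the BitsAndBytesConfig values below are illustrative defaults, not settings taken from this repo):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization config (illustrative defaults)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the base model in 4-bit, then attach the LoRA adapter on top
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "Subh775/Llama-3.1-8b-Hinglish-General-sft")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")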

πŸ› οΈ Inference Instructions

πŸ”§ Installation

pip install unsloth

💬 Usage

from unsloth import FastLanguageModel
import torch
import re

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{question}

### Input:
{thoughts}

### Response:
{answer}"""

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Subh775/Llama-3.1-8b-Hinglish-General-sft",
    max_seq_length=2048,
    load_in_4bit=True
)

# Switch the model into Unsloth's optimized inference mode
FastLanguageModel.for_inference(model)

def clean_response(text):
    if "### Response:" in text:
        text = text.split("### Response:")[-1]
    lines = text.strip().splitlines()
    clean_lines = [line.strip() for line in lines if not re.match(r"^(#|input:|response:|Input:|Response:)", line, re.IGNORECASE)]
    return " ".join(clean_lines).strip()

def chat():
    print("🩺 Chat with Llama-3.1-8b-Hinglish-General-sft! Type '\\q' or 'quit' to stop.\n")
    chat_history = ""

    while True:
        user_input = input("➤ ")
        if user_input.lower() in ['\\q', 'quit']:
            print("\nExiting the chat. Goodbye 🧠✨!")
            print("✨" + "=" * 30 + "✨\n")
            break

        question = user_input
        thoughts = "User is asking a genuine question. Thinking step-by-step in Hinglish."
        prompt = alpaca_prompt.format(question=question, thoughts=thoughts, answer="")
        chat_history += prompt + "\n"

        inputs = tokenizer([chat_history], return_tensors="pt").to("cuda")

        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.7,
            top_p=0.9,
            num_return_sequences=1,
            do_sample=True,
            no_repeat_ngram_size=2
        )

        decoded_output = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
        clean_output = clean_response(decoded_output)
        chat_history += f"{clean_output}\n"

        print(f"\n❄️: {clean_output}\n")

chat()
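
For a quick single-turn test without the chat loop, the model can be driven directly with a streamed response. A minimal sketch reusing the objects loaded above (the question text is just an example; TextStreamer is the standard transformers utility):

from transformers import TextStreamer

question = "Bharat ki rajdhani kya hai?"  # example question
thoughts = "User is asking a genuine question. Thinking step-by-step in Hinglish."
prompt = alpaca_prompt.format(question=question, thoughts=thoughts, answer="")

inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
streamer = TextStreamer(tokenizer, skip_prompt=True)  # print tokens as they are generated

_ = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)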

📈 Training Details

  • Dataset Used: Hinglish-CoT-General
  • Total Samples: 2,015 examples
  • Training Time: ~49 minutes (1 epoch)
  • Final Step: 60
  • Final Training Loss: 0.776
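
The exact training script is not published; the sketch below shows how a comparable one-epoch LoRA fine-tune could be set up with Unsloth and TRL's SFTTrainer. All hyperparameters, the LoRA target modules, and the dataset Hub id are illustrative assumptions, not the values used for this model:

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach trainable LoRA adapters (rank/alpha/targets are assumptions)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("Subh775/Hinglish-CoT-General", split="train")  # assumed Hub id

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumes examples are pre-formatted into a "text" column
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()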

⚠️ Limitations

  • 🧠 Generalized understanding: the model may not reflect recent developments.
  • The fine-tuning dataset is small (~2K examples), so responses can be inaccurate or inconsistent.

📜 License

This model is released under the Apache 2.0 License.

📚 Citation

@misc{llama3_8b_hinglish_general_2025,
  author       = {Subh775},
  title        = {Llama-3.1 8B Hinglish General SFT},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Subh775/Llama-3.1-8b-Hinglish-General-sft}},
  note         = {Hugging Face Repository}
}