KunoRZN-Llama-3-3B

Model Description

KunoRZN-Llama-3-3B (Knowledge Understanding Network with Optimized Reasoning Zone Navigation) is VinkuraAI's flagship language model designed to support 12+ Indian languages alongside English. This hybrid reasoning model unifies both "intuitive" traditional mode responses and long chain of thought reasoning responses into a single model, toggled by a system prompt.

Built with Meta Llama 3, KunoRZN excels at supporting:

Educational applications across various curricula and languages
Healthcare information delivery in regional languages
Traffic management systems adapted to local conditions
Low-resource computing environments common across India

The ethos of KunoRZN is focused on providing accessible AI capabilities in multiple Indian languages with powerful steering capabilities and control given to the end user.

This model has been fine-tuned on diverse multilingual datasets representing Indian contexts, languages, and use cases. Our goal is to make advanced AI accessible to the broader Indian population by overcoming language barriers.

Note: To toggle REASONING ON, use the following system prompt:

You are a deep thinking AI assistant who can communicate in multiple Indian languages. You may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <thinking> </thinking> tags, and then provide your solution or response to the problem.

Multilingual Capabilities

KunoRZN-Llama-3-3B supports the following languages:

English
Hindi (हिन्दी)
Tamil (தமிழ்)
Telugu (తెలుగు)
Kannada (ಕನ್ನಡ)
Malayalam (മലയാളം)
Marathi (मराठी)
Bengali (বাংলা)
Gujarati (ગુજરાતી)
Punjabi (ਪੰਜਾਬੀ)
Odia (ଓଡ଼ିଆ)
Assamese (অসমীয়া)
Urdu (اردو)

Benchmarks

Reasoning Benchmarks, with Reasoning ON and OFF:

Prompt Format

KunoRZN-Llama-3-3B uses Llama-Chat format as the prompt format, providing a unified, structured system for engaging the LLM in multi-turn chat dialogue.

System prompts allow steerability and interesting new ways to interact with the LLM, guiding rules, roles, and stylistic choices of the model.

Deep Thinking Mode - KunoRZN can activate long chain of thought with a system prompt.

You are a deep thinking AI assistant who can communicate in multiple Indian languages. You may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <thinking> </thinking> tags, and then provide your solution or response to the problem.

Example of using deep reasoning mode with HuggingFace Transformers:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import flash_attn
import time

tokenizer = AutoTokenizer.from_pretrained("VinkuraAI/KunoRZN-Llama-3-3B")

model = AutoModelForCausalLM.from_pretrained(
    "VinkuraAI/KunoRZN-Llama-3-3B",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)

messages = [
    {
        "role": "system",
        "content": "You are a deep thinking AI assistant who can communicate in multiple Indian languages. You may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <thinking> </thinking> tags, and then provide your solution or response to the problem."
    },
    {
        "role": "user",
        "content": "भारतीय संविधान के मूल अधिकारों के बारे में बताइए और उनका महत्व समझाइए।"
    }
]

input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to("cuda")
generated_ids = model.generate(input_ids, max_new_tokens=3000, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)
print(f"Generated Tokens: {generated_ids.shape[-1:]}")
response = tokenizer.decode(generated_ids[0], skip_special_tokens=True, clean_up_tokenization_space=True)
print(f"Response: {response}")

Please note, for complex reasoning tasks, KunoRZN may use up to 10,000 tokens in its thinking process. You may need to increase max_new_tokens for difficult problems.

Standard "Intuitive" Response Mode

Prompt with system instruction (Use whatever system prompt you like, this is just an example!):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import flash_attn
import time

tokenizer = AutoTokenizer.from_pretrained("VinkuraAI/KunoRZN-Llama-3-3B")

model = AutoModelForCausalLM.from_pretrained(
    "VinkuraAI/KunoRZN-Llama-3-3B",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)

messages = [
    {
        "role": "system",
        "content": "You are KunoRZN, a multilingual AI assistant fluent in both English and Indian languages."
    },
    {
        "role": "user",
        "content": "தமிழ்நாட்டில் உள்ள பிரபலமான சுற்றுலா தலங்கள் என்ன?"
    }
]

input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to("cuda")
generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)
print(f"Generated Tokens: {generated_ids.shape[-1:]}")
response = tokenizer.decode(generated_ids[0], skip_special_tokens=True, clean_up_tokenization_space=True)
print(f"Response: {response}")

VLLM Inference

You can also run this model with vLLM, by running the following in your terminal after pip install vllm

vllm serve VinkuraAI/KunoRZN-Llama-3-3B

You may then use the model over API using the OpenAI library just like you would call OpenAI's API.

Example Multilingual Use Cases

Education

messages = [
    {
        "role": "system",
        "content": "You are KunoRZN, an educational AI assistant that can explain concepts in simple terms for students."
    },
    {
        "role": "user",
        "content": "ಸೌರವ್ಯೂಹದ ಗ್ರಹಗಳ ಬಗ್ಗೆ ವಿವರಿಸಿ."
    }
]

Healthcare

messages = [
    {
        "role": "system",
        "content": "You are KunoRZN, a healthcare information assistant. Provide general health information while always recommending consultation with healthcare professionals."
    },
    {
        "role": "user",
        "content": "ডায়াবেটিস রোগের লক্ষণগুলি কী কী?"
    }
]

Traffic Management

messages = [
    {
        "role": "system",
        "content": "You are KunoRZN, a traffic management assistant. Help users navigate local traffic conditions and understand traffic rules."
    },
    {
        "role": "user",
        "content": "मुंबई में ट्रैफिक जाम से बचने के लिए क्या सुझाव हैं?"
    }
]

Function Calling

Our model was trained on specific system prompts and structures for Function Calling.

You should use the system role with this message, followed by a function signature json as this example shows:

<|start_header_id|>system<|end_header_id|>
You are a function calling AI model fluent in multiple Indian languages. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: <tools> {"type": "function", "function": {"name": "get_weather", "description": "get_weather(city: str, state: str, country: str='India') -> dict - Get weather information for a given city.\\n\\n    Args:\\n        city (str): The city name.\\n        state (str): The state name.\\n        country (str): The country name, defaults to India.\\n\\n    Returns:\\n        dict: A dictionary containing weather information.\\n            Keys:\\n                - \'city\': The city name.\\n                - \'state\': The state name.\\n                - \'temperature\': The current temperature in Celsius.\\n                - \'humidity\': The current humidity percentage.\\n                - \'description\': Weather description.\\n                - \'forecast\': Forecast for next 3 days.", "parameters": {"type": "object", "properties": {"city": {"type": "string"}, "state": {"type": "string"}, "country": {"type": "string"}}, "required": ["city", "state"]}}}  </tools> Use the following pydantic model json schema for each tool call you will make: {"properties": {"arguments": {"title": "Arguments", "type": "object"}, "name": {"title": "Name", "type": "string"}}, "required": ["arguments", "name"], "title": "FunctionCall", "type": "object"} For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{"arguments": <args-dict>, "name": <function-name>}
</tool_call><|eot_id|><|start_header_id|>user<|end_header_id|>

Quantized Versions

GGUF Quants: https://huggingface.co/VinkuraAI/KunoRZN-Llama-3-3B-GGUF

License

Contact and Support

For more information and support, visit vinkura.in or contact us at [email protected]

How to cite:

@misc{
      title={KunoRZN-Llama-3-3B}, 
      author={VinkuraAI},
      year={2025}
}

VinkuraAI
/

KunoRZN-Llama-3-3B