
# Athena-R3

🚀 Athena-R3: Think Deeper. Solve Smarter. 🤔

## Model Overview

**Athena-R3-7B** is a 7-billion-parameter causal language model fine-tuned from DeepSeek-R1-Distill-Qwen-7B. It is specifically tailored to enhance reasoning capabilities, making it adept at handling complex problem-solving tasks and producing coherent, contextually relevant responses.

## Model Details

- **Model Developer:** Aayan Mishra
- **Model Type:** Causal Language Model
- **Architecture:** Transformer with Rotary Position Embeddings (RoPE), SwiGLU activation, RMSNorm, and attention QKV bias
- **Parameters:** 7.62 billion total
- **Layers:** 28
- **Attention Heads:** 28 for query and 4 for key-value (Grouped Query Attention)
- **Vocabulary Size:** 151,646 tokens
- **Context Length:** Supports up to 128,000 tokens
- **Languages Supported:** Primarily English, with capabilities in other languages
- **License:** MIT
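
These architectural details can be checked against the model's own configuration. Below is a minimal sketch using the Transformers `AutoConfig` API; the field names follow the Qwen2 config schema that the DeepSeek-R1-Distill-Qwen-7B base model uses:

```python
from transformers import AutoConfig

# Fetches only config.json; no model weights are downloaded.
config = AutoConfig.from_pretrained("Spestly/Athena-R3-7B")

# Field names follow the Qwen2 config schema used by the base model.
print("Layers:         ", config.num_hidden_layers)
print("Query heads:    ", config.num_attention_heads)
print("KV heads:       ", config.num_key_value_heads)  # grouped-query attention
print("Vocabulary size:", config.vocab_size)
print("Max positions:  ", config.max_position_embeddings)
```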

## Training Details

Athena-R3-7B was fine-tuned using the Unsloth framework on a single NVIDIA A100 GPU. The fine-tuning process involved 60 epochs over approximately 90 minutes, utilizing a curated dataset focused on reasoning tasks, including mathematical problem-solving and logical inference. This approach aimed to bolster the model's proficiency in complex reasoning and analytical tasks.
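
The exact training recipe and hyperparameters have not been published. As a rough illustration only, a single-GPU Unsloth fine-tune of the base model typically looks like the sketch below; the dataset file, LoRA settings, and training arguments here are hypothetical placeholders, and the `SFTTrainer` keyword arguments vary across `trl` versions:

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model in 4-bit to fit comfortably on a single A100.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters; rank and target modules are common defaults,
# not the published Athena-R3 recipe.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)

# "reasoning_dataset.jsonl" is a placeholder for the curated reasoning data
# described above (mathematical problem-solving and logical inference).
dataset = load_dataset("json", data_files="reasoning_dataset.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=60,  # epoch count reported above
        learning_rate=2e-4,
        fp16=True,
        output_dir="athena-r3-checkpoints",
    ),
)
trainer.train()
```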

## Intended Use

Athena-R3-7B is designed for a variety of applications, including but not limited to:

- **Advanced Reasoning:** Assisting with complex problem-solving and logical analysis.
- **Academic Support:** Providing explanations and solutions for mathematical and scientific queries.
- **General NLP Tasks:** Text completion, summarization, and question answering.
- **Data Interpretation:** Offering insights and explanations for data-centric inquiries.

While Athena-R3-7B is a powerful tool for various applications, it is not intended for real-time, safety-critical systems or for processing sensitive personal information.

## How to Use

To use Athena-R3-7B, ensure that you have recent versions of the `transformers` and `accelerate` libraries installed (`accelerate` is required for `device_map="auto"`):

```bash
pip install transformers accelerate
```

Here's an example of how to load the Athena-R3-7B model and generate a response:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Spestly/Athena-R3-7B"

# Load the weights and tokenizer; device_map="auto" places layers on the
# available GPU(s), and torch_dtype="auto" uses the checkpoint's dtype (FP16).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the concept of entropy in thermodynamics."
messages = [
    {"role": "system", "content": "You are Athena, an AI assistant designed to be helpful."},
    {"role": "user", "content": prompt}
]

# Render the chat messages into the model's expected prompt format.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Strip the prompt tokens so only the newly generated response is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
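
Reasoning-tuned models are often sensitive to sampling settings. For the base DeepSeek-R1 distills, DeepSeek recommends a temperature in the 0.5–0.7 range (0.6 suggested) with `top_p=0.95` to reduce repetition and incoherence; whether Athena-R3 inherits these preferences is not stated, so treat the values below as a starting point rather than official guidance:

```python
# Borrowing DeepSeek's recommended sampling settings for the base R1 distills
# (temperature 0.5-0.7, top_p 0.95); tune these for your own workload.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
```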

## Limitations

Users should be aware of the following limitations:

- **Biases:** Athena-R3-7B may exhibit biases present in its training data. Users should critically assess outputs, especially in sensitive contexts.
- **Knowledge Cutoff:** The model's knowledge is current up to August 2024. It may not be aware of events or developments occurring after this date.
- **Language Support:** While the model supports multiple languages, performance is strongest in English.

## Acknowledgements

Athena-R3-7B builds upon the work of the DeepSeek team, particularly the DeepSeek-R1-Distill-Qwen-7B model. Gratitude is also extended to the open-source AI community for their contributions to tools and frameworks that facilitated the development of Athena-R3-7B.

## License

Athena-R3-7B is released under the MIT License, permitting wide usage with proper attribution.
