Files changed (1)
  1. README.md +96 -84
README.md CHANGED
@@ -1,85 +1,97 @@
- ---
- base_model: Qwen/Qwen2.5-7B-Instruct
- tags:
- - text-generation-inference
- - transformers
- - unsloth
- - qwen2
- - trl
- license: apache-2.0
- language:
- - en
- ---
- ![Header](https://raw.githubusercontent.com/Aayan-Mishra/Images/refs/heads/main/Athena.png)
-
- # Athena-1: Lightweight and Powerful Instruction-Following Model
-
- Athena-1 is a fine-tuned, instruction-following large language model derived from [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). Designed to balance efficiency and performance, Athena 7B provides powerful text-generation capabilities, making it suitable for a variety of real-world applications, including conversational AI, content creation, and structured data processing.
-
- ---
-
- ## Key Features
-
- ### 🚀 Enhanced Performance
- - **Instruction Following**: Fine-tuned for excellent adherence to user prompts and instructions.
- - **Coding and Mathematics**: Proficient in solving coding problems and mathematical reasoning.
- - **Lightweight**: At 7.62 billion parameters, Athena-1-7B offers powerful performance while maintaining efficiency.
-
- ### 📖 Long-Context Understanding
- - **Context Length**: Supports up to **128K tokens**, ensuring accurate handling of large documents or conversations.
- - **Token Generation**: Can generate up to **8K tokens** of output.
-
- ### 🌍 Multilingual Support
- - Supports **29+ languages**, including:
- - English, Chinese, French, Spanish, Portuguese, German, Italian, Russian
- - Japanese, Korean, Vietnamese, Thai, Arabic, and more.
-
- ### 📊 Structured Data & Outputs
- - **Structured Data Interpretation**: Understands and processes structured formats like tables and JSON.
- - **Structured Output Generation**: Generates well-formatted outputs, including JSON and other structured formats.
-
- ---
-
- ## Model Details
-
- - **Base Model**: [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
- - **Architecture**: Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias.
- - **Parameters**: 7.62B total (6.53B non-embedding).
- - **Layers**: 28
- - **Attention Heads**: 28 for Q, 4 for KV.
- - **Context Length**: Up to **131,072 tokens**.
-
- ---
-
- ## Applications
-
- Athena-1 is designed for a broad range of use cases:
- - **Conversational AI**: Create natural, human-like chatbot experiences.
- - **Code Generation**: Generate, debug, or explain code snippets.
- - **Mathematical Problem Solving**: Assist with complex calculations and reasoning.
- - **Document Processing**: Summarize or analyze large documents.
- - **Multilingual Applications**: Support for diverse languages for translation and global use cases.
- - **Structured Data**: Process and generate structured data, including tables and JSON.
-
- ---
-
- ## Quickstart
-
- Here’s how you can use Athena 7B for quick text generation:
-
- ```python
- # Use a pipeline as a high-level helper
- from transformers import pipeline
-
- messages = [
- {"role": "user", "content": "Who are you?"},
- ]
- pipe = pipeline("text-generation", model="Spestly/Athena-1-7B")
- pipe(messages)
-
- # Load model directly
- from transformers import AutoTokenizer, AutoModelForCausalLM
-
- tokenizer = AutoTokenizer.from_pretrained("Spestly/Athena-1-7B")
- model = AutoModelForCausalLM.from_pretrained("Spestly/Athena-1-7B")
  ```
 
+ ---
+ base_model: Qwen/Qwen2.5-7B-Instruct
+ tags:
+ - text-generation-inference
+ - transformers
+ - unsloth
+ - qwen2
+ - trl
+ license: apache-2.0
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ ---
+ ![Header](https://raw.githubusercontent.com/Aayan-Mishra/Images/refs/heads/main/Athena.png)
+
+ # Athena-1: Lightweight and Powerful Instruction-Following Model
+
+ Athena-1 is a fine-tuned, instruction-following large language model derived from [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). Designed to balance efficiency and performance, Athena 7B provides powerful text-generation capabilities, making it suitable for a variety of real-world applications, including conversational AI, content creation, and structured data processing.
+
+ ---
+
+ ## Key Features
+
+ ### 🚀 Enhanced Performance
+ - **Instruction Following**: Fine-tuned for excellent adherence to user prompts and instructions.
+ - **Coding and Mathematics**: Proficient in solving coding problems and mathematical reasoning.
+ - **Lightweight**: At 7.62 billion parameters, Athena-1-7B offers powerful performance while maintaining efficiency.
+
+ ### 📖 Long-Context Understanding
+ - **Context Length**: Supports up to **128K tokens**, ensuring accurate handling of large documents or conversations.
+ - **Token Generation**: Can generate up to **8K tokens** of output.
+
+ ### 🌍 Multilingual Support
+ - Supports **29+ languages**, including:
+ - English, Chinese, French, Spanish, Portuguese, German, Italian, Russian
+ - Japanese, Korean, Vietnamese, Thai, Arabic, and more.
+
+ ### 📊 Structured Data & Outputs
+ - **Structured Data Interpretation**: Understands and processes structured formats like tables and JSON.
+ - **Structured Output Generation**: Generates well-formatted outputs, including JSON and other structured formats.
+
+ ---
+
+ ## Model Details
+
+ - **Base Model**: [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
+ - **Architecture**: Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias.
+ - **Parameters**: 7.62B total (6.53B non-embedding).
+ - **Layers**: 28
+ - **Attention Heads**: 28 for Q, 4 for KV.
+ - **Context Length**: Up to **131,072 tokens**.
+
+ ---
+
+ ## Applications
+
+ Athena-1 is designed for a broad range of use cases:
+ - **Conversational AI**: Create natural, human-like chatbot experiences.
+ - **Code Generation**: Generate, debug, or explain code snippets.
+ - **Mathematical Problem Solving**: Assist with complex calculations and reasoning.
+ - **Document Processing**: Summarize or analyze large documents.
+ - **Multilingual Applications**: Support for diverse languages for translation and global use cases.
+ - **Structured Data**: Process and generate structured data, including tables and JSON.
+
+ ---
+
+ ## Quickstart
+
+ Here’s how you can use Athena 7B for quick text generation:
+
+ ```python
+ # Use a pipeline as a high-level helper
+ from transformers import pipeline
+
+ messages = [
+ {"role": "user", "content": "Who are you?"},
+ ]
+ pipe = pipeline("text-generation", model="Spestly/Athena-1-7B")
+ pipe(messages)
+
+ # Load model directly
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ tokenizer = AutoTokenizer.from_pretrained("Spestly/Athena-1-7B")
+ model = AutoModelForCausalLM.from_pretrained("Spestly/Athena-1-7B")
  ```
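
The Quickstart in this diff loads the tokenizer and model under "Load model directly" but stops before producing any text. As a rough sketch of how that path might continue, the snippet below renders the conversation with the tokenizer's chat template and calls `generate`. The helper names `build_messages` and `generate_reply`, the optional system prompt, and the `max_new_tokens=256` default are illustrative assumptions and do not appear in the README; only the model ID `Spestly/Athena-1-7B` and the example prompt come from it.

```python
from typing import Optional


def build_messages(prompt: str, system: Optional[str] = None) -> list:
    """Build a chat-format message list (hypothetical helper, not from the README)."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return messages


def generate_reply(prompt: str, model_id: str = "Spestly/Athena-1-7B",
                   max_new_tokens: int = 256) -> str:
    """Generate one reply; the first call downloads the model weights."""
    # Imported lazily so build_messages stays usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

    # Render the conversation with the model's own chat template.
    text = tokenizer.apply_chat_template(
        build_messages(prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is decoded.
    new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
    return tokenizer.batch_decode(new_tokens, skip_special_tokens=True)[0]
```

Calling `generate_reply("Who are you?")` would mirror the prompt used in the README's pipeline example. Nothing runs at import time, so the download of roughly 7.62B parameters only happens when the function is called.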