Triangle104 committed on
Commit e6529eb · verified · 1 Parent(s): 781a83a

Update README.md

Files changed (1)
  1. README.md +153 -0
README.md CHANGED
@@ -17,6 +17,159 @@ language:
  This model was converted to GGUF format from [`Spestly/Athena-1-3B`](https://huggingface.co/Spestly/Athena-1-3B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/Spestly/Athena-1-3B) for more details on the model.
+ ---
+ Model details:
+
+ Athena-1 3B is a fine-tuned, instruction-following large language model derived from Qwen/Qwen2.5-3B-Instruct. It is designed to provide efficient, high-quality text generation while maintaining a compact size. Athena 3B is optimized for lightweight applications, conversational AI, and structured data tasks, making it ideal for real-world use cases where performance and resource efficiency are critical.
+
+ Key Features
+
+ ⚡ Lightweight and Efficient
+
+ - Compact Size: At just 3.09 billion parameters, Athena-1 3B offers excellent performance with reduced computational requirements.
+ - Instruction Following: Fine-tuned for precise and reliable adherence to user prompts.
+ - Coding and Mathematics: Proficient in solving coding challenges and handling mathematical tasks.
+
+ 📖 Long-Context Understanding
+
+ - Context Length: Supports up to 32,768 tokens, enabling the processing of moderately lengthy documents or conversations.
+ - Token Generation: Can generate up to 8K tokens of output; see the sketch below.
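+
+ Because default generation lengths are much shorter, the 8K output headroom only matters if you raise max_new_tokens yourself. Below is a minimal sketch, assuming the standard transformers pipeline API; the input file and token budget are illustrative choices, not from the original card:
+
+ ```python
+ # Sketch: summarizing a long document with a larger output budget.
+ # "report.txt" and the 4096-token budget are illustrative assumptions.
+ from transformers import pipeline
+
+ pipe = pipeline("text-generation", model="Spestly/Athena-1-3B")
+ long_doc = open("report.txt").read()
+ messages = [{"role": "user", "content": f"Summarize this document:\n\n{long_doc}"}]
+ out = pipe(messages, max_new_tokens=4096)  # stays under the 8K output cap
+ # with chat-style input, generated_text holds the full conversation,
+ # ending with the model's reply
+ print(out[0]["generated_text"][-1]["content"])
+ ```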
+
+ 🌍 Multilingual Support
+
+ Supports 29+ languages, including:
+ English, Chinese, French, Spanish, Portuguese, German, Italian, Russian,
+ Japanese, Korean, Vietnamese, Thai, Arabic, and more.
+
+ 📊 Structured Data & Outputs
+
+ - Structured Data Interpretation: Processes structured formats like tables and JSON.
+ - Structured Output Generation: Generates well-formatted outputs, including JSON and other structured formats; see the sketch below.
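+
+ One common pattern is to ask for JSON explicitly and parse the reply. A minimal sketch, assuming the transformers pipeline API; the prompt wording and keys are illustrative assumptions:
+
+ ```python
+ # Sketch: requesting structured (JSON) output and parsing it.
+ import json
+ from transformers import pipeline
+
+ pipe = pipeline("text-generation", model="Spestly/Athena-1-3B")
+ messages = [{
+     "role": "user",
+     "content": 'Extract the item and price from "Blue mug, $4.99". '
+                'Reply with JSON only, using the keys "item" and "price".',
+ }]
+ reply = pipe(messages, max_new_tokens=128)[0]["generated_text"][-1]["content"]
+ data = json.loads(reply)  # raises ValueError if extra text surrounds the JSON
+ print(data["item"], data["price"])
+ ```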
+
+ Model Details
+
+ - Base Model: Qwen/Qwen2.5-3B-Instruct
+ - Architecture: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings.
+ - Parameters: 3.09B total (2.77B non-embedding).
+ - Layers: 36
+ - Attention Heads: 16 for Q, 2 for KV.
+ - Context Length: Up to 32,768 tokens (see the quick check below).
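+
+ These figures can be double-checked against the published checkpoint configuration. A minimal sketch, assuming the Hugging Face checkpoint exposes the usual Qwen2-style config fields:
+
+ ```python
+ # Sketch: confirming the architecture numbers from the model config.
+ from transformers import AutoConfig
+
+ cfg = AutoConfig.from_pretrained("Spestly/Athena-1-3B")
+ print(cfg.num_hidden_layers)        # expected: 36
+ print(cfg.num_attention_heads)      # expected: 16 (query heads)
+ print(cfg.num_key_value_heads)      # expected: 2 (KV heads)
+ print(cfg.max_position_embeddings)  # expected: 32768
+ ```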
+
+ Applications
+
+ Athena 3B is designed for a variety of real-world applications:
+
+ - Conversational AI: Build fast, responsive, and lightweight chatbots.
+ - Code Generation: Generate, debug, or explain code snippets.
+ - Mathematical Problem Solving: Assist with calculations and reasoning.
+ - Document Processing: Summarize and analyze moderately large documents.
+ - Multilingual Applications: Support for global use cases with diverse language requirements.
+ - Structured Data: Process and generate structured data, such as tables and JSON.
+
+ Quickstart
+
+ Here’s how you can use Athena 3B for quick text generation:
+
+ ```python
+ # Use a pipeline as a high-level helper
+ from transformers import pipeline
+
+ messages = [
+     {"role": "user", "content": "Who are you?"},
+ ]
+ pipe = pipeline("text-generation", model="Spestly/Athena-1-3B")
+ pipe(messages)
+
+ # Load model directly
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ tokenizer = AutoTokenizer.from_pretrained("Spestly/Athena-1-3B")
+ model = AutoModelForCausalLM.from_pretrained("Spestly/Athena-1-3B")
+ ```
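+
+ The "load model directly" snippet stops after loading; one way to run a full generation from there is via the tokenizer's chat template. A minimal sketch, assuming the checkpoint ships a Qwen2.5-style chat template and fits on the default device:
+
+ ```python
+ # Sketch: generating a reply with the directly loaded model.
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ tokenizer = AutoTokenizer.from_pretrained("Spestly/Athena-1-3B")
+ model = AutoModelForCausalLM.from_pretrained("Spestly/Athena-1-3B")
+
+ messages = [{"role": "user", "content": "Who are you?"}]
+ input_ids = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ )
+ output_ids = model.generate(input_ids, max_new_tokens=256)
+ # decode only the newly generated tokens
+ print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```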
+ ---
  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)
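
The typical flow is a Homebrew install followed by running the model straight from the Hub. A minimal sketch; the repo and .gguf names below are placeholders, so substitute the actual quant file you download from this repo:

```bash
# install llama.cpp (formula available on macOS and Linux)
brew install llama.cpp

# run a prompt directly from the Hub; replace the placeholder
# repo/file names with the GGUF quant you actually want
llama-cli --hf-repo <this-repo> --hf-file <model-quant>.gguf \
  -p "Explain what a GGUF file is in one sentence."
```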