Triangle104 committed on
Commit
0684c22
·
verified ·
1 Parent(s): bda7d39

Update README.md

Files changed (1)
  1. README.md +152 -0
README.md CHANGED
@@ -17,6 +17,158 @@ language:
17
  This model was converted to GGUF format from [`Spestly/Athena-1-1.5B`](https://huggingface.co/Spestly/Athena-1-1.5B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
18
  Refer to the [original model card](https://huggingface.co/Spestly/Athena-1-1.5B) for more details on the model.
19
 
20
+ ---
21
+ Model details:
22
+ -
23
+ Athena-1 1.5B is a fine-tuned, instruction-following large language model derived from Qwen/Qwen2.5-1.5B-Instruct.
24
+ Designed for efficiency and high-quality text generation, Athena-1 1.5B
25
+ maintains a compact size, making it ideal for real-world applications
26
+ where performance and resource efficiency are critical, such as
27
+ lightweight applications, conversational AI, and structured data tasks.
28
+
29
+
30
+
31
+
32
+
33
+
34
+
35
+
36
+ Key Features
37
+
38
+
39
+
40
+
41
+
42
+
43
+
44
+
45
+
46
+ ⚡ Lightweight and Efficient
47
+
48
+
49
+
50
+
51
+ Compact Size: At just 1.5 billion parameters, Athena-1 1.5B offers excellent performance with reduced computational requirements.
52
+ Instruction Following: Fine-tuned for precise and reliable adherence to user prompts.
53
+ Coding and Mathematics: Proficient in solving coding challenges and handling mathematical tasks.
54
+
55
+
56
+
57
+
58
+
59
+
60
+
61
+ 📖 Long-Context Understanding
62
+
63
+
64
+
65
+
66
+ Context Length: Supports up to 32,768 tokens, enabling the processing of moderately lengthy documents or conversations.
67
+ Token Generation: Can generate up to 8K tokens of output.
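The two limits above interact: a long prompt leaves less room for output. A minimal sketch of that budget arithmetic follows, assuming "8K" means 8,192 tokens and that the prompt and the completion share the 32,768-token window. The helper name `max_new_tokens` is hypothetical and not part of any library.

```python
# Limits quoted in the model card; treated here as upper bounds.
CONTEXT_WINDOW = 32_768  # prompt + completion tokens (assumed to share one window)
MAX_GENERATION = 8_192   # "8K" read as 8,192 output tokens (assumption)


def max_new_tokens(prompt_tokens: int, requested: int = MAX_GENERATION) -> int:
    """Return how many tokens can still be generated after a prompt of this length."""
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    # The tightest of the three limits wins.
    return min(requested, MAX_GENERATION, remaining)


print(max_new_tokens(1_000))               # 8192
print(max_new_tokens(30_000))              # 2768
print(max_new_tokens(500, requested=256))  # 256
```

The result could be passed as the generation length when calling the model, so that a long document does not silently push the completion past the window.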
68
+
69
+
70
+
71
+
72
+
73
+
74
+
75
+ 🌍 Multilingual Support
76
+
77
+
78
+
79
+
80
+ Supports 29+ languages, including:
81
+ English, Chinese, French, Spanish, Portuguese, German, Italian, Russian
82
+ Japanese, Korean, Vietnamese, Thai, Arabic, and more.
83
+
84
+
85
+
86
+
87
+
88
+
89
+
90
+
91
+
92
+ 📊 Structured Data & Outputs
93
+
94
+
95
+
96
+
97
+ Structured Data Interpretation: Processes structured formats like tables and JSON.
98
+ Structured Output Generation: Generates well-formatted outputs, including JSON and other structured formats.
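When asking for JSON, the reply may still arrive wrapped in a Markdown code fence or surrounded by a sentence of prose; that is an assumption about typical chat-model behaviour, not something the card states. The sketch below shows one way to recover the JSON value from such a reply. The function name `parse_json_reply` is hypothetical.

```python
import json
import re
from typing import Any

TICKS = "`" * 3  # Markdown code-fence marker, built this way to keep the example readable
_FENCE = re.compile(
    re.escape(TICKS) + r"(?:json)?\s*(.*?)" + re.escape(TICKS), re.DOTALL
)


def parse_json_reply(reply: str) -> Any:
    """Extract the first JSON value from a model reply."""
    text = reply.strip()
    fenced = _FENCE.search(text)
    if fenced:
        # Prefer the contents of a fenced block when one is present.
        text = fenced.group(1).strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Fall back to the first object or array that decodes cleanly.
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char in "{[":
            try:
                value, _ = decoder.raw_decode(text[index:])
                return value
            except json.JSONDecodeError:
                continue
    raise ValueError("no JSON value found in reply")


reply = "Sure, here it is:\n" + TICKS + 'json\n{"name": "Athena", "size_b": 1.5}\n' + TICKS
print(parse_json_reply(reply))
```

Raising `ValueError` on failure keeps the caller in control: it can retry the request or report the malformed reply instead of continuing with partial data.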
99
+
100
+
101
+
102
+
103
+
104
+
105
+
106
+
107
+ Model Details
108
+
109
+
110
+
111
+
112
+ Base Model: Qwen/Qwen2.5-1.5B-Instruct
113
+ Architecture: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings.
114
+ Parameters: 1.5B total.
115
+ Layers: (Adjust if different from the 3B model)
116
+ Attention Heads: (Adjust if different from the 3B model)
117
+ Context Length: Up to 32,768 tokens.
118
+
119
+
120
+
121
+
122
+
123
+
124
+
125
+
126
+ Applications
127
+
128
+
129
+
130
+
131
+ Athena-1 1.5B is designed for a variety of real-world applications:
132
+
133
+
134
+ Conversational AI: Build fast, responsive, and lightweight chatbots.
135
+ Code Generation: Generate, debug, or explain code snippets.
136
+ Mathematical Problem Solving: Assist with calculations and reasoning.
137
+ Document Processing: Summarize and analyze moderately large documents.
138
+ Multilingual Applications: Support for global use cases with diverse language requirements.
139
+ Structured Data: Process and generate structured data, such as tables and JSON.
140
+
141
+
142
+
143
+
144
+
145
+
146
+
147
+
148
+ Quickstart
149
+
150
+
151
+
152
+
153
+ Here’s how you can use Athena-1 1.5B for quick text generation:
154
+
155
+
156
+ # Use a pipeline as a high-level helper
157
+ from transformers import pipeline
158
+
159
+ messages = [
160
+ {"role": "user", "content": "Who are you?"},
161
+ ]
162
+ pipe = pipeline("text-generation", model="Spestly/Athena-1-1.5B")
163
+ print(pipe(messages))
164
+
165
+ # Load model directly
166
+ from transformers import AutoTokenizer, AutoModelForCausalLM
167
+
168
+ tokenizer = AutoTokenizer.from_pretrained("Spestly/Athena-1-1.5B")
169
+ model = AutoModelForCausalLM.from_pretrained("Spestly/Athena-1-1.5B")
170
+
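The "Load model directly" snippet stops once the tokenizer and model are loaded. A possible continuation is sketched below. It assumes the tokenizer ships a chat template (likely for a Qwen2.5-derived model, but not confirmed by this card) and that the weights fit in local memory. The helper names `build_messages` and `generate_reply` are hypothetical.

```python
def build_messages(user_prompt, system_prompt=None):
    """Build the chat-style message list used in the pipeline example."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


def generate_reply(user_prompt, model_id="Spestly/Athena-1-1.5B", max_new_tokens=256):
    """Load the model, run one chat turn, and return only the newly generated text."""
    # Imported lazily so build_messages works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Assumption: the tokenizer defines a chat template.
    inputs = tokenizer.apply_chat_template(
        build_messages(user_prompt),
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the completion is decoded.
    prompt_length = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)


# Example call (downloads the model on first use):
# print(generate_reply("Who are you?"))
```

Slicing off the prompt tokens before decoding avoids echoing the question back in the reply, which is the usual surprise when calling `generate` directly instead of going through the pipeline.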
171
+ ---
172
  ## Use with llama.cpp
173
  Install llama.cpp through brew (works on Mac and Linux)
174