# **GWQ-9B-Preview**

GWQ (Gemma with Questions) Preview is built on Gemma, a family of lightweight, state-of-the-art open models from Google, created with the same research and technology used to build the Gemini models. These are text-to-text, decoder-only large language models, available in English, with open weights for both pre-trained and instruction-tuned variants. Gemma models are well suited to a variety of text generation tasks, including question answering, summarization, and reasoning. GWQ is fine-tuned on the Chain of Continuous Thought Synthetic Dataset, on top of the Gemma2ForCausalLM architecture.

# **Running GWQ Demo**

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained(
    "prithivMLmods/GWQ-9B-Preview",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```
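
This prints the prompt together with the completion, including special tokens such as `<bos>`. If you only want the newly generated text, you can slice off the prompt tokens before decoding; a minimal sketch reusing the variables from the block above:

```python
# outputs[0] holds the prompt tokens followed by the generated tokens;
# slice off the prompt so only the completion is decoded.
prompt_length = input_ids["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```
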
You can ensure the correct chat template is applied by using `tokenizer.apply_chat_template` as follows:
```python
messages = [
    {"role": "user", "content": "Write me a poem about Machine Learning."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```
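
The same template extends to multi-turn conversations: append the model's reply and the next user message, then apply the template again. A minimal sketch reusing the `tokenizer`, `model`, and variables above; the follow-up question is purely illustrative:

```python
# Decode only the model's reply from the previous generation.
prompt_length = input_ids["input_ids"].shape[-1]
reply = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)

# Extend the conversation with the reply and a follow-up user turn.
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "Now explain the poem line by line."})

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
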
# **Key Architecture**

1. **Transformer-Based Design**:
Gemma 2 leverages the transformer architecture, utilizing self-attention mechanisms to process input text and capture contextual relationships effectively; the sketch after this list shows how to read these architectural settings programmatically.

2. **Lightweight and Efficient**:
It is designed to be computationally efficient, with fewer parameters than larger models, making it well suited to deployment on resource-constrained devices or environments.

3. **Modular Layers**:
The architecture consists of modular decoder layers, allowing flexibility in adapting the model to specific tasks such as text generation, summarization, or classification.

4. **Attention Mechanisms**:
Gemma 2 employs multi-head self-attention to focus on relevant parts of the input text, improving its ability to handle long-range dependencies and complex language structures.

5. **Pre-training and Fine-Tuning**:
The model is pre-trained on large text corpora and can be fine-tuned for specific tasks, such as markdown processing in ReadM.Md, to improve performance on domain-specific data.

6. **Scalability**:
The architecture supports scaling up or down to match an application's requirements, balancing performance and resource usage.

7. **Open-Source and Customizable**:
Being open source, Gemma 2 allows developers to modify and extend the architecture for specific use cases, such as integrating it into tools like ReadM.Md for markdown-related tasks.

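Because GWQ keeps the standard decoder-only Gemma-2 layout, the settings behind points 1, 4, and 6 (layer count, attention heads, hidden size) can be read directly from the model configuration. A minimal sketch, assuming the repository ships a standard Gemma-2 `config.json`:

```python
from transformers import AutoConfig

# Loads only the configuration file; no model weights are downloaded.
config = AutoConfig.from_pretrained("prithivMLmods/GWQ-9B-Preview")

print(config.model_type)           # expected: "gemma2"
print(config.num_hidden_layers)    # depth of the decoder stack
print(config.num_attention_heads)  # heads per multi-head self-attention layer
print(config.hidden_size)          # width of each decoder layer
```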