Upload folder using huggingface_hub
Browse files
- README.md +65 -146
- config.json +1 -1
- pytorch_model.bin +1 -1
README.md
CHANGED
@@ -1,156 +1,75 @@
 ---
 tags:
-- generated_from_trainer
+- physics
+- cosmology
 model-index:
-- name:
+- name: cosmosage-v3
 results: []
+license: mit
+language:
+- en
+pipeline_tag: text-generation
+base_model: meta-llama/meta-llama-3-8b
+datasets:
+- teknium/OpenHermes-2.5
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
-<details><summary>See axolotl config</summary>
-
-```yaml
-tokenizer_type: AutoTokenizer
-
-strict: false
-
-fp16:
-tf32: false
-
-gradient_checkpointing: true
-early_stopping_patience:
-resume_from_checkpoint:
-logging_steps: 1
-xformers_attention:
-flash_attention: true
-
-warmup_steps: 100
-evals_per_epoch: 5
-eval_table_size:
-saves_per_epoch: 1
-debug:
-deepspeed: /home/005/th5351/packages/axolotl/deepspeed_configs/zero2.json
-ddp_timeout: 3600000
-weight_decay: 0.0
-fsdp:
-fsdp_config:
-
-```
-
-</details><br>
-
-# home/005/th5351/output
-
-This model was trained from scratch on the None dataset.
-It achieves the following results on the evaluation set:
-- Loss: nan
-
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
-## Training procedure
-
-### Training hyperparameters
-
-The following hyperparameters were used during training:
-- learning_rate: 5e-05
-- train_batch_size: 1
-- eval_batch_size: 1
-- seed: 42
-- distributed_type: multi-GPU
-- num_devices: 4
-- gradient_accumulation_steps: 8
-- total_train_batch_size: 32
-- total_eval_batch_size: 4
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 100
-- num_epochs: 2
-
-### Training results
-
-| Training Loss | Epoch  | Step | Validation Loss |
-|:-------------:|:------:|:----:|:---------------:|
-| 1.3757        | 0.0005 | 1    | nan             |
-| 0.8083        | 0.1999 | 388  | nan             |
-| 0.8005        | 0.3998 | 776  | nan             |
-| 0.7389        | 0.5998 | 1164 | nan             |
-| 0.7269        | 0.7997 | 1552 | nan             |
-| 0.7069        | 0.9996 | 1940 | nan             |
-| 0.5786        | 1.1613 | 2328 | nan             |
-| 0.5385        | 1.3613 | 2716 | nan             |
-| 0.5381        | 1.5612 | 3104 | nan             |
-| 0.5273        | 1.7611 | 3492 | nan             |
-| 0.527         | 1.9610 | 3880 | nan             |
-
-### Framework versions
-
-- Transformers 4.41.1
-- Pytorch 2.3.0+cu121
-- Datasets 2.19.1
-- Tokenizers 0.19.1
+# cosmosage
+
+cosmosage is a natural-language assistant that answers questions about cosmology.
+
+cosmosage-v3 is the latest iteration in the cosmosage series, built on the Llama-3-8B base
+model. We started with continued pretraining on thousands of papers and textbooks. The next step
+was fine-tuning on synthetically generated question-answer pairs. In addition, the OpenHermes 2.5
+dataset was used to improve instruction following and general conversational capability.
+
+cosmosage-v3 is a full chat model, though it excels in Q&A mode, where the model gives a single
+answer in response to a single question.
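+
+For reference, a single Q&A turn rendered in the Llama-3 prompt template (see Usage below) has
+the following shape; the question is illustrative:
+
+```
+<|begin_of_text|><|start_header_id|>user<|end_header_id|>
+
+What is the ISW effect?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
+
+```
+
+The model's answer follows the final assistant header and ends with `<|eot_id|>`.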
+
+The code used to generate cosmosage is available at https://github.com/tijmen/cosmosage.
+
+## Usage
+
+cosmosage-v3 uses the Llama-3 prompt template. Sampling parameters are up to you, but I like
+{'temperature': 0.7, 'smoothing_factor': 1, 'smoothing_curve': 1.5, 'repetition_penalty': 1.1}.
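+
+As a minimal sketch with transformers (the repo id below is a placeholder; point it at the actual
+model location. smoothing_factor and smoothing_curve are text-generation-webui samplers with no
+direct transformers equivalent, so only temperature and repetition_penalty are applied here):
+
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_id = "Tijmen2/cosmosage-v3"  # placeholder; substitute the actual repo id or local path
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id, torch_dtype=torch.bfloat16, device_map="auto"
+)
+
+# Q&A mode: a single user question rendered with the Llama-3 chat template.
+messages = [{"role": "user", "content": "Why does the CMB power spectrum have acoustic peaks?"}]
+input_ids = tokenizer.apply_chat_template(
+    messages, add_generation_prompt=True, return_tensors="pt"
+).to(model.device)
+
+output = model.generate(
+    input_ids,
+    max_new_tokens=512,
+    do_sample=True,
+    temperature=0.7,         # from the suggested sampling parameters
+    repetition_penalty=1.1,  # from the suggested sampling parameters
+)
+
+# Decode only the newly generated tokens (the answer).
+print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
+```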
+
+## Comparison to cosmosage_v2
+
+cosmosage-v3 was trained on the same data with nearly the same hyperparameters as cosmosage_v2.
+The primary difference is the use of the Llama-3-8B base model rather than the Mistral-v0.1 base
+model.
+
+## Training details
+
+cosmosage-v3 was trained on 4x A100 (40 GB) GPUs on Gadi (NCI, Australia). A big thanks goes out to Yuan-Sen Ting for providing these resources.
+
+## Example output
+
+**User:**
+> What is "stacking"?
+
+**Assistant:**
+> Stacking is a technique used in astronomy to combine multiple images or data sets of the same object into a single image or dataset. This can be done by aligning and co-adding the individual images or data points, which helps to improve the signal-to-noise ratio and reveal faint features that may not be visible in a single image or dataset. Stacking is often used to study the properties of galaxies, galaxy clusters, and other astronomical objects.
+
+**User:**
+> Explain the ISW effect.
+
+**Assistant:**
+> The Integrated Sachs-Wolfe (ISW) effect is a phenomenon in modern cosmology that describes the gravitational interaction between cosmic microwave background (CMB) photons and large-scale structures in the universe. It occurs when CMB photons pass through regions of space where the gravitational potential has changed due to the expansion of the universe or the presence of matter overdensities.
+>
+> When a photon moves into an area with a higher gravitational potential, it gains energy as it falls into the potential well. As the photon climbs out of this well on the other side, it loses energy. If the change in the gravitational potential is gradual over time, the photon's energy gain and loss will be integrated, leading to a net temperature shift in the CMB radiation. This effect can either cause a blueshift (increased energy) if the gravitational potential is decreasing, or a redshift (decreased energy) if the gravitational potential is increasing.
+>
+> The ISW effect is particularly important for understanding the evolution of the universe and the role of dark energy. The detection of the ISW effect has provided evidence for the existence of dark energy and has helped constrain its properties. By analyzing the CMB temperature fluctuations and cross-correlating them with tracers of large-scale structure such as galaxy surveys, researchers can study the ISW effect and learn more about the dynamics of the universe.
+
+**User:**
+> What is Digital Active Nulling?
+
+**Assistant:**
+> Digital Active Nulling (DAN) is a technique used in the readout of Transition Edge Sensor (TES) bolometers. It works by dynamically nulling the current flowing through the SQUID input coil while digitally demodulating the carrier voltage and the current signal. This method helps to minimize non-linearity effects due to the nonlinear response of the SQUID, improving the overall performance of the system.
+
+**User:**
+> Explain why the time of matter-radiation equality affects the CMB damping tail.
+
+**Assistant:**
+> The time of matter-radiation equality is when the energy densities of matter and radiation were equal. This has an impact on the Cosmic Microwave Background (CMB) damping tail because it determines the amount of photon diffusion that occurs before decoupling. A later time of matter-radiation equality results in more photon diffusion, leading to a smaller amplitude in the damping tail.
+
+[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
config.json
CHANGED
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "/home/005/th5351/models/cosmosage-llama3-8b",
+  "_name_or_path": "/home/005/th5351/models/cosmosage-llama3-8b/",
   "architectures": [
     "LlamaForCausalLM"
   ],
pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:1d8d275ba30cd97e1bf6280c8ceacf4bf23d647760b7258d55a16642a14d9a8c
 size 16060563132