Update README.md
README.md CHANGED
@@ -20,10 +20,8 @@ Nature Language Model (NatureLM) is a sequence-based science foundation model de
 
 # Model sources
 ## Repository:
-We provide
+We provide two repositories for the 8x7B model: the base version and the instruction-finetuned version.
 
-- https://huggingface.co/microsoft/NatureLM-1B
-- https://huggingface.co/microsoft/NatureLM-1B-Inst
 - https://huggingface.co/microsoft/NatureLM-8x7B
 - https://huggingface.co/microsoft/NatureLM-8x7B-Inst
 
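For orientation, the sketch below shows one way to load the instruction-finetuned checkpoint linked above, assuming the repositories expose a standard `transformers` causal-LM interface; the prompt, dtype, and generation settings are illustrative assumptions, not taken from this README.

```python
# Minimal loading sketch (assumes the standard transformers causal-LM layout).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/NatureLM-8x7B-Inst"  # instruction-finetuned repo from the list above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the checkpoint's native precision
    device_map="auto",   # spread the 8x7B weights across available GPUs
)

# Illustrative prompt; the model's actual instruction format may differ.
prompt = "Instruction: Propose a SMILES string for an aspirin-like molecule.\nResponse:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```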
@@ -51,7 +49,6 @@ The use of NatureLM must align with ethical research practices. It is not intend
 
 
 
-
 ## Risks and limitations
 NatureLM may not always generate compounds or proteins precisely aligned with user instructions. Users are advised to apply their own adaptive filters before proceeding. Users are responsible for verification of model outputs and decision-making.
 NatureLM was designed and tested using the English language. Performance in other languages may vary and should be assessed by someone who is both an expert in the expected outputs and a native speaker of that language.
@@ -68,17 +65,10 @@ Preprocessing
 The training procedure involves two stages: Stage 1 focuses on training newly introduced tokens while freezing existing model parameters. Stage 2 involves joint optimization of both new and existing parameters to enhance overall performance.
 
 ## Training hyperparameters
-- Learning Rate:
-
-
-- Batch Size (Sentences):
-- 1B model: 4096
-- 8x7B model: 1536
-- Context Length (Tokens):
-- All models: 8192
-- GPU Number (H100):
-- 1B model: 64
-- 8x7B model: 256
+- Learning Rate: 2×10<sup>−4</sup>
+- Batch Size (Sentences): 8x7B model: 1536
+- Context Length (Tokens): 8192
+- GPU Number (H100): 8x7B model: 256
 
 ## Speeds, sizes, times
 
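The two-stage procedure in this hunk (Stage 1: train only newly introduced tokens with everything else frozen; Stage 2: joint optimization) can be prototyped as below. This is a minimal sketch under stated assumptions: the base checkpoint id and the added tokens are placeholders, and masking gradients of pre-existing embedding rows is just one common way to realize Stage 1, not necessarily the authors' implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "your-base-llm"  # placeholder; the diff does not name the base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Hypothetical domain tokens; the actual new-token inventory is not given here.
num_new = tokenizer.add_tokens(["<mol>", "</mol>", "<protein>", "</protein>"])
assert num_new > 0
model.resize_token_embeddings(len(tokenizer))

# Stage 1: freeze every pre-existing parameter...
for p in model.parameters():
    p.requires_grad = False

# ...then re-enable the input embeddings and zero the gradient on old rows,
# so only the newly appended token embeddings receive updates.
# (An untied LM head would need the same row mask.)
emb = model.get_input_embeddings()
emb.weight.requires_grad = True
mask = torch.zeros(emb.weight.size(0), 1)
mask[-num_new:] = 1.0
emb.weight.register_hook(lambda g: g * mask.to(g.device))

# Stage 2: re-enable requires_grad on all parameters and continue joint
# training at the listed hyperparameters (lr 2e-4, 8192-token context, etc.).
```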