add to docs (#703)
- README.md +2 -0
- docs/faq.md +14 -0
README.md CHANGED

@@ -901,6 +901,8 @@ CUDA_VISIBLE_DEVICES="" python3 -m axolotl.cli.merge_lora ...
 
 ## Common Errors 🧰
 
+See also the [FAQs](./docs/faq.md).
+
 > If you encounter a 'Cuda out of memory' error, it means your GPU ran out of memory during the training process. Here's how to resolve it:
 
 Please reduce any below
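The context line "Please reduce any below" is cut off by the hunk boundary; in axolotl, GPU memory pressure is usually tuned through config values like the ones below. A minimal illustrative YAML sketch (the values are examples, not text from this PR):

```yaml
# Illustrative memory-related settings in an axolotl config; values are examples only.
micro_batch_size: 1            # per-GPU batch size; lowering this usually frees the most memory
gradient_accumulation_steps: 4 # keeps the effective batch size up while per-step memory stays low
eval_batch_size: 1             # evaluation batches allocate activation memory too
sequence_len: 2048             # shorter sequences reduce activation memory roughly linearly
```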
docs/faq.md ADDED

@@ -0,0 +1,14 @@
+# Axolotl FAQs
+
+
+> The trainer stopped and hasn't progressed in several minutes.
+
+This is usually an issue with the GPUs communicating with each other. See the [NCCL doc](../docs/nccl.md).
+
+> Exitcode -9
+
+This usually happens when you run out of system RAM.
+
+> Exitcode -7 while using deepspeed
+
+Try upgrading deepspeed with `pip install -U deepspeed`.
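For the trainer-stall entry, the linked NCCL doc has the full guidance; as a rough sketch, these environment variables are commonly used to diagnose or work around inter-GPU communication hangs (whether they help depends on your hardware and topology, and the launch command is only an example):

```bash
# Turn on NCCL logging to see which collective operation hangs.
export NCCL_DEBUG=INFO

# If the logs point at peer-to-peer transfers, disabling P2P can work around
# flaky PCIe/NVLink paths, at some cost in throughput.
export NCCL_P2P_DISABLE=1

# Relaunch training as usual, e.g. (config path is a placeholder):
accelerate launch -m axolotl.cli.train your_config.yml
```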
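For the Exitcode -9 entry, a quick way to confirm the kernel's OOM killer ended the run on a typical Linux host (a hedged sketch; `dmesg` may require root on some systems):

```bash
# Look for evidence that the OOM killer terminated the training process.
dmesg | grep -i -E "killed process|out of memory"

# Watch available system RAM, e.g. while datasets are loaded and tokenized.
free -h
```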