Minor fix to the offline example code:
The original was missing the model_name variable.
I also extended the comment with the tensor_parallel_size and pipeline_parallel_size options, which load the model across multiple GPUs if it does not fit in a single GPU's VRAM.
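
For illustration, here is a minimal sketch of what such an offline example might look like. It assumes the example targets vLLM's offline `LLM` class, which the PR text does not state explicitly. The model id is a hypothetical placeholder, and `build_llm_kwargs`, `gpus_required`, and `run_offline` are helper names made up for this sketch, not part of the original example.

```python
# Sketch only: assumes the example uses vLLM's offline `LLM` API.
# The helper names and the model id below are hypothetical.

def build_llm_kwargs(model_name, tensor_parallel_size=1, pipeline_parallel_size=1):
    """Collect the keyword arguments passed to the engine constructor."""
    if tensor_parallel_size < 1 or pipeline_parallel_size < 1:
        raise ValueError("parallel sizes must be >= 1")
    return {
        "model": model_name,
        # Split each layer's weights across this many GPUs.
        "tensor_parallel_size": tensor_parallel_size,
        # Split the stack of layers into this many sequential stages.
        "pipeline_parallel_size": pipeline_parallel_size,
    }


def gpus_required(kwargs):
    """Total GPUs used: tensor-parallel size times pipeline-parallel size."""
    return kwargs["tensor_parallel_size"] * kwargs["pipeline_parallel_size"]


def run_offline(prompts, **kwargs):
    """Load the model and generate completions (needs vLLM and GPUs)."""
    # Imported lazily so the helpers above work without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**kwargs)
    outputs = llm.generate(prompts, SamplingParams(max_tokens=64))
    return [out.outputs[0].text for out in outputs]


# The variable the original example was missing (placeholder value).
model_name = "org/model-name"
kwargs = build_llm_kwargs(model_name, tensor_parallel_size=2)
```

Calling `run_offline(["Hello"], **kwargs)` would then load the model across two GPUs. Whether pipeline parallelism is available for offline use may depend on the installed version, so treat that option as something to verify against your setup.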

csabakecskemeti changed pull request status to closed