Can I run this on text-gen-ui? I want to stream the token generation

#3
by asach - opened

Please provide some instructions to run this. I really appreciate your work and help.

Cognitive Computations org

I was able to run it on oobabooga
using 2x 3090

  1. install oobabooga
  2. download TheBloke's 4-bit gptq into 'models' directory
  3. modify the following files:
     modules/models.py ->
         config = AutoConfig.from_pretrained(path_to_model, trust_remote_code=True)
     modules/AutoGPTQ_loader.py ->
         # Define the params for AutoGPTQForCausalLM.from_quantized
         params = {
             ...
             "trust_remote_code": True,
             ...
         }
  4. run ooba: python server.py --listen --model_type llama --wbits 4 --groupsize -1 --auto-devices
  5. in models tab, select WizardLM-Uncensored-Falcon-40b
  6. if it doesn't load, choose 4-bit and reload
  7. in instructions tab choose prompt instruct-wizardlm
  8. ask your question. It's slow but it works. The answers are spectacular.
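
For reference, the two edits in step 3 amount to roughly the following when done outside the web UI. This is only a sketch, assuming the `transformers` and `auto_gptq` packages are installed. The helper names (`build_quantized_params`, `load_quantized_model`) and the `device_map` / `use_safetensors` values are my own assumptions for illustration, not what oobabooga does internally.

```python
def build_quantized_params(device_map="auto", use_safetensors=True):
    """Keyword arguments for AutoGPTQForCausalLM.from_quantized.

    Hypothetical helper: only "trust_remote_code" comes from the steps
    above; device_map and use_safetensors are assumed values.
    """
    return {
        "device_map": device_map,            # assumption: spread layers across GPUs
        "use_safetensors": use_safetensors,  # assumption: weights are .safetensors
        "trust_remote_code": True,           # needed for Falcon's custom model code
    }


def load_quantized_model(path_to_model):
    """Load the config and the 4-bit GPTQ model from a local directory."""
    # Imported lazily so the helper above works without the heavy packages.
    from transformers import AutoConfig
    from auto_gptq import AutoGPTQForCausalLM

    # Same flag as the edit to modules/models.py
    config = AutoConfig.from_pretrained(path_to_model, trust_remote_code=True)
    # Same flag as the edit to modules/AutoGPTQ_loader.py
    model = AutoGPTQForCausalLM.from_quantized(
        path_to_model, **build_quantized_params()
    )
    return config, model
```

Calling `load_quantized_model(...)` with the path of the model folder inside 'models' should then return the config and the loaded model, provided the GPUs have enough memory.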

Thanks for the reply! Loading it with 4-bit gives this error. I have made the same changes, and the setup is running on RunPod with this config:

2 X NVIDIA L40
64 vCPU 500 GB RAM

Screenshot 2023-06-07 at 11.49.26 AM.png

Screenshot 2023-06-07 at 11.48.51 AM.png

I got it loaded with your instructions, but a nonsense response to the prompt:


### Response:DayGenVerEvEvEv

Any advice?

Any plans for an uncensored version of the instruct-trained Falcon 40b?

Cognitive Computations org

I plan to train Dolphin on Falcon 40b, which I expect will be much better than falcon-40b-instruct.

> I plan to train Dolphin on Falcon 40b, which I expect will be much better than falcon-40b-instruct.

What is your estimate for the release date of this model? Will it be 13b?

Best model I have tried for reasoning questions. Thank you!
