run Phi-2 on your CPU
How about inference speed?
It is faster than larger models, as expected.
Hi J22,
Thank you for your work.
I visited chatllm.cpp to try it out. To generate a quantized model with chatllm.cpp, I ran the following:
python3 convert.py -i ~/.cache/huggingface/hub/models--microsoft--phi-2 -t q8_0 -o quantized.bin
But it didn't work.
I got this:
Traceback (most recent call last):
  File "convert.py", line 345, in <module>
    class TikTokenizerVocab:
  File "convert.py", line 354, in TikTokenizerVocab
    def bpe(mergeable_ranks: dict[bytes, int], token: bytes, max_rank: Optional[int] = None) -> list[bytes]:
TypeError: 'type' object is not subscriptable
Can you please help?
Thank you!
G
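The `TypeError: 'type' object is not subscriptable` above is a Python version issue: built-in generics in annotations, such as `dict[bytes, int]` and `list[bytes]`, only work on Python 3.9+. On older interpreters the signature needs the `typing` aliases instead. A minimal sketch of a pre-3.9-compatible signature (the names follow the traceback; the body is illustrative, not convert.py's actual implementation):

```python
from typing import Dict, List, Optional

# Pre-3.9-compatible annotations: typing.Dict / typing.List
# instead of dict[...] / list[...].
def bpe(mergeable_ranks: Dict[bytes, int], token: bytes,
        max_rank: Optional[int] = None) -> List[bytes]:
    # Illustrative BPE merge loop: repeatedly merge the adjacent
    # pair of parts with the lowest rank until no merge applies.
    parts = [bytes([b]) for b in token]
    while len(parts) > 1:
        ranks = [(mergeable_ranks.get(parts[i] + parts[i + 1]), i)
                 for i in range(len(parts) - 1)]
        best = min((r for r in ranks if r[0] is not None), default=None)
        if best is None or (max_rank is not None and best[0] >= max_rank):
            break
        i = best[1]
        parts = parts[:i] + [parts[i] + parts[i + 1]] + parts[i + 2:]
    return parts
```

Alternatively, running the unmodified script under Python 3.9 or newer avoids the error entirely.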
Thank you for your reply, but I've tried what you suggested without success. Could you please tell me specifically which Phi-2 file or files from https://huggingface.co/microsoft/phi-2/tree/main I should give to the convert.py script?
Were you able to convert (quantize) the model from https://huggingface.co/microsoft/phi-2/tree/main? How did you do it?
Thank you in advance!
- Download all files from here (the *.md files are not needed).
- Let's say the files are located in /path/to/phi2/files. Run convert.py like this:
python convert.py -i /path/to/phi2/files -o phi2.bin
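Before running convert.py, it can help to sanity-check that the folder holds the pieces a converter typically reads: the config, the tokenizer files, and the safetensors weight shards. A sketch (the exact set of files convert.py requires is an assumption; compare against the repo listing):

```python
import os

# Hypothetical helper: check a local Phi-2 snapshot for the files a
# converter typically needs. The required set here is an assumption.
REQUIRED = ["config.json", "tokenizer_config.json", "vocab.json", "merges.txt"]

def missing_files(model_dir):
    present = set(os.listdir(model_dir))
    missing = [name for name in REQUIRED if name not in present]
    # The weights themselves come as one or more *.safetensors shards.
    if not any(name.endswith(".safetensors") for name in present):
        missing.append("*.safetensors")
    return missing
```

An empty result means the snapshot looks complete; otherwise it names what still has to be downloaded.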
I get an error about gelu_new:
ubuntu@ip-172-31-7-92 ~/t/chatllm.cpp (master)> ls -lhtr phi-2/
total 5.2G
-rw-rw-r-- 1 ubuntu ubuntu 74 Jan 11 22:13 generation_config.json
-rw-rw-r-- 1 ubuntu ubuntu 9.1K Jan 11 22:13 configuration_phi.py
-rw-rw-r-- 1 ubuntu ubuntu 866 Jan 11 22:13 config.json
-rw-rw-r-- 1 ubuntu ubuntu 1.1K Jan 11 22:13 added_tokens.json
-rw-rw-r-- 1 ubuntu ubuntu 2.6K Jan 11 22:13 SECURITY.md
-rw-rw-r-- 1 ubuntu ubuntu 7.3K Jan 11 22:13 README.md
-rw-rw-r-- 1 ubuntu ubuntu 1.8K Jan 11 22:13 NOTICE.md
-rw-rw-r-- 1 ubuntu ubuntu 1.1K Jan 11 22:13 LICENSE
-rw-rw-r-- 1 ubuntu ubuntu 444 Jan 11 22:13 CODE_OF_CONDUCT.md
-rw-rw-r-- 1 ubuntu ubuntu 446K Jan 11 22:13 merges.txt
-rw-rw-r-- 1 ubuntu ubuntu 99 Jan 11 22:13 special_tokens_map.json
-rw-rw-r-- 1 ubuntu ubuntu 62K Jan 11 22:13 modeling_phi.py
-rw-rw-r-- 1 ubuntu ubuntu 35K Jan 11 22:13 model.safetensors.index.json
-rw-rw-r-- 1 ubuntu ubuntu 7.2K Jan 11 22:13 tokenizer_config.json
-rw-rw-r-- 1 ubuntu ubuntu 2.1M Jan 11 22:13 tokenizer.json
-rw-rw-r-- 1 ubuntu ubuntu 780K Jan 11 22:13 vocab.json
-rw-rw-r-- 1 ubuntu ubuntu 538M Jan 11 22:13 model-00002-of-00002.safetensors
-rw-rw-r-- 1 ubuntu ubuntu 4.7G Jan 11 22:14 model-00001-of-00002.safetensors
ubuntu@ip-172-31-7-92 ~/t/chatllm.cpp (master)> python3 convert.py -i phi-2 -o phi2.bin
Loading vocab file phi-2
vocab_size 50295
Traceback (most recent call last):
  File "/home/ubuntu/tmp/chatllm.cpp/convert.py", line 1516, in <module>
    main()
  File "/home/ubuntu/tmp/chatllm.cpp/convert.py", line 1422, in main
    Phi2Converter.convert(config, model_files, vocab, ggml_type, args.save_path)
  File "/home/ubuntu/tmp/chatllm.cpp/convert.py", line 459, in convert
    cls.dump_config(f, config, ggml_type)
  File "/home/ubuntu/tmp/chatllm.cpp/convert.py", line 1161, in dump_config
    assert config.activation_function == 'gelu_new', "activation_function must be gelu_new"
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: activation_function must be gelu_new
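The assertion fires because the repo's config.json changed between revisions: the converter expects the old `activation_function` key with value `gelu_new`, while the updated snapshot lays out the config differently. A small sketch for inspecting which key a given snapshot uses (the key names for the two layouts are an assumption about the revision history):

```python
import json

# Sketch: report the activation setting from a Phi-2 config.json.
# Older revisions used the key "activation_function"; updated layouts
# use "hidden_act" (key names here are an assumption).
def activation_of(config_path):
    with open(config_path) as f:
        cfg = json.load(f)
    for key in ("activation_function", "hidden_act"):
        if key in cfg:
            return key, cfg[key]
    return None, None
```

If this reports a key other than `activation_function`, the snapshot is newer than what the converter was written against.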
Oh, they made so many updates.
https://huggingface.co/microsoft/phi-2/commit/cb2f4533604d8b67de604e7df03bfe6f3ca22869
I will update ChatLLM.cpp accordingly (hopefully next week). Or, you can download an older revision.
@kirilligum ChatLLM.cpp now supports the latest revision of Phi-2. You can pull the latest ChatLLM.cpp code and try the conversion again.
Thanks for the reference :)
Does the repo support loading a LoRA head I trained?
@talbaumel Sorry, it does not support LoRA at present.