---
library_name: transformers
license: apache-2.0
base_model:
- HuggingFaceTB/SmolLM2-360M-Instruct
tags:
- HuggingFaceTB
- SmolLM2
- SmolLM2-360M-Instruct
- Int8
- M5Stack
- RaspberryPi 5
language:
- en
---
# SmolLM2-360M-Instruct
![image/png](https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/oWWfzW4RbWkVIo7f-5444.png)
This version of SmolLM2-360M-Instruct has been converted to run on the Axera NPU using **w8a16** quantization.
Compatible with Pulsar2 version: 3.4 (not yet released)
## Conversion tool links
If you are interested in model conversion, you can try exporting the axmodel yourself from the original repo:
https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct
[Pulsar2 Link, How to Convert LLM from Huggingface to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html)
[AXera NPU HOST LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/internvl2)
[AXera NPU AXCL LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/axcl-llm-internvl)
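For reference, a minimal `pulsar2 llm_build` invocation is sketched below. This is an assumption-laden sketch, not this repo's exact command: the flag names follow the Pulsar2 LLM build documentation linked above and may differ across Pulsar2 versions, while `kv_cache_len` and `prefill_len` mirror the values printed in the init logs further down (`kv_cache_num: 1023`, `prefill_token_num: 128`).
```
# Sketch only: flag names follow the Pulsar2 LLM build docs linked above
# and may differ in your Pulsar2 version; paths are placeholders.
# kv_cache_len and prefill_len mirror the init logs shown later in this README.
pulsar2 llm_build \
  --input_path HuggingFaceTB/SmolLM2-360M-Instruct \
  --output_path smollm2-360m-ax650 \
  --hidden_state_type bf16 \
  --kv_cache_len 1023 \
  --prefill_len 128 \
  --chip AX650
```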
## Supported Platforms
- AX650
  - AX650N DEMO Board
  - [M4N-Dock (AXera-Pi Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
- AX630C
  - [AXera-Pi 2](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
  - [Module-LLM](https://docs.m5stack.com/zh_CN/module/Module-LLM)
  - [LLM630 Compute Kit](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)
Decode speed by chip and quantization:
|Chip|w8a16|w4a16|
|--|--|--|
|AX650|39 tokens/sec|TBD|
|AX630C|14 tokens/sec|TBD|
## How to use
Download all files from this repository to the device.
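For example, with the `huggingface-cli` tool from `huggingface_hub` (the repo id below is a placeholder; substitute this repository's actual id):
```
# <this-repo-id> is a placeholder: replace it with this repository's id.
pip install -U huggingface_hub
huggingface-cli download <this-repo-id> --local-dir smollm2-360m
```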
```
root@ax650:/mnt/qtang/llm-test/smollm2-360m# tree -L 1
.
|-- main_axcl_aarch64
|-- main_axcl_x86
|-- main_prefill
|-- post_config.json
|-- run_smollm2_360m_ax630c.sh
|-- run_smollm2_360m_ax650.sh
|-- run_smollm2_360m_axcl_aarch64.sh
|-- run_smollm2_360m_axcl_x86.sh
|-- smollm2-360m-ax630c
|-- smollm2-360m-ax650
|-- smollm2_tokenizer
`-- smollm2_tokenizer.py
```
### Install transformers
```
pip install transformers==4.41.1
```
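To confirm the pinned version is the one Python picks up:
```
python3 -c "import transformers; print(transformers.__version__)"
# Expected output: 4.41.1
```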
### Start the Tokenizer service
```
root@ax650:/mnt/qtang/llm-test/smollm2-360m$ python smollm2_tokenizer.py --port 12345
1 <|im_start|> 2 <|im_end|>
<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
<|im_start|>user
hello world<|im_end|>
<|im_start|>assistant
[1, 9690, 198, 2683, 359, 253, 5356, 5646, 11173, 3365, 3511, 308, 34519, 28, 7018, 411, 407, 19712, 8182, 2, 198, 1, 4093, 198, 28120, 905, 2, 198, 1, 520, 9531, 198]
http://localhost:12345
```
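The token ids printed above come from SmolLM2's chat template. A minimal sketch to reproduce them offline with `transformers` (this assumes `smollm2_tokenizer.py` applies the standard SmolLM2-360M-Instruct template, which the service output above suggests):
```
python3 - <<'PY'
from transformers import AutoTokenizer

# Standard SmolLM2 tokenizer; system prompt copied from the service output above.
tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-360M-Instruct")
messages = [
    {"role": "system", "content": "You are a helpful AI assistant named SmolLM, trained by Hugging Face"},
    {"role": "user", "content": "hello world"},
]
# Should print the same id list that smollm2_tokenizer.py shows.
print(tok.apply_chat_template(messages, add_generation_prompt=True))
PY
```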
### Inference on an AX650 host, such as the M4N-Dock (AXera-Pi Pro) or the AX650N DEMO Board
Open another terminal and run `run_smollm2_360m_ax650.sh`
```
root@ax650:/mnt/qtang/llm-test/smollm2-360m# ./run_smollm2_360m_ax650.sh
[I][ Init][ 125]: LLM init start
bos_id: 1, eos_id: 2
2% | β–ˆ | 1 / 35 [0.00s<0.14s, 250.00 count/s] tokenizer init ok
[I][ Init][ 26]: LLaMaEmbedSelector use mmap
100% | β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 35 / 35 [0.81s<0.81s, 43.37 count/s] init post axmodel ok,remain_cmm(3339 MB)
[I][ Init][ 241]: max_token_len : 1023
[I][ Init][ 246]: kv_cache_size : 320, kv_cache_num: 1023
[I][ Init][ 254]: prefill_token_num : 128
[I][ load_config][ 281]: load config:
{
"enable_repetition_penalty": false,
"enable_temperature": true,
"enable_top_k_sampling": true,
"enable_top_p_sampling": false,
"penalty_window": 20,
"repetition_penalty": 1.2,
"temperature": 0.9,
"top_k": 10,
"top_p": 0.8
}
[I][ Init][ 268]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
>> who are you?
[I][ Run][ 466]: ttft: 156.63 ms
I'm a chatbot developed by the Artificial Intelligence Research and Development Lab (AI R&D Lab) at Hugging Face Labs,
specifically designed to facilitate and augment human-AI conversations. My role is to provide assistance in understanding
and responding to natural language queries, using advanced language models and AI algorithms to understand context and intent.
[N][ Run][ 605]: hit eos,avg 38.70 token/s
>> q
```
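The `load config` block in the log above is read from `post_config.json`, so sampling behavior can be tuned without rebuilding the model. As an illustration, here is what the file might look like with top-p (nucleus) sampling enabled instead of top-k; this assumes the runtime honors exactly the keys shown in the init log:
```
# post_config.json — example with top-p sampling enabled instead of top-k
# (assumption: the runtime honors these keys; they match the init log above)
{
    "enable_repetition_penalty": false,
    "enable_temperature": true,
    "enable_top_k_sampling": false,
    "enable_top_p_sampling": true,
    "penalty_window": 20,
    "repetition_penalty": 1.2,
    "temperature": 0.9,
    "top_k": 10,
    "top_p": 0.8
}
```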
### Inference with the M.2 Accelerator card
[What is the M.2 Accelerator card?](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html) This demo runs on a Raspberry Pi 5.
```
(base) axera@raspberrypi:~/samples/smollm2-360m $ ./run_smollm2_360m_axcl_aarch64.sh
build time: Feb 13 2025 15:44:57
[I][ Init][ 111]: LLM init start
bos_id: 1, eos_id: 2
100% | β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 35 / 35 [18.07s<18.07s, 1.94 count/s] init post axmodel okremain_cmm(6621 MB)
[I][ Init][ 226]: max_token_len : 1023
[I][ Init][ 231]: kv_cache_size : 320, kv_cache_num: 1023
[I][ load_config][ 282]: load config:
{
"enable_repetition_penalty": false,
"enable_temperature": true,
"enable_top_k_sampling": true,
"enable_top_p_sampling": false,
"penalty_window": 20,
"repetition_penalty": 1.2,
"temperature": 0.9,
"top_k": 10,
"top_p": 0.8
}
[I][ Init][ 288]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
>> who are you?
I'm a virtual AI assistant, designed to support users with their questions and tasks.
I was trained on a vast dataset of text, including text from various sources and
conversations. This extensive training allows me to understand and respond to a wide range of queries.
I'm here to be helpful and provide answers to your questions.
[N][ Run][ 610]: hit eos,avg 20.81 token/s
>> ^Cq
(base) axera@raspberrypi:~ $ axcl-smi
+------------------------------------------------------------------------------------------------+
| AXCL-SMI V2.26.0_20250205130139 Driver V2.26.0_20250205130139 |
+-----------------------------------------+--------------+---------------------------------------+
| Card Name Firmware | Bus-Id | Memory-Usage |
| Fan Temp Pwr:Usage/Cap | CPU NPU | CMM-Usage |
|=========================================+==============+=======================================|
| 0 AX650N V2.26.0 | 0000:01:00.0 | 171 MiB / 945 MiB |
| -- 39C -- / -- | 2% 0% | 468 MiB / 7040 MiB |
+-----------------------------------------+--------------+---------------------------------------+
+------------------------------------------------------------------------------------------------+
| Processes: |
| Card PID Process Name NPU Memory Usage |
|================================================================================================|
| 0 18636 /home/axera/qtang/llm-test/smollm2-360m/main_axcl_aarch64 418580 KiB |
+------------------------------------------------------------------------------------------------+
(base) axera@raspberrypi:~ $
```