SFLY5 sunzhongkai588 committed
Commit 711fbdc · verified · 1 parent: 1e5b28d

Update README.md (#1)

- Update README.md (2b3c45757d4c7686d115fda501bddd5383e5e116)


Co-authored-by: Suen.ZK <[email protected]>

Files changed (1)
  1. README.md +3 -54
README.md CHANGED
@@ -32,6 +32,9 @@ library_name: transformers
 
  # ERNIE-4.5-VL-424B-A47B
 
+ > [!NOTE]
+ > Note: "**-Paddle**" models use [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) weights, while "**-PT**" models use Transformer-style PyTorch weights.
+
  ## ERNIE 4.5 Highlights
 
  The advanced capabilities of the ERNIE 4.5 models, particularly the MoE-based A47B and A3B series, are underpinned by several key technical innovations:
@@ -61,60 +64,6 @@ ERNIE-4.5-VL-424B-A47B is a multimodal MoE Chat model based on ERNIE-4.5-VL-424B
 
  ## Quickstart
 
- ### FastDeploy Inference
-
- Quickly deploy services using FastDeploy as shown below. For more detailed usage, refer to the [FastDeploy GitHub Repository](https://github.com/PaddlePaddle/FastDeploy).
-
- **Note**: 80GB x 8 GPU resources are required. The `--quantization` parameter supports specifying `wint4` or `wint8` for deployment with 4-bit or 8-bit quantization, respectively.
-
- ```bash
- python -m fastdeploy.entrypoints.openai.api_server \
- --model baidu/ERNIE-4.5-VL-424B-A47B-Paddle \
- --port 8180 \
- --metrics-port 8181 \
- --engine-worker-queue-port 8182 \
- --tensor-parallel-size 8 \
- --quantization wint4 \
- --max-model-len 32768 \
- --enable-mm \
- --reasoning-parser ernie-45-vl \
- --max-num-seqs 32
- ```
-
- The ERNIE-4.5-VL model supports enabling or disabling thinking mode through request parameters.
-
- #### Enable Thinking Mode
-
- ```bash
- curl -X POST "http://0.0.0.0:8180/v1/chat/completions" \
- -H "Content-Type: application/json" \
- -d '{
- "messages": [
- {"role": "user", "content": [
- {"type": "image_url", "image_url": {"url": "https://paddlenlp.bj.bcebos.com/datasets/paddlemix/demo_images/example2.jpg"}},
- {"type": "text", "text": "Descript this image"}
- ]}
- ],
- "metadata": {"enable_thinking": true}
- }'
- ```
-
- #### Disable Thinking Mode
-
- ```bash
- curl -X POST "http://0.0.0.0:8180/v1/chat/completions" \
- -H "Content-Type: application/json" \
- -d '{
- "messages": [
- {"role": "user", "content": [
- {"type": "image_url", "image_url": {"url": "https://paddlenlp.bj.bcebos.com/datasets/paddlemix/demo_images/example2.jpg"}},
- {"type": "text", "text": "Descript this image"}
- ]}
- ],
- "metadata": {"enable_thinking": false}
- }'
- ```
-
  ### vLLM inference
 
  We are working with the community to fully support ERNIE4.5 models, stay tuned.