
Mistral on AWS Inf2 with FastAPI

Use FastAPI to quickly serve a Mistral model on an AWS Inferentia2 (Inf2) instance 🚀 Supports the multimodal input type (input_embeds) 🖼️


Environment Setup

Follow the instructions in the Neuron docs' PyTorch Neuron Setup guide for the basic environment setup.
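
As an optional sanity check before installing the app's packages, you can verify that the Neuron PyTorch stack imports cleanly inside the virtual environment. This is a minimal sketch assuming torch and torch-neuronx were installed by the setup guide above; the file name is hypothetical and not part of the repo.

# check_neuron_env.py (hypothetical helper)
import torch
import torch_neuronx  # PyTorch Neuron bindings installed by the Neuron setup guide

print("torch:", torch.__version__)
print("torch-neuronx imported OK")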

Install Packages

Activate the virtual environment and install the extra packages.

cd app
pip install -r requirements.txt

Run the App

uvicorn main:app --host 0.0.0.0 --port 8000
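
The server in app/main.py looks roughly like the sketch below: a FastAPI app exposing a generation endpoint that accepts either a text prompt (tokenized into input_ids) or precomputed input_embeds. The route name, payload fields, model ID, and the plain-transformers model loading are assumptions for illustration, not the repo's exact code — the actual app loads a model compiled for Neuron cores.

# Hypothetical sketch of app/main.py; names and model loading are assumptions.
from typing import List, Optional

import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed; the real app serves a Neuron-compiled model

app = FastAPI()
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

class GenerateRequest(BaseModel):
    prompt: Optional[str] = None                      # normal prompt -> input_ids path
    input_embeds: Optional[List[List[float]]] = None  # [seq_len, hidden] -> inputs_embeds path
    max_new_tokens: int = 128

@app.post("/generate")
def generate(req: GenerateRequest):
    with torch.no_grad():
        if req.input_embeds is not None:
            embeds = torch.tensor(req.input_embeds).unsqueeze(0)  # add batch dimension
            output_ids = model.generate(inputs_embeds=embeds, max_new_tokens=req.max_new_tokens)
        else:
            input_ids = tokenizer(req.prompt, return_tensors="pt").input_ids
            output_ids = model.generate(input_ids, max_new_tokens=req.max_new_tokens)
    return {"generated_text": tokenizer.decode(output_ids[0], skip_special_tokens=True)}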

Send the Request

Test the input_ids (normal text prompt) version:

cd client
python client.py
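
In essence, the client posts a JSON payload containing a text prompt to the running server. The sketch below reuses the hypothetical /generate route and field names from the server sketch above; client.py may differ in the details.

# Hypothetical minimal client for the input_ids path; URL and fields are assumptions.
import requests

resp = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "What is AWS Inferentia2?", "max_new_tokens": 64},
    timeout=300,
)
resp.raise_for_status()
print(resp.json())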

Test the input_embeds version (a common multimodal input format that bypasses the embedding layer):

cd client
python embeds_client.py
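
Here the client sends precomputed embeddings instead of text, which is how a multimodal front end (for example, a vision encoder projecting image features into the model's embedding space) would call the server. The sketch below only sends dummy embeddings of a plausible shape to exercise the request format; the hidden size, route, and field names are assumptions.

# Hypothetical minimal client for the inputs_embeds path; values are dummies.
import requests
import torch

SEQ_LEN, HIDDEN_SIZE = 16, 4096  # 4096 is Mistral-7B's hidden size (assumed)

# A real caller would send actual embeddings (embedding-layer lookups or projected
# vision features); random values only demonstrate the wire format.
dummy_embeds = torch.randn(SEQ_LEN, HIDDEN_SIZE)

resp = requests.post(
    "http://localhost:8000/generate",
    json={"input_embeds": dummy_embeds.tolist(), "max_new_tokens": 64},
    timeout=300,
)
resp.raise_for_status()
print(resp.json())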

Container

You can build a container image from the Dockerfile, or use the pre-built image:

docker run --rm --name mistral -d -p 8000:8000 --device=/dev/neuron0 public.ecr.aws/shtian/fastapi-mistral