
Mistral on AWS Inf2 with FastAPI

Use FastAPI to quickly serve a Mistral model on an AWS Inferentia2 (Inf2) instance 🚀 Supports the multimodal input type (input_embeds) 🖼️


Environment Setup

Follow the instructions in the Neuron docs' PyTorch Neuron Setup guide for the basic environment setup.
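
As an optional sanity check before installing the app's packages, you can verify that the Neuron PyTorch stack imports cleanly inside the virtual environment. This is a minimal sketch assuming torch and torch-neuronx were installed by the setup guide above; the file name is hypothetical and not part of the repo.

# check_neuron_env.py (hypothetical helper)
import torch
import torch_neuronx  # PyTorch Neuron bindings installed by the Neuron setup guide

print("torch:", torch.__version__)
print("torch-neuronx imported OK")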

Install Packages

Activate the virtual environment and install the extra packages.

cd app
pip install -r requirements.txt

Run the App

uvicorn main:app --host 0.0.0.0 --port 8000
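
The server in app/main.py looks roughly like the sketch below: a FastAPI app exposing a generation endpoint that accepts either a text prompt (tokenized into input_ids) or precomputed input_embeds. The route name, payload fields, model ID, and the plain-transformers model loading are assumptions for illustration, not the repo's exact code — the actual app loads a model compiled for Neuron cores.

# Hypothetical sketch of app/main.py; names and model loading are assumptions.
from typing import List, Optional

import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed; the real app serves a Neuron-compiled model

app = FastAPI()
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

class GenerateRequest(BaseModel):
    prompt: Optional[str] = None                      # normal prompt -> input_ids path
    input_embeds: Optional[List[List[float]]] = None  # [seq_len, hidden] -> inputs_embeds path
    max_new_tokens: int = 128

@app.post("/generate")
def generate(req: GenerateRequest):
    with torch.no_grad():
        if req.input_embeds is not None:
            embeds = torch.tensor(req.input_embeds).unsqueeze(0)  # add batch dimension
            output_ids = model.generate(inputs_embeds=embeds, max_new_tokens=req.max_new_tokens)
        else:
            input_ids = tokenizer(req.prompt, return_tensors="pt").input_ids
            output_ids = model.generate(input_ids, max_new_tokens=req.max_new_tokens)
    return {"generated_text": tokenizer.decode(output_ids[0], skip_special_tokens=True)}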

Send the Request

Test the input_ids (normal text prompt) version:

cd client
python client.py
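
In essence, the client posts a JSON payload containing a text prompt to the running server. The sketch below reuses the hypothetical /generate route and field names from the server sketch above; client.py may differ in the details.

# Hypothetical minimal client for the input_ids path; URL and fields are assumptions.
import requests

resp = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "What is AWS Inferentia2?", "max_new_tokens": 64},
    timeout=300,
)
resp.raise_for_status()
print(resp.json())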

Test the input_embeds version (a common multimodal input format that bypasses the embedding layer):

cd client
python embeds_client.py
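
Here the client sends precomputed embeddings instead of text, which is how a multimodal front end (for example, a vision encoder projecting image features into the model's embedding space) would call the server. The sketch below only sends dummy embeddings of a plausible shape to exercise the request format; the hidden size, route, and field names are assumptions.

# Hypothetical minimal client for the inputs_embeds path; values are dummies.
import requests
import torch

SEQ_LEN, HIDDEN_SIZE = 16, 4096  # 4096 is Mistral-7B's hidden size (assumed)

# A real caller would send actual embeddings (embedding-layer lookups or projected
# vision features); random values only demonstrate the wire format.
dummy_embeds = torch.randn(SEQ_LEN, HIDDEN_SIZE)

resp = requests.post(
    "http://localhost:8000/generate",
    json={"input_embeds": dummy_embeds.tolist(), "max_new_tokens": 64},
    timeout=300,
)
resp.raise_for_status()
print(resp.json())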

Container

You can build a container image from the Dockerfile, or use the pre-built image:

docker run --rm --name mistral -d -p 8000:8000 --device=/dev/neuron0 public.ecr.aws/shtian/fastapi-mistral