Deploying Hugging Face models with Viam: Use models on any robot in the real world

Community Article Published August 14, 2024

Hugging Face is a vibrant hub for the machine learning community, offering an extensive collection of open-source computer vision and large language models. This ecosystem enables developers to contribute, find, and use a diverse range of models and datasets, making machine learning accessible to all. Viam provides an open-source platform for configuring, controlling, and deploying custom code on robots, IoT devices, and smart machines out in the world. With Viam’s Registry, the Viam developer community can share custom modular components and services that can be used on physical machines, much like the model- and data-sharing ecosystem at Hugging Face.

Computer vision enables robots to perceive and interact with their surroundings. Viam’s Vision Service supports advanced capabilities such as real-time object detection, classification, and 3D segmentation, allowing robots to understand and dynamically respond to the world. Additionally, Viam offers custom model training using images collected with Viam, allowing developers to tailor solutions to specific applications. But with the vast array of datasets and models available with Hugging Face, how can developers leverage these resources to enhance robotic capabilities even further?

To harness the power of Hugging Face models on Viam machines, the Viam community has contributed custom Vision Service modules to the registry that integrate the YOLOv5 and YOLOv8 inference libraries for real-time detections. Configuring and testing these models on hardware requires no code up front. The YOLOv5 module offers ease of use, making this model a popular choice for developers looking to quickly deploy computer vision models. The YOLOv8 module, on the other hand, is designed for speed and accuracy, making it ideal for applications that require real-time object detection.

The choice between the two depends on the specific requirements of the application, whether that’s ease of deployment or high-performance inference in dynamic environments. Other modules on the Viam registry wrap additional Hugging Face models, such as LLMs, making it easy for developers to bring AI models from Hugging Face to their machines in the real world.

Deploying a Hugging Face model on a Viam machine

To use one of these modules, first create a machine instance in the Viam app. For this guide, the only hardware necessary is a computer to run viam-server and a webcam to show detections or classifications from a Hugging Face model of your choice.

Set up the machine according to the instructions in the Set up your machine part guide in the Viam app. Once the machine is connected and live, head to the Configure tab to begin configuring a webcam and the vision service that will run the Hugging Face model.

In the Builder panel, add a component using the ‘+’ icon. Select ‘Component’. Search for the camera model ‘webcam’ and add it to the machine configuration.
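Behind the Builder panel, each component is stored as JSON in the machine configuration. A minimal sketch of the resulting webcam entry might look like the following; the `video_path` value is an assumption for illustration and depends on your hardware (leaving `attributes` empty lets viam-server attempt to auto-detect a camera):

```json
{
  "name": "webcam",
  "namespace": "rdk",
  "type": "camera",
  "model": "webcam",
  "attributes": {
    "video_path": "video0"
  }
}
```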


Next, add a vision service that will leverage the configured camera to run detections. Using the ‘+’ icon, select ‘Service’ and search for the ‘vision / yolov8’ model to add to the machine configuration. Either YOLO module will work for this tutorial.
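As with the camera, the vision service appears as a JSON entry in the machine configuration. The sketch below is illustrative only: the exact `model` triplet comes from the module’s registry listing, so the one shown here is an assumption, and the service’s attributes are filled in later in this guide:

```json
{
  "name": "yolov8",
  "namespace": "rdk",
  "type": "vision",
  "model": "viam-labs:vision:yolov8",
  "attributes": {}
}
```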


To leverage a Hugging Face model with a Viam YOLO module, find a model compatible with the selected inference library. Browse the Hugging Face model hub and select a compatible model. In the GitHub example, a hard hat detection model is used, which could power a security system for construction sites to ensure safety measures are followed.

As a sneaker lover with an overflowing closet, I want to use a shoe classification model to figure out what percentage of each shoe brand is in my sneaker collection, to help me downsize (or have an excuse to buy more). I’m selecting the YOLOv8 Shoe Classification Model from the Hugging Face model hub, uploaded by user @keremberke.


After the module is added to the machine, configure the custom attributes for the Vision Service. The following attributes are available for the model:

| Name | Type | Inclusion | Description |
| ---- | ---- | --------- | ----------- |
| `model_location` | string | Required | Local path or Hugging Face model identifier |

Because I am using a model hosted on Hugging Face, all I need to provide is the model identifier from the URL slug (the `user/model-name` portion of the model page’s URL):

```json
{
  "model_location": "keremberke/yolov8n-shoe-classification"
}
```

If using a locally downloaded model, the configuration syntax is as follows:

```json
{
  "model_location": "/path/to/yolov8n.pt"
}
```


The final configuration step is to add a transform camera component, a pipeline for applying transformations to an input image source. Add a new component, search for ‘camera / transform’, and configure the pipeline as shown. For this transform camera, the transformations are layered over the webcam feed to show classifications in real time. Specify the classifier name, which is the previously configured vision service named ‘yolov8’, and add ‘webcam’ in the ‘Depends On’ attribute field. If you chose a detector instead, follow the instructions to set up detections for a transform camera.
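In raw JSON, a transform camera configured this way might look roughly like the sketch below. The `source` and `classifier_name` values match the components configured above; the confidence threshold and classification count are illustrative assumptions you can tune:

```json
{
  "name": "transform-cam",
  "namespace": "rdk",
  "type": "camera",
  "model": "transform",
  "attributes": {
    "source": "webcam",
    "pipeline": [
      {
        "type": "classifications",
        "attributes": {
          "classifier_name": "yolov8",
          "confidence_threshold": 0.5,
          "max_classifications": 3
        }
      }
    ]
  }
}
```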


Once the pipeline is set up, it is time to test the Hugging Face model against the webcam feed and watch classifications appear in real time. This setup works whether you are testing on a laptop or have configured a machine deployed out in the world, allowing AI capabilities to enhance a machine’s performance in many applications.
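To turn those live classifications into the brand breakdown for my closet, a small helper can tally labels into percentages. This is a hypothetical sketch, not part of the Viam SDK: `brand_percentages` and the sample labels are my own names, and the actual label strings depend on the classes the chosen model was trained on.

```python
from collections import Counter


def brand_percentages(labels):
    """Tally classification labels into percentage shares.

    `labels` is a list of label strings, e.g. the top classification
    returned for each sneaker photographed by the webcam.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Hypothetical labels collected from the classifier, one per shoe.
labels = ["nike", "adidas", "nike", "converse", "nike"]
print(brand_percentages(labels))  # {'nike': 60.0, 'adidas': 20.0, 'converse': 20.0}
```

From there, the brand with the largest share is the obvious candidate for downsizing (or doubling down on).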

Next steps

By leveraging the YOLO modules in Viam’s Registry, developers can use state-of-the-art object detection algorithms modularly with minimal upfront coding. This flexibility allows for rapid prototyping and deployment, catering to a wide range of applications, from security systems to warehouse automation to cleaning out your closet. After testing Hugging Face models on a Viam machine, the next step is to write custom code using Viam’s unified APIs and flexible SDKs, offered in several languages. Explore the Viam Registry and contribute to the open-source ecosystem by adding more features, models, and integrations for machine learning applications. Whether you’re organizing your closet or creating a new home security system, computer vision has countless potential applications. Go start building your next prototype with Viam today.