---
library_name: transformers
tags: []
---

# Nora

Nora is an open vision-language-action model trained on robot manipulation episodes from the [Open X-Embodiment](https://robotics-transformer-x.github.io/) dataset. The model takes language instructions and camera images as input and generates robot actions. Nora is fine-tuned directly from Qwen 2.5 VL-3B.

All Nora checkpoints, as well as our [training codebase](https://github.com/declare-lab/nora), are released under the MIT License.

### Model Description

- **Model type:** Vision-language-action (language, image => robot actions)
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** Qwen 2.5 VL-3B

### Model Sources

- **Repository:** https://github.com/declare-lab/nora
- **Paper:** https://www.arxiv.org/abs/2504.19854
- **Demo:** https://declare-lab.github.io/nora

## Usage

Nora takes a language instruction and a camera image of a robot workspace as input, and predicts (normalized) robot actions consisting of 7-DoF end-effector deltas of the form (x, y, z, roll, pitch, yaw, gripper). To execute on an actual robot platform, actions need to be un-normalized using statistics computed on a per-robot, per-dataset basis.

## Getting Started For Inference

To get started with loading and running Nora for inference, we provide a lightweight interface with minimal dependencies.

```bash
git clone https://github.com/declare-lab/nora
cd nora/inference
pip install -r requirements.txt
```

For example, to load Nora for zero-shot instruction following in the BridgeData V2 environments with a WidowX robot:

```python
from PIL import Image

# Load VLA
from inference.nora import Nora

nora = Nora(device='cuda')

# Get inputs
image: Image.Image = camera(...)  # current RGB frame from the robot camera
instruction: str = "<your language instruction>"

# Predict action (7-DoF; un-normalized for BridgeData V2)
action = nora.inference(
    image=image,
    instruction=instruction,
    unnorm_key='bridge_orig'  # Optional, specify if needed
)

# Execute...
robot.act(action, ...)
```
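
If you do not yet have a robot or camera connected, you can smoke-test the interface with a dummy image. The sketch below is a variant of the example above; the 224x224 image size and the example instruction are placeholder assumptions (Nora's preprocessing may resize images internally), and `unnorm_key` is kept only to mirror the call signature.

```python
from PIL import Image
from inference.nora import Nora

nora = Nora(device='cuda')

# A blank RGB image stands in for a real camera frame.
dummy_image = Image.new("RGB", (224, 224))

action = nora.inference(
    image=dummy_image,
    instruction="pick up the red block",  # placeholder instruction
    unnorm_key='bridge_orig',
)
print(action)  # 7-DoF action: (x, y, z, roll, pitch, yaw, gripper)
```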
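
As noted in the Usage section, the model's normalized outputs must be mapped back to the robot's native action range using per-dataset statistics; this is what `unnorm_key` selects. As a rough illustration only, here is a minimal sketch of that mapping. The statistic names, bounds, and formula are assumptions (the Nora codebase may use different statistics, e.g. quantiles), with actions assumed normalized to [-1, 1] by per-dimension low/high bounds.

```python
import numpy as np

def unnormalize_action(norm_action: np.ndarray,
                       low: np.ndarray,
                       high: np.ndarray) -> np.ndarray:
    """Map a [-1, 1]-normalized 7-DoF action back to the robot's native scale."""
    return 0.5 * (norm_action + 1.0) * (high - low) + low

# Hypothetical per-dataset bounds for (x, y, z, roll, pitch, yaw, gripper)
low = np.array([-0.05, -0.05, -0.05, -0.25, -0.25, -0.25, 0.0])
high = np.array([0.05, 0.05, 0.05, 0.25, 0.25, 0.25, 1.0])

norm_action = np.zeros(7)  # e.g. a predicted (normalized) action
action = unnormalize_action(norm_action, low, high)
```

In normal use you should not need to do this by hand; as in the examples above, passing `unnorm_key` to `nora.inference` is intended to handle un-normalization for the corresponding dataset.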