---
title: Code As Policies
emoji: π
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 3.3.1
app_file: app.py
pinned: false
license: apache-2.0
---
# Code as Policies Tabletop Manipulation Interactive Demo
This demo is part of the open-source code release associated with the paper:

[Code as Policies: Language Model Programs for Embodied Control](https://code-as-policies.github.io/)

It provides an interactive demo for the simulated tabletop manipulation domain described in Section IV.D of the paper.
## Preparations:
1) Obtain an [OpenAI API Key](https://openai.com/blog/openai-api/)
2) Gain Codex access by [joining the waitlist](https://openai.com/blog/openai-codex/)

Once you have Codex access, you can use `code-davinci-002`. The GPT-3 model (`text-davinci-002`) also works, but performance won't be as good (expect more code logic errors). A minimal sketch of querying the model is shown below.
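For reference, here is a minimal sketch of querying the model with the legacy `openai` Python package (the completions API that Codex-era models were served through). The prompt below is an illustrative placeholder, not the demo's actual prompt:

```python
import openai

openai.api_key = "sk-..."  # your API key from step 1

# Ask the model to continue a comment with Python code. The real demo uses a
# much longer few-shot prompt of instruction -> code examples.
response = openai.Completion.create(
    model="code-davinci-002",  # or "text-davinci-002" as a fallback
    prompt="# Put the red block on the blue bowl.\n",
    max_tokens=256,
    temperature=0,             # greedy decoding helps code generation
    stop="#",                  # stop before the next comment/instruction
)
print(response["choices"][0]["text"])
```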
## Instructions:
1. Fill in the API key, the model name, and how many blocks and bowls should be spawned in the environment.
2. Click Setup/Reset Env.
3. Based on the newly sampled object names, input an instruction and click Run Instruction. If successful, this will render a video and update the simulation environment visualization.

You can run instructions in sequence and refer back to previous commands (e.g. "do the same with other blocks", "move the same block to the other bowl"). Clicking Setup/Reset Env resets the simulation and clears the current instruction history.
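Contextual follow-ups work because each instruction and the code generated for it are appended to the LLM's prompt. A rough sketch of that loop, with hypothetical helper names (`complete`, `api_globals`) standing in for the demo's actual internals:

```python
def complete(prompt):
    """Stub for the LLM call (see the Codex query sketch above)."""
    return "pass\n"

api_globals = {}  # namespace exposing the tabletop API to generated code
context = ""      # would normally start with few-shot instruction -> code examples

def run_instruction(instruction):
    global context
    query = context + f"# {instruction}\n"
    code = complete(query)   # LLM writes code for the new instruction
    exec(code, api_globals)  # run it against the simulated tabletop
    context = query + code   # keep history so follow-ups can refer back

run_instruction("put the red block on the blue bowl")
run_instruction("do the same with the green block")  # resolved via history
```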
Supported commands (an illustrative sketch of the generated code follows this list):

* Spatial reasoning (e.g. to the left of the red block, the closest corner, the farthest bowl, the second block from the right)
* Sequential actions (e.g. put blocks in matching bowls, stack blocks on the bottom right corner)
* Contextual commands (e.g. do the same with the blue block, undo that)
* Language-based reasoning (e.g. put the forest-colored block on the ocean-colored bowl)
* Simple Q&A (e.g. how many blocks are to the left of the blue bowl?)
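To make the categories above concrete, here is the kind of Python a spatial-reasoning command might be translated into. `get_obj_pos` and `put_first_on_second` are hypothetical stand-ins for the demo's perception and pick-and-place primitives:

```python
import numpy as np

def get_obj_pos(name):           # stub: would return the object's (x, y) pose
    return np.random.rand(2)

def put_first_on_second(a, b):   # stub: would run the pick-and-place primitive
    print(f"placing {a} on {b}")

# "put the red block on the bowl closest to it"
bowl_names = ["blue bowl", "green bowl", "yellow bowl"]
red_pos = get_obj_pos("red block")
closest_bowl = min(bowl_names, key=lambda b: np.linalg.norm(get_obj_pos(b) - red_pos))
put_first_on_second("red block", closest_bowl)
```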
Example commands (note object names may need to be changed depending on the sampled object names):

* put the sun-colored block on the bowl closest to it
* stack the blocks on the bottommost bowl
* arrange the blocks as a square in the middle
* move the square 5cm to the right
* how many blocks are to the right of the orange bowl?
* pick up the block closest to the top left corner and place it on the bottom right corner
Known limitations:

* In simulation we're using ground-truth object poses instead of vision models, so commands that require knowledge of visual appearance (e.g. darkest bowl, largest object) are not supported.
* The low-level pick-and-place primitive does not do collision checking, so placing actions may incur collisions when many objects are on the table.
* Prompt saturation - if too many commands (10+) are executed in a row, the LLM may start to ignore examples in the early parts of the prompt (a simple mitigation is sketched after this list).
* Ambiguous instructions - if a given instruction doesn't lead to the desired actions, try rephrasing it to remove ambiguities (e.g. place the block on the closest bowl -> place the block on its closest bowl).
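One simple way to work around prompt saturation (an assumption on our part, not something the demo implements) is to keep only the most recent instruction/code pairs in the history:

```python
MAX_HISTORY = 8  # hypothetical cap on remembered instruction/code pairs

history = []  # list of (instruction, generated_code) tuples

def build_prompt(base_prompt, instruction):
    # Drop the oldest interactions so the few-shot examples in base_prompt
    # stay within the model's effective attention.
    recent = history[-MAX_HISTORY:]
    past = "".join(f"# {i}\n{c}" for i, c in recent)
    return base_prompt + past + f"# {instruction}\n"
```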