jackyliang42 committed
Commit fc126c5 · 1 Parent(s): c81e0c0

updated readme

Files changed (2):
  1. README.md +12 -19
  2. app.py +2 -1
README.md CHANGED
@@ -11,37 +11,31 @@ license: apache-2.0
 ---
 
 # Code as Policies Tabletop Manipulation Interactive Demo
-
 This notebook is a part of the open-source code release associated with the paper:
-
 [Code as Policies: Language Model Programs for Embodied Control](https://code-as-policies.github.io/)
-
 This notebook gives an interactive demo for the simulated tabletop manipulation domain, seen in the paper section IV.D
 
-## Preparations:
-
-1) Obtain an [OpenAI API Key](https://openai.com/blog/openai-api/)
-
-2) Gain Codex access by [joining the waitlist](https://openai.com/blog/openai-codex/)
-
+## Preparations
+1. Obtain an [OpenAI API Key](https://openai.com/blog/openai-api/)
+2. Gain Codex access by [joining the waitlist](https://openai.com/blog/openai-codex/)
 Once you have Codex access you can use `code-davinci-002`. Using the GPT-3 model (`text-davinci-002`) is also ok, but performance won't be as good (there will be more code logic errors).
 
-## Instructions:
-
+## Usage
 1. Fill in the API Key, model name, and how many blocks and bowls to be spawned in the environment.
 2. Click Setup/Reset Env
 3. Based on the new randomly sampled object names, input an instruction and click Run Instruction. If successful, this will render a video and update the simulation environment visualization.
 
-You can run instructions in sequence and refer back to previous commands (e.g. do the same with other blocks, move the same block to the other bowl, etc). Click Setup/Reset Env to reset, and this will clear the current instruction history.
+You can run instructions in sequence and refer back to previous instructions (e.g. do the same with other blocks, move the same block to the other bowl, etc). Click Setup/Reset Env to reset, and this will clear the current instruction history.
 
-Supported commands:
+## Supported Instructions
 * Spatial reasoning (e.g. to the left of the red block, the closest corner, the farthest bowl, the second block from the right)
 * Sequential actions (e.g. put blocks in matching bowls, stack blocks on the bottom right corner)
-* Contextual commands (e.g. do the same with the blue block, undo that)
+* Contextual instructions (e.g. do the same with the blue block, undo that)
 * Language-based reasoning (e.g. put the forest-colored block on the ocean-colored bowl)
 * Simple Q&A (e.g. how many blocks are to the left of the blue bowl?)
 
-Example commands (note object names may need to be changed depending the sampled object names):
+## Example Instructions
+Note: object names may need to be changed depending on the sampled object names.
 * put the sun-colored block on the bowl closest to it
 * stack the blocks on the bottom most bowl
 * arrange the blocks as a square in the middle
@@ -49,9 +43,8 @@ Example commands (note object names may need to be changed depending the sampled
 * how many blocks are to the right of the orange bowl?
 * pick up the block closest to the top left corner and place it on the bottom right corner
 
-Known limitations:
-* In simulation we're using ground truth object poses instead of using vision models. This means that commands the require knowledge of visual apperances (e.g. darkest bowl, largest object) are not supported.
+## Known Limitations
+* In simulation we're using ground truth object poses instead of using vision models. This means that instructions that require knowledge of visual appearances (e.g. darkest bowl, largest object) are not supported.
 * Currently, the low-level pick place primitive does not do collision checking, so if there are many objects on the table, placing actions may incur collisions.
-* Prompt saturation - if too many commands (10+) are executed in a row, then the LLM may start to ignore examples in the early parts of the prompt.
+* Prompt saturation - if too many instructions (10+) are executed in a row, then the LLM may start to ignore examples in the early parts of the prompt.
 * Ambiguous instructions - if a given instruction doesn't lead to the desired actions, try rephrasing it to remove ambiguities (e.g. place the block on the closest bowl -> place the block on its closest bowl)
-
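
For context, the README's model note maps onto the OpenAI completion API roughly as sketched below. This is a minimal sketch using the legacy openai-python `Completion` interface that was current when Codex was offered; the prompt, stop sequence, and parameter values are illustrative placeholders, not the demo's actual prompting setup.

```python
import openai

# Minimal sketch (legacy openai-python < 1.0 Completion API).
# The prompt and parameters below are placeholders, not the demo's real prompt.
openai.api_key = 'YOUR_OPENAI_API_KEY'

response = openai.Completion.create(
    engine='code-davinci-002',  # or 'text-davinci-002' (more code logic errors)
    prompt='# Python code to put the blue block in the red bowl.\n',
    temperature=0,
    max_tokens=256,
    stop=['#'],
)
print(response['choices'][0]['text'])
```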
 
app.py CHANGED
@@ -80,7 +80,7 @@ class DemoRunner:
 
         self._lmp_tabletop_ui = self.make_LMP(self._env)
 
-        info = '## Available Objects: \n- ' + '\n- '.join(obj_list)
+        info = '### Available Objects: \n- ' + '\n- '.join(obj_list)
         img = self._env.get_camera_image()
 
         return info, img
@@ -118,6 +118,7 @@ if __name__ == '__main__':
 
     with demo:
         gr.Markdown(readme_text)
+        gr.Markdown('# Interactive Demo')
         with gr.Row():
            with gr.Column():
                with gr.Row():
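
For reference, here is a minimal, self-contained sketch of the Gradio layout these two hunks touch, assuming app.py builds its UI with `gr.Blocks`; the object names and widgets are illustrative placeholders, not the actual app.

```python
import gradio as gr

# Illustrative placeholders (not the real app state).
readme_text = '# Code as Policies Tabletop Manipulation Interactive Demo\n...'
obj_list = ['blue block', 'red bowl']

# This commit changes the heading level from '##' to '###', so the object list
# renders as a sub-heading under the page-level '# Interactive Demo' heading.
info = '### Available Objects: \n- ' + '\n- '.join(obj_list)

with gr.Blocks() as demo:
    gr.Markdown(readme_text)           # README rendered at the top of the page
    gr.Markdown('# Interactive Demo')  # heading added by this commit
    with gr.Row():
        with gr.Column():
            gr.Markdown(info)          # available-object list
            gr.Textbox(label='Instruction')
            gr.Button('Run Instruction')

if __name__ == '__main__':
    demo.launch()
```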