Last commit by Eralien: "Fix: gradio spaces config" (3960742, over 1 year ago)


	---
	title: VoxPoserExamples
	emoji: 🔥
	colorFrom: pink
	colorTo: green
	sdk: gradio
	sdk_version: 3.40.1
	app_file: app.py
	pinned: false
	---

	# VoxPoser API Examples

	## Usage

	```bash
	python3 app.py
	```

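	1. In the interface, enter your OpenAI API key and the proxy address to use, then select the desired configuration.
	2. Click Setup/Reset Simulation.
	3. Enter a custom instruction.
	4. Click Run to execute it (this can take a long time).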

	## Example

	### VLM & Perception

	1. Open-vocabulary object detection: [owlvit](https://huggingface.co/docs/transformers/model_doc/owlvit)
	2. Segmentation: [SAM](https://github.com/facebookresearch/segment-anything)
	3. Object mask tracking: [XMem](https://github.com/hkchengrex/XMem)
	4. Depth images from a RealSense camera
	5. Surface normals computed from the depth image (used for the grasp pose)

	Possible replacements:
	- [x] owlvit -> Grounded SAM / YOLO
	- [x] SAM -> FastSAM / YOLO-seg
	- [ ] XMem -> DeepSORT(?) ByteTrack(?)
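
Items 4 and 5 of the perception list (depth from a RealSense camera, normals from depth) can be sketched in NumPy. This is a minimal illustration, not this repository's code. It assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy` and a depth image already converted to meters. It back-projects every pixel to a 3D point, then estimates each normal as the cross product of the point map's gradients along the image columns and rows.

```python
import numpy as np


def depth_to_points(depth: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Back-project an (H, W) depth image in meters to an (H, W, 3) point map
    in the camera frame, assuming a pinhole model with hypothetical intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


def estimate_normals(points: np.ndarray) -> np.ndarray:
    """Estimate per-pixel unit normals from an (H, W, 3) point map."""
    # Finite differences along image rows (axis 0) and columns (axis 1).
    d_dv, d_du = np.gradient(points, axis=(0, 1))
    normals = np.cross(d_du, d_dv)
    length = np.linalg.norm(normals, axis=-1, keepdims=True)
    normals = normals / np.clip(length, 1e-9, None)
    # Orient every normal toward the camera, which looks along +z.
    flip = normals[..., 2] > 0
    normals[flip] *= -1
    return normals


# Usage: a flat surface 1 m in front of the camera.
depth = np.full((4, 5), 1.0)
pts = depth_to_points(depth, fx=500.0, fy=500.0, cx=2.0, cy=1.5)
n = estimate_normals(pts)  # every normal is (0, 0, -1), facing the camera
```

For a grasp, the normal at the target pixel gives the approach direction. A real pipeline would also need to mask invalid (zero) depth values, which this sketch does not do.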

	### LMP (Language Model Programs)

	Language model programs use GPT-4.

	VoxPoser requires three main types of LMP:
	1. Planner
	2. Composer
	3. Value map generator

	Possible replacements:
	- [ ] GPT-4 -> LLaMA2 (?)
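
The LMP pattern can be illustrated with a toy example. A language model returns Python source that calls a fixed set of exposed functions, and that source is executed with only those functions in scope. This sketch is not VoxPoser's prompts or APIs. The GPT-4 call is replaced by a hypothetical stub `fake_llm`, and `detect` and `move_to` are hypothetical placeholder functions.

```python
from typing import Callable, Dict, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a GPT-4 call: returns fixed code for any prompt."""
    return "target = detect('drawer handle')\nmove_to(target)\n"


def run_lmp(prompt: str, llm: Callable[[str], str], apis: Dict[str, Callable]) -> Dict:
    """Ask the model for code and execute it with only `apis` (plus builtins) in scope."""
    code = llm(prompt)
    namespace: Dict = dict(apis)
    exec(code, namespace)  # NOTE: exec is not a sandbox
    return namespace


# Hypothetical placeholder APIs exposed to the generated code.
calls: List[str] = []
apis = {
    "detect": lambda name: {"name": name, "position": (0.5, 0.0, 0.2)},
    "move_to": lambda obj: calls.append(f"move_to {obj['name']}"),
}
result = run_lmp("open the drawer", fake_llm, apis)
print(calls)  # ['move_to drawer handle']
```

Running model-generated code with `exec` is convenient for prototyping, but it gives the generated code full interpreter access.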

	## LMPs

	### Planner

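	An LMP outputs code that calls a set of programmatic interfaces. The Planner turns the language instruction into a sequence of high-level steps, and each step is then executed by the Composer.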

	The Planner is not used in the simulation environment, because each evaluated task consists of a single manipulation stage.
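
The Planner/Composer hand-off can be sketched as follows. This includes skipping the Planner for single-stage simulated tasks. The `planner` and `composer` arguments are hypothetical stand-ins for the corresponding LMP calls, and the step strings are made up for illustration.

```python
from typing import Callable, List


def run_instruction(
    instruction: str,
    planner: Callable[[str], List[str]],
    composer: Callable[[str], None],
    use_planner: bool = True,
) -> List[str]:
    """Split an instruction into high-level steps and hand each step to the Composer.

    With use_planner=False (single-stage tasks, as in simulation) the whole
    instruction is passed to the Composer as one step.
    """
    steps = planner(instruction) if use_planner else [instruction]
    for step in steps:
        composer(step)
    return steps


# Hypothetical planner stub: always returns the same two steps.
def toy_planner(instruction: str) -> List[str]:
    return ["grasp the drawer handle", "pull the handle back"]


executed: List[str] = []
run_instruction("open the drawer", toy_planner, executed.append)
run_instruction("open the drawer", toy_planner, executed.append, use_planner=False)
print(executed)  # ['grasp the drawer handle', 'pull the handle back', 'open the drawer']
```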

	### Composer

	The Composer LMP calls the following modules in order:
	1. Perception module, to obtain perception results
	2. [optional] Affordance LMP
	3. [optional] Avoidance LMP
	4. [optional] End Effector Velocity LMP
	5. [optional] End Effector Rotation LMP
	6. [optional] Gripper Action LMP
	7. Execute
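
One way to model the Composer's fixed call order is sketched below. It assumes each optional LMP is either absent or returns `None` when the current sub-task does not need it. The names `perceive`, `execute`, and the keys of `optional_lmps` are hypothetical, chosen to mirror the list above; they are not the repository's actual functions.

```python
from typing import Callable, Dict, List, Optional

import numpy as np

ValueMap = Optional[np.ndarray]
# Hypothetical keys mirroring steps 2-6: affordance, avoidance, EE velocity, EE rotation, gripper.
MAP_NAMES = ["affordance", "avoidance", "velocity", "rotation", "gripper"]


def compose(
    subtask: str,
    perceive: Callable[[str], Dict],
    optional_lmps: Dict[str, Callable[[str, Dict], ValueMap]],
    execute: Callable[[Dict[str, np.ndarray]], None],
) -> Dict[str, np.ndarray]:
    """Run perception, then each optional value-map LMP in a fixed order, then execute."""
    scene = perceive(subtask)  # 1. perception results
    maps: Dict[str, np.ndarray] = {}
    for name in MAP_NAMES:  # 2-6. optional LMPs, always in this order
        lmp = optional_lmps.get(name)
        value_map = lmp(subtask, scene) if lmp is not None else None
        if value_map is not None:
            maps[name] = value_map
    execute(maps)  # 7. hand the collected maps to execution
    return maps


grid = (10, 10, 10)
lmps = {
    "affordance": lambda task, scene: np.zeros(grid),
    "avoidance": lambda task, scene: None,  # not needed for this sub-task
}
executed: List[List[str]] = []
maps = compose("move to the handle", lambda t: {"objects": []}, lmps,
               lambda m: executed.append(sorted(m)))
print(sorted(maps))  # ['affordance']
```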

	### Value Maps

	TODO

	### Execution

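	1. Motion Planner: a greedy search produces a sequence of end-effector poses; only the Affordance Map and the Avoidance Map are used.
	2. Cost map: $W = -2\,\text{norm}(\text{Affordance}) - \text{norm}(\text{Avoidance})$
	3. Depending on whether the end effector is moving away from or approaching the target, the Affordance Map is taken along the positive or negative direction, respectively, of the target's surface normal.
	4. The Avoidance Map is adjusted using the occupancy grid (`occupancy_map`) of the obstacles to avoid.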
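
Steps 1 and 2 of the execution stage can be sketched on a voxel grid with NumPy. Several assumptions here do not come from this README:

- `norm(·)` is min-max normalization to [0, 1].
- The greedy search moves to the cheapest of the 26 neighboring voxels and stops at a local minimum.
- The grid, maps, and target voxel in the usage lines are made up.

The signs in `cost_map` follow the formula above as written. If your Avoidance Map is high where the arm should not go, check that sign convention before reusing this.

```python
import itertools
from typing import List, Tuple

import numpy as np

Voxel = Tuple[int, int, int]


def normalize(m: np.ndarray) -> np.ndarray:
    """Min-max normalize to [0, 1] (assumed meaning of norm(.))."""
    lo, hi = float(m.min()), float(m.max())
    if hi == lo:
        return np.zeros(m.shape, dtype=float)
    return (m.astype(float) - lo) / (hi - lo)


def cost_map(affordance: np.ndarray, avoidance: np.ndarray) -> np.ndarray:
    """W = -2 * norm(Affordance) - norm(Avoidance), signs taken verbatim from the formula."""
    return -2.0 * normalize(affordance) - normalize(avoidance)


def greedy_path(cost: np.ndarray, start: Voxel, max_steps: int = 100) -> List[Voxel]:
    """From `start`, repeatedly step to the cheapest of the 26 neighboring voxels,
    stopping at a local minimum (no strictly cheaper neighbor) or after max_steps."""
    offsets = [d for d in itertools.product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    path = [start]
    current = start
    for _ in range(max_steps):
        candidates = [tuple(c + d for c, d in zip(current, off)) for off in offsets]
        inside = [n for n in candidates if all(0 <= i < s for i, s in zip(n, cost.shape))]
        best = min(inside, key=lambda n: cost[n])
        if cost[best] >= cost[current]:
            break
        current = best
        path.append(current)
    return path


# Usage: hypothetical affordance peaking at voxel (8, 5, 5), no obstacles.
shape = (11, 11, 11)
target = np.array([8, 5, 5]).reshape(3, 1, 1, 1)
affordance = -np.linalg.norm(np.indices(shape) - target, axis=0)  # 0 at target, negative elsewhere
avoidance = np.zeros(shape)
W = cost_map(affordance, avoidance)
path = greedy_path(W, start=(1, 5, 5))
print(path[-1])  # (8, 5, 5)
```

Greedy descent is cheap but can stall in local minima of `W`. That is one reason the maps are normalized and combined into a single smooth cost before searching.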