|
--- |
|
license: mit |
|
library_name: pytorch |
|
pipeline_tag: other |
|
tags: |
|
- path-planning |
|
- 3d |
|
- voxels |
|
- cnn |
|
- transformer |
|
- robotics |
|
- pytorch |
|
- inference |
|
- Blender |
|
--- |
|
|
|
### Voxel Path Finder (3D Voxel Path Planning with CNN+Transformer) |
|
|
|
This repository hosts the weights and code for a neural network that plans paths in a 3D voxel grid (32×32×32). The model encodes the voxelized environment (obstacles + start + goal) with a 3D CNN, fuses learned position embeddings, and autoregressively generates a sequence of movement actions with a Transformer decoder. |
|
|
|
- **Task**: 3D voxel path planning (generate action steps from start to goal) |
|
- **Actions**: 0..5 → [FORWARD, BACK, LEFT, RIGHT, UP, DOWN] (see the movement sketch below)
|
- **Framework**: PyTorch |
|
- **License**: MIT |
|
GitHub: https://github.com/c1tr0n75/VoxelPathFinder
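
The mapping below is an illustrative assumption about how the six action IDs translate to unit moves on the (x, y, z) grid; the authoritative axis convention is defined in `pathfinding_nn.py`. `apply_actions` is a hypothetical helper that walks a start cell through a predicted action sequence.

```python
# Assumed mapping from action IDs to unit moves on the (x, y, z) grid.
# The real convention lives in pathfinding_nn.py; treat these deltas as placeholders.
ACTION_DELTAS = {
    0: (1, 0, 0),   # FORWARD
    1: (-1, 0, 0),  # BACK
    2: (0, -1, 0),  # LEFT
    3: (0, 1, 0),   # RIGHT
    4: (0, 0, 1),   # UP
    5: (0, 0, -1),  # DOWN
}

def apply_actions(start, actions):
    """Hypothetical helper: turn a start cell plus action IDs into a coordinate path."""
    x, y, z = start
    path = [(x, y, z)]
    for a in actions:
        dx, dy, dz = ACTION_DELTAS[a]
        x, y, z = x + dx, y + dy, z + dz
        path.append((x, y, z))
    return path

# Three steps FORWARD, then one UP
print(apply_actions((5, 5, 5), [0, 0, 0, 4]))
# [(5, 5, 5), (6, 5, 5), (7, 5, 5), (8, 5, 5), (8, 5, 6)]
```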
|
|
|
### Model architecture (high level) |
|
- Voxel encoder: 3D CNN with 3 conv blocks → 512-d environment feature |
|
- Position encoder: learned embeddings over (x, y, z) → 64-d position feature |
|
- Planner: Transformer decoder over action tokens with START/END special tokens |
|
- Output: action token sequence; special tokens are excluded from final path |
|
|
|
### Inputs and outputs |
|
- **Input tensors** |
|
- `voxel_data`: float tensor of shape `[1, 3, 32, 32, 32]` |
|
Channels: [obstacles, start_mask, goal_mask] |
|
- `positions`: long tensor of shape `[1, 2, 3]` |
|
Format: `[[start_xyz, goal_xyz]]` with each coordinate in `[0, 31]` |
|
- **Output** |
|
- Long tensor `[1, T]` of action IDs (0..5), padded internally with END if needed |
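
A minimal sketch of how these tensors can be assembled by hand is shown below. The repo's `create_voxel_input` helper already does this, so treat the channel layout here as an assumption based on the description above rather than the reference implementation.

```python
import numpy as np
import torch

def build_inputs(obstacles, start, goal):
    """Assemble voxel_data (1, 3, 32, 32, 32) and positions (1, 2, 3) tensors.

    obstacles: (32, 32, 32) array of 0/1; start, goal: (x, y, z) tuples in [0, 31].
    Channel order [obstacles, start_mask, goal_mask] follows this card; the repo's
    create_voxel_input is the authoritative reference.
    """
    start_mask = np.zeros_like(obstacles, dtype=np.float32)
    goal_mask = np.zeros_like(obstacles, dtype=np.float32)
    start_mask[start] = 1.0
    goal_mask[goal] = 1.0
    voxel = np.stack([obstacles.astype(np.float32), start_mask, goal_mask])  # (3, 32, 32, 32)
    voxel_data = torch.from_numpy(voxel).unsqueeze(0)            # (1, 3, 32, 32, 32)
    positions = torch.tensor([[start, goal]], dtype=torch.long)  # (1, 2, 3)
    return voxel_data, positions

# Example on an empty grid
voxel_data, positions = build_inputs(np.zeros((32, 32, 32)), (0, 0, 0), (10, 20, 5))
```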
|
|
|
### Quickstart (inference) |
|
Make sure this repo includes both `final_model.pth` (either a plain state_dict or a checkpoint dict containing `model_state_dict`) and `pathfinding_nn.py`.
|
|
|
```python |
|
import torch, numpy as np |
|
from huggingface_hub import hf_hub_download |
|
import importlib.util
|
|
|
REPO_ID = "c1tr0n75/VoxelPathFinder" |
|
# Download files from the Hub |
|
pth_path = hf_hub_download(repo_id=REPO_ID, filename="final_model.pth") |
|
py_path = hf_hub_download(repo_id=REPO_ID, filename="pathfinding_nn.py") |
|
|
|
# Dynamically import the model code |
|
spec = importlib.util.spec_from_file_location("pathfinding_nn", py_path) |
|
mod = importlib.util.module_from_spec(spec) |
|
spec.loader.exec_module(mod) |
|
PathfindingNetwork = mod.PathfindingNetwork |
|
create_voxel_input = mod.create_voxel_input |
|
|
|
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") |
|
model = PathfindingNetwork().to(device).eval() |
|
|
|
# Load weights (supports either a plain state_dict or {'model_state_dict': ...}) |
|
ckpt = torch.load(pth_path, map_location=device) |
|
state = ckpt["model_state_dict"] if isinstance(ckpt, dict) and "model_state_dict" in ckpt else ckpt |
|
model.load_state_dict(state) |
|
|
|
# Build a random test environment |
|
voxel_dim = model.voxel_dim # (32, 32, 32) |
|
D, H, W = voxel_dim |
|
obstacle_prob = 0.2 |
|
obstacles = (np.random.rand(D, H, W) < obstacle_prob).astype(np.float32) |
|
free = np.argwhere(obstacles == 0) |
|
assert len(free) >= 2, "Not enough free cells; lower obstacle_prob" |
|
s_idx, g_idx = np.random.choice(len(free), size=2, replace=False) |
|
start = tuple(free[s_idx]) |
|
goal = tuple(free[g_idx]) |
|
|
|
voxel_np = create_voxel_input(obstacles, start, goal, voxel_dim=voxel_dim) # (3,32,32,32) |
|
voxel = torch.from_numpy(voxel_np).float().unsqueeze(0).to(device) # (1,3,32,32,32) |
|
pos = torch.tensor([[start, goal]], dtype=torch.long, device=device) # (1,2,3) |
|
|
|
with torch.no_grad(): |
|
actions = model(voxel, pos)[0].tolist() |
|
|
|
ACTION_NAMES = ['FORWARD', 'BACK', 'LEFT', 'RIGHT', 'UP', 'DOWN'] |
|
decoded = [ACTION_NAMES[a] for a in actions if 0 <= a < 6] |
|
print(f"Start: {start} | Goal: {goal}") |
|
print(f"Generated {len(decoded)} steps (first 30): {decoded[:30]}") |
|
``` |
|
|
|
### Intended uses and limitations |
|
- **Intended**: Research and demo of 3D voxel path planning; educational examples; quick inference in CPU/GPU environments. |
|
- **Not intended**: Safety-critical navigation without additional validation; large scenes beyond 32³ without retraining; Blender-based generation on hosted environments. |
|
- The generated actions may not yield collision-free paths in complex scenes; downstream validation is recommended. |
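
One concrete form of that validation is to replay the predicted actions on the obstacle grid and check bounds, collisions, and goal arrival, as in the sketch below. The action-to-move mapping is the same placeholder assumption used earlier in this card; the authoritative action semantics are in `pathfinding_nn.py`.

```python
# Assumed action-to-move mapping (placeholder, as sketched earlier in this card).
ACTION_DELTAS = {0: (1, 0, 0), 1: (-1, 0, 0), 2: (0, -1, 0),
                 3: (0, 1, 0), 4: (0, 0, 1), 5: (0, 0, -1)}

def validate_path(obstacles, start, goal, actions):
    """Replay predicted actions on the grid; return (ok, reason). Illustrative only."""
    x, y, z = start
    for step, a in enumerate(actions):
        dx, dy, dz = ACTION_DELTAS[a]
        x, y, z = x + dx, y + dy, z + dz
        if not (0 <= x < 32 and 0 <= y < 32 and 0 <= z < 32):
            return False, f"step {step}: left the grid at {(x, y, z)}"
        if obstacles[x, y, z]:
            return False, f"step {step}: collision at {(x, y, z)}"
    if (x, y, z) != tuple(goal):
        return False, f"ended at {(x, y, z)} instead of the goal {tuple(goal)}"
    return True, "in bounds, collision-free, and reaches the goal"

# With the variables from the quickstart above:
# ok, reason = validate_path(obstacles, start, goal, actions)
```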
|
|
|
### Training data and procedure |
|
- Synthetic voxel environments were generated (in-project tools leverage Blender for dataset creation and visualization). |
|
- The model is trained to predict action sequences from start to goal; the loss combines cross-entropy over actions with auxiliary turn/collision components.
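
A hedged sketch of that composite objective is shown below; the term names, weighting, and padding handling are assumptions for illustration, not the values used to train the released checkpoint.

```python
import torch.nn.functional as F

def planning_loss(logits, target_tokens, turn_penalty, collision_penalty,
                  w_turn=0.1, w_collision=0.5, pad_id=-100):
    """Illustrative composite loss: cross-entropy over action tokens plus
    weighted auxiliary penalties (all names and weights are assumptions)."""
    # Main term: cross-entropy at every decoding step over the action vocabulary
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         target_tokens.reshape(-1), ignore_index=pad_id)
    # Auxiliary terms: scalar penalties computed from the decoded paths elsewhere
    return ce + w_turn * turn_penalty + w_collision * collision_penalty
```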
|
|
|
### Ethical considerations |
|
- This is a research model for toy 3D grids. It is not validated for real-world navigation where safety, environment dynamics, and constraints apply. |
|
|
|
### Citation |
|
If you use this model, please cite this repository: https://github.com/c1tr0n75/VoxelPathFinder
|
|