|
--- |
|
license: mit |
|
library_name: pytorch |
|
pipeline_tag: other |
|
tags: |
|
- path-planning |
|
- 3d |
|
- voxels |
|
- cnn |
|
- transformer |
|
- robotics |
|
- pytorch |
|
- inference |
|
- Blender |
|
--- |
|
|
|
### Voxel Path Finder (3D Voxel Path Planning with CNN+Transformer) |
|
|
|
This repository hosts the weights and code for a neural network that plans paths in a 3D voxel grid (32×32×32). The model encodes the voxelized environment (obstacles + start + goal) with a 3D CNN, fuses learned position embeddings, and autoregressively generates a sequence of movement actions with a Transformer decoder. |
|
|
|
- **Task**: 3D voxel path planning (generate action steps from start to goal) |
|
- **Actions**: 0..5 → [FORWARD, BACK, LEFT, RIGHT, UP, DOWN] (see the movement sketch below)
|
- **Framework**: PyTorch |
|
- **License**: MIT |
|
GitHub: https://github.com/c1tr0n75/VoxelPathFinder
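
The mapping below is an illustrative assumption about how the six action IDs translate to unit moves on the (x, y, z) grid; the authoritative axis convention is defined in `pathfinding_nn.py`. `apply_actions` is a hypothetical helper that walks a start cell through a predicted action sequence.

```python
# Assumed mapping from action IDs to unit moves on the (x, y, z) grid.
# The real convention lives in pathfinding_nn.py; treat these deltas as placeholders.
ACTION_DELTAS = {
    0: (1, 0, 0),   # FORWARD
    1: (-1, 0, 0),  # BACK
    2: (0, -1, 0),  # LEFT
    3: (0, 1, 0),   # RIGHT
    4: (0, 0, 1),   # UP
    5: (0, 0, -1),  # DOWN
}

def apply_actions(start, actions):
    """Hypothetical helper: turn a start cell plus action IDs into a coordinate path."""
    x, y, z = start
    path = [(x, y, z)]
    for a in actions:
        dx, dy, dz = ACTION_DELTAS[a]
        x, y, z = x + dx, y + dy, z + dz
        path.append((x, y, z))
    return path

# Three steps FORWARD, then one UP
print(apply_actions((5, 5, 5), [0, 0, 0, 4]))
# [(5, 5, 5), (6, 5, 5), (7, 5, 5), (8, 5, 5), (8, 5, 6)]
```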
|
|
|
### Model architecture (high level) |
|
- Voxel encoder: 3D CNN with 3 conv blocks → 512-d environment feature |
|
- Position encoder: learned embeddings over (x, y, z) → 64-d position feature |
|
- Planner: Transformer decoder over action tokens with START/END special tokens |
|
- Output: action token sequence; special tokens are excluded from final path |
|
|
|
### Inputs and outputs |
|
- **Input tensors** |
|
- `voxel_data`: float tensor of shape `[1, 3, 32, 32, 32]` |
|
Channels: [obstacles, start_mask, goal_mask] |
|
- `positions`: long tensor of shape `[1, 2, 3]` |
|
Format: `[[start_xyz, goal_xyz]]` with each coordinate in `[0, 31]` |
|
- **Output** |
|
- Long tensor `[1, T]` of action IDs (0..5), padded internally with END if needed |
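
A minimal sketch of how these tensors can be assembled by hand is shown below. The repo's `create_voxel_input` helper already does this, so treat the channel layout here as an assumption based on the description above rather than the reference implementation.

```python
import numpy as np
import torch

def build_inputs(obstacles, start, goal):
    """Assemble voxel_data (1, 3, 32, 32, 32) and positions (1, 2, 3) tensors.

    obstacles: (32, 32, 32) array of 0/1; start, goal: (x, y, z) tuples in [0, 31].
    Channel order [obstacles, start_mask, goal_mask] follows this card; the repo's
    create_voxel_input is the authoritative reference.
    """
    start_mask = np.zeros_like(obstacles, dtype=np.float32)
    goal_mask = np.zeros_like(obstacles, dtype=np.float32)
    start_mask[start] = 1.0
    goal_mask[goal] = 1.0
    voxel = np.stack([obstacles.astype(np.float32), start_mask, goal_mask])  # (3, 32, 32, 32)
    voxel_data = torch.from_numpy(voxel).unsqueeze(0)            # (1, 3, 32, 32, 32)
    positions = torch.tensor([[start, goal]], dtype=torch.long)  # (1, 2, 3)
    return voxel_data, positions

# Example on an empty grid
voxel_data, positions = build_inputs(np.zeros((32, 32, 32)), (0, 0, 0), (10, 20, 5))
```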
|
|
|
### Quickstart (inference) |
|
Make sure this repo includes both `final_model.pth` (either a plain state_dict or a checkpoint dict containing `model_state_dict`) and `pathfinding_nn.py`.
|
|
|
```python |
|
import torch, numpy as np |
|
from huggingface_hub import hf_hub_download |
|
import importlib.util
|
|
|
REPO_ID = "c1tr0n75/VoxelPathFinder" |
|
# Download files from the Hub |
|
pth_path = hf_hub_download(repo_id=REPO_ID, filename="final_model.pth") |
|
py_path = hf_hub_download(repo_id=REPO_ID, filename="pathfinding_nn.py") |
|
|
|
# Dynamically import the model code |
|
spec = importlib.util.spec_from_file_location("pathfinding_nn", py_path) |
|
mod = importlib.util.module_from_spec(spec) |
|
spec.loader.exec_module(mod) |
|
PathfindingNetwork = mod.PathfindingNetwork |
|
create_voxel_input = mod.create_voxel_input |
|
|
|
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") |
|
model = PathfindingNetwork().to(device).eval() |
|
|
|
# Load weights (supports either a plain state_dict or {'model_state_dict': ...}) |
|
ckpt = torch.load(pth_path, map_location=device) |
|
state = ckpt["model_state_dict"] if isinstance(ckpt, dict) and "model_state_dict" in ckpt else ckpt |
|
model.load_state_dict(state) |
|
|
|
# Build a random test environment |
|
voxel_dim = model.voxel_dim # (32, 32, 32) |
|
D, H, W = voxel_dim |
|
obstacle_prob = 0.2 |
|
obstacles = (np.random.rand(D, H, W) < obstacle_prob).astype(np.float32) |
|
free = np.argwhere(obstacles == 0) |
|
assert len(free) >= 2, "Not enough free cells; lower obstacle_prob" |
|
s_idx, g_idx = np.random.choice(len(free), size=2, replace=False) |
|
start = tuple(free[s_idx]) |
|
goal = tuple(free[g_idx]) |
|
|
|
voxel_np = create_voxel_input(obstacles, start, goal, voxel_dim=voxel_dim) # (3,32,32,32) |
|
voxel = torch.from_numpy(voxel_np).float().unsqueeze(0).to(device) # (1,3,32,32,32) |
|
pos = torch.tensor([[start, goal]], dtype=torch.long, device=device) # (1,2,3) |
|
|
|
with torch.no_grad(): |
|
actions = model(voxel, pos)[0].tolist() |
|
|
|
ACTION_NAMES = ['FORWARD', 'BACK', 'LEFT', 'RIGHT', 'UP', 'DOWN'] |
|
decoded = [ACTION_NAMES[a] for a in actions if 0 <= a < 6] |
|
print(f"Start: {start} | Goal: {goal}") |
|
print(f"Generated {len(decoded)} steps (first 30): {decoded[:30]}") |
|
``` |
|
|
|
### Intended uses and limitations |
|
- **Intended**: Research and demo of 3D voxel path planning; educational examples; quick inference in CPU/GPU environments. |
|
- **Not intended**: Safety-critical navigation without additional validation; large scenes beyond 32³ without retraining; Blender-based generation on hosted environments. |
|
- The generated actions may not yield collision-free paths in complex scenes; downstream validation is recommended. |
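
One concrete form of that validation is to replay the predicted actions on the obstacle grid and check bounds, collisions, and goal arrival, as in the sketch below. The action-to-move mapping is the same placeholder assumption used earlier in this card; the authoritative action semantics are in `pathfinding_nn.py`.

```python
# Assumed action-to-move mapping (placeholder, as sketched earlier in this card).
ACTION_DELTAS = {0: (1, 0, 0), 1: (-1, 0, 0), 2: (0, -1, 0),
                 3: (0, 1, 0), 4: (0, 0, 1), 5: (0, 0, -1)}

def validate_path(obstacles, start, goal, actions):
    """Replay predicted actions on the grid; return (ok, reason). Illustrative only."""
    x, y, z = start
    for step, a in enumerate(actions):
        dx, dy, dz = ACTION_DELTAS[a]
        x, y, z = x + dx, y + dy, z + dz
        if not (0 <= x < 32 and 0 <= y < 32 and 0 <= z < 32):
            return False, f"step {step}: left the grid at {(x, y, z)}"
        if obstacles[x, y, z]:
            return False, f"step {step}: collision at {(x, y, z)}"
    if (x, y, z) != tuple(goal):
        return False, f"ended at {(x, y, z)} instead of the goal {tuple(goal)}"
    return True, "in bounds, collision-free, and reaches the goal"

# With the variables from the quickstart above:
# ok, reason = validate_path(obstacles, start, goal, actions)
```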
|
|
|
### Training data and procedure |
|
- Synthetic voxel environments were generated (in-project tools leverage Blender for dataset creation and visualization). |
|
- The model is trained to predict action sequences from start to goal; the loss combines cross-entropy over actions with auxiliary turn/collision components.
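
A hedged sketch of that composite objective is shown below; the term names, weighting, and padding handling are assumptions for illustration, not the values used to train the released checkpoint.

```python
import torch.nn.functional as F

def planning_loss(logits, target_tokens, turn_penalty, collision_penalty,
                  w_turn=0.1, w_collision=0.5, pad_id=-100):
    """Illustrative composite loss: cross-entropy over action tokens plus
    weighted auxiliary penalties (all names and weights are assumptions)."""
    # Main term: cross-entropy at every decoding step over the action vocabulary
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         target_tokens.reshape(-1), ignore_index=pad_id)
    # Auxiliary terms: scalar penalties computed from the decoded paths elsewhere
    return ce + w_turn * turn_penalty + w_collision * collision_penalty
```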
|
|
|
### Ethical considerations |
|
- This is a research model for toy 3D grids. It is not validated for real-world navigation where safety, environment dynamics, and constraints apply. |
|
|
|
### Citation |
|
If you use this model, please cite this repository: https://github.com/c1tr0n75/VoxelPathFinder
|
|