---
license: mit
library_name: pytorch
pipeline_tag: other
tags:
- path-planning
- 3d
- voxels
- cnn
- transformer
- robotics
- pytorch
- inference
- Blender
---
### Voxel Path Finder (3D Voxel Path Planning with CNN+Transformer)
This repository hosts the weights and code for a neural network that plans paths in a 3D voxel grid (32×32×32). The model encodes the voxelized environment (obstacles + start + goal) with a 3D CNN, fuses learned position embeddings, and autoregressively generates a sequence of movement actions with a Transformer decoder.
- **Task**: 3D voxel path planning (generate action steps from start to goal)
- **Actions**: 0..5 → [FORWARD, BACK, LEFT, RIGHT, UP, DOWN]
- **Framework**: PyTorch
- **License**: MIT
GitHub: https://github.com/c1tr0n75/VoxelPathFinder
### Model architecture (high level)
- Voxel encoder: 3D CNN with 3 conv blocks → 512-d environment feature
- Position encoder: learned embeddings over (x, y, z) → 64-d position feature
- Planner: Transformer decoder over action tokens with START/END special tokens
- Output: action token sequence; special tokens are excluded from the final path (a minimal architecture sketch follows this list)
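A minimal PyTorch sketch of this stack, for orientation only: the layer sizes, pooling schedule, head counts, and token vocabulary below are assumptions, and the authoritative definition is `PathfindingNetwork` in `pathfinding_nn.py`.
```python
import torch
import torch.nn as nn

class VoxelEncoderSketch(nn.Module):
    """3 conv blocks over a (3, 32, 32, 32) grid -> 512-d environment feature."""
    def __init__(self):
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Conv3d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),    # 32^3 -> 16^3
            nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),   # 16^3 -> 8^3
            nn.Conv3d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),  # 8^3 -> 4^3
        )
        self.proj = nn.Linear(128 * 4 * 4 * 4, 512)

    def forward(self, voxels):                       # voxels: (B, 3, 32, 32, 32)
        return self.proj(self.blocks(voxels).flatten(1))  # (B, 512)

class PositionEncoderSketch(nn.Module):
    """Learned embeddings over (x, y, z) coordinates in [0, 31] -> 64-d feature."""
    def __init__(self, grid=32, dim=64):
        super().__init__()
        self.emb = nn.Embedding(grid, dim)

    def forward(self, xyz):                          # xyz: (B, 3) long
        return self.emb(xyz).mean(dim=1)             # (B, 64)

class PlannerSketch(nn.Module):
    """Causal Transformer decoder over action tokens (6 moves + specials)."""
    def __init__(self, num_tokens=8, dim=128, heads=4, layers=2):
        super().__init__()
        self.tok = nn.Embedding(num_tokens, dim)     # assumed: 6 actions + START + END
        self.ctx = nn.Linear(512 + 2 * 64, dim)      # env feature + start/goal features
        layer = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
        self.dec = nn.TransformerDecoder(layer, layers)
        self.head = nn.Linear(dim, num_tokens)

    def forward(self, tokens, context):              # tokens: (B, T), context: (B, 640)
        memory = self.ctx(context).unsqueeze(1)      # single memory slot (B, 1, dim)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1)).to(tokens.device)
        h = self.dec(self.tok(tokens), memory, tgt_mask=mask)  # causal self-attention
        return self.head(h)                          # (B, T, num_tokens) next-token logits
```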
### Inputs and outputs
- **Input tensors**
  - `voxel_data`: float tensor of shape `[1, 3, 32, 32, 32]`.
    Channels: `[obstacles, start_mask, goal_mask]` (a construction sketch follows this list)
  - `positions`: long tensor of shape `[1, 2, 3]`.
    Format: `[[start_xyz, goal_xyz]]` with each coordinate in `[0, 31]`
- **Output**
  - Long tensor `[1, T]` of action IDs (0..5), padded internally with END if needed
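For reference, here is a hypothetical stand-in for the repo's `create_voxel_input` helper, showing how the three channels are laid out; the real helper in `pathfinding_nn.py` may take extra options.
```python
import numpy as np

def make_voxel_input(obstacles, start, goal):
    """Stack [obstacles, start_mask, goal_mask] into a (3, D, H, W) float array.

    Hypothetical equivalent of create_voxel_input from pathfinding_nn.py.
    """
    start_mask = np.zeros_like(obstacles, dtype=np.float32)
    goal_mask = np.zeros_like(obstacles, dtype=np.float32)
    start_mask[tuple(start)] = 1.0   # one-hot start cell
    goal_mask[tuple(goal)] = 1.0     # one-hot goal cell
    return np.stack([obstacles.astype(np.float32), start_mask, goal_mask])
```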
### Quickstart (inference)
Make sure this repo includes both `final_model.pth` (either a plain state dict or a checkpoint dict containing a `model_state_dict` key) and `pathfinding_nn.py`.
```python
import importlib.util

import numpy as np
import torch
from huggingface_hub import hf_hub_download
REPO_ID = "c1tr0n75/VoxelPathFinder"
# Download files from the Hub
pth_path = hf_hub_download(repo_id=REPO_ID, filename="final_model.pth")
py_path = hf_hub_download(repo_id=REPO_ID, filename="pathfinding_nn.py")
# Dynamically import the model code
spec = importlib.util.spec_from_file_location("pathfinding_nn", py_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
PathfindingNetwork = mod.PathfindingNetwork
create_voxel_input = mod.create_voxel_input
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = PathfindingNetwork().to(device).eval()
# Load weights (supports either a plain state_dict or {'model_state_dict': ...})
ckpt = torch.load(pth_path, map_location=device)
state = ckpt["model_state_dict"] if isinstance(ckpt, dict) and "model_state_dict" in ckpt else ckpt
model.load_state_dict(state)
# Build a random test environment
voxel_dim = model.voxel_dim # (32, 32, 32)
D, H, W = voxel_dim
obstacle_prob = 0.2
obstacles = (np.random.rand(D, H, W) < obstacle_prob).astype(np.float32)
free = np.argwhere(obstacles == 0)
assert len(free) >= 2, "Not enough free cells; lower obstacle_prob"
s_idx, g_idx = np.random.choice(len(free), size=2, replace=False)
start = tuple(int(v) for v in free[s_idx])  # plain Python ints for clean printing/indexing
goal = tuple(int(v) for v in free[g_idx])
voxel_np = create_voxel_input(obstacles, start, goal, voxel_dim=voxel_dim) # (3,32,32,32)
voxel = torch.from_numpy(voxel_np).float().unsqueeze(0).to(device) # (1,3,32,32,32)
pos = torch.tensor([[start, goal]], dtype=torch.long, device=device) # (1,2,3)
with torch.no_grad():
    actions = model(voxel, pos)[0].tolist()
ACTION_NAMES = ['FORWARD', 'BACK', 'LEFT', 'RIGHT', 'UP', 'DOWN']
decoded = [ACTION_NAMES[a] for a in actions if 0 <= a < 6]
print(f"Start: {start} | Goal: {goal}")
print(f"Generated {len(decoded)} steps (first 30): {decoded[:30]}")
```
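The final list comprehension drops the START/END special tokens (any ID outside 0..5), so `decoded` contains only movement actions.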
### Intended uses and limitations
- **Intended**: Research and demo of 3D voxel path planning; educational examples; quick inference in CPU/GPU environments.
- **Not intended**: Safety-critical navigation without additional validation; large scenes beyond 32³ without retraining; Blender-based generation on hosted environments.
- The generated actions may not yield collision-free paths in complex scenes; downstream validation is recommended.
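As a starting point for such validation, here is a hedged sketch that replays a predicted action sequence against the obstacle grid. The axis convention (FORWARD/BACK on x, LEFT/RIGHT on y, UP/DOWN on z) is an assumption; confirm it against `pathfinding_nn.py` before relying on it.
```python
# Assumed action-to-displacement mapping -- verify against pathfinding_nn.py.
DELTAS = {
    0: (1, 0, 0),   # FORWARD (+x, assumed)
    1: (-1, 0, 0),  # BACK    (-x, assumed)
    2: (0, -1, 0),  # LEFT    (-y, assumed)
    3: (0, 1, 0),   # RIGHT   (+y, assumed)
    4: (0, 0, 1),   # UP      (+z, assumed)
    5: (0, 0, -1),  # DOWN    (-z, assumed)
}

def validate_path(obstacles, start, goal, actions):
    """Replay action IDs (0..5, specials already stripped) from start.

    Returns (reached_goal, collision_free, visited), where visited is the
    list of cells traversed up to the first invalid move, if any.
    """
    pos = tuple(start)
    visited = [pos]
    for a in actions:
        dx, dy, dz = DELTAS[a]
        nxt = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
        # Reject moves that leave the grid or enter an obstacle cell.
        if not all(0 <= c < s for c, s in zip(nxt, obstacles.shape)) or obstacles[nxt] > 0:
            return pos == tuple(goal), False, visited
        pos = nxt
        visited.append(pos)
    return pos == tuple(goal), True, visited
```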
### Training data and procedure
- Synthetic voxel environments were generated (in-project tools leverage Blender for dataset creation and visualization).
- The model is trained to predict action sequences from start to goal; the loss combines cross-entropy over actions with auxiliary turn/collision components (sketched below).
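A hedged sketch of that objective, assuming token-level cross-entropy with an ignore index for padding; the auxiliary terms and the weights `lambda_turn` and `lambda_coll` are placeholders, not the repo's actual values.
```python
import torch.nn.functional as F

PAD_ID = -100  # conventional ignore_index for F.cross_entropy

def planning_loss(logits, targets, turn_penalty=None, collision_penalty=None,
                  lambda_turn=0.1, lambda_coll=1.0):
    """logits: (B, T, num_tokens); targets: (B, T) with PAD_ID on padded steps."""
    loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten(), ignore_index=PAD_ID)
    if turn_penalty is not None:        # scalar auxiliary term, if computed
        loss = loss + lambda_turn * turn_penalty
    if collision_penalty is not None:   # scalar auxiliary term, if computed
        loss = loss + lambda_coll * collision_penalty
    return loss
```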
### Ethical considerations
- This is a research model for toy 3D grids. It is not validated for real-world navigation where safety, environment dynamics, and constraints apply.
### Citation
If you use this model, please cite the GitHub repository: https://github.com/c1tr0n75/VoxelPathFinder