---
license: mit
library_name: pytorch
pipeline_tag: other
tags:
- path-planning
- 3d
- voxels
- cnn
- transformer
- robotics
- pytorch
- inference
- Blender
---
### Voxel Path Finder (3D Voxel Path Planning with CNN+Transformer)
This repository hosts the weights and code for a neural network that plans paths in a 3D voxel grid (32×32×32). The model encodes the voxelized environment (obstacles + start + goal) with a 3D CNN, fuses learned position embeddings, and autoregressively generates a sequence of movement actions with a Transformer decoder.
- **Task**: 3D voxel path planning (generate action steps from start to goal)
- **Actions**: 0..5 → [FORWARD, BACK, LEFT, RIGHT, UP, DOWN]
- **Framework**: PyTorch
- **License**: MIT
GitHub: https://github.com/c1tr0n75/VoxelPathFinder
### Model architecture (high level)
- Voxel encoder: 3D CNN with 3 conv blocks → 512-d environment feature
- Position encoder: learned embeddings over (x, y, z) → 64-d position feature
- Planner: Transformer decoder over action tokens with START/END special tokens
- Output: action token sequence; special tokens are excluded from the final path (a minimal architecture sketch follows this list)
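A minimal PyTorch sketch of this stack, for orientation only: the layer sizes, pooling schedule, head counts, and token vocabulary below are assumptions, and the authoritative definition is `PathfindingNetwork` in `pathfinding_nn.py`.
```python
import torch
import torch.nn as nn

class VoxelEncoderSketch(nn.Module):
    """3 conv blocks over a (3, 32, 32, 32) grid -> 512-d environment feature."""
    def __init__(self):
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Conv3d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),    # 32^3 -> 16^3
            nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),   # 16^3 -> 8^3
            nn.Conv3d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),  # 8^3 -> 4^3
        )
        self.proj = nn.Linear(128 * 4 * 4 * 4, 512)

    def forward(self, voxels):                       # voxels: (B, 3, 32, 32, 32)
        return self.proj(self.blocks(voxels).flatten(1))  # (B, 512)

class PositionEncoderSketch(nn.Module):
    """Learned embeddings over (x, y, z) coordinates in [0, 31] -> 64-d feature."""
    def __init__(self, grid=32, dim=64):
        super().__init__()
        self.emb = nn.Embedding(grid, dim)

    def forward(self, xyz):                          # xyz: (B, 3) long
        return self.emb(xyz).mean(dim=1)             # (B, 64)

class PlannerSketch(nn.Module):
    """Causal Transformer decoder over action tokens (6 moves + specials)."""
    def __init__(self, num_tokens=8, dim=128, heads=4, layers=2):
        super().__init__()
        self.tok = nn.Embedding(num_tokens, dim)     # assumed: 6 actions + START + END
        self.ctx = nn.Linear(512 + 2 * 64, dim)      # env feature + start/goal features
        layer = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
        self.dec = nn.TransformerDecoder(layer, layers)
        self.head = nn.Linear(dim, num_tokens)

    def forward(self, tokens, context):              # tokens: (B, T), context: (B, 640)
        memory = self.ctx(context).unsqueeze(1)      # single memory slot (B, 1, dim)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1)).to(tokens.device)
        h = self.dec(self.tok(tokens), memory, tgt_mask=mask)  # causal self-attention
        return self.head(h)                          # (B, T, num_tokens) next-token logits
```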
### Inputs and outputs
- **Input tensors**
  - `voxel_data`: float tensor of shape `[1, 3, 32, 32, 32]`.
    Channels: `[obstacles, start_mask, goal_mask]` (a construction sketch follows this list)
  - `positions`: long tensor of shape `[1, 2, 3]`.
    Format: `[[start_xyz, goal_xyz]]` with each coordinate in `[0, 31]`
- **Output**
  - Long tensor `[1, T]` of action IDs (0..5), padded internally with END if needed
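For reference, here is a hypothetical stand-in for the repo's `create_voxel_input` helper, showing how the three channels are laid out; the real helper in `pathfinding_nn.py` may take extra options.
```python
import numpy as np

def make_voxel_input(obstacles, start, goal):
    """Stack [obstacles, start_mask, goal_mask] into a (3, D, H, W) float array.

    Hypothetical equivalent of create_voxel_input from pathfinding_nn.py.
    """
    start_mask = np.zeros_like(obstacles, dtype=np.float32)
    goal_mask = np.zeros_like(obstacles, dtype=np.float32)
    start_mask[tuple(start)] = 1.0   # one-hot start cell
    goal_mask[tuple(goal)] = 1.0     # one-hot goal cell
    return np.stack([obstacles.astype(np.float32), start_mask, goal_mask])
```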
### Quickstart (inference)
Make sure this repo includes both `final_model.pth` (either a plain state dict or a checkpoint dict containing a `model_state_dict` key) and `pathfinding_nn.py`.
```python
import importlib.util

import numpy as np
import torch
from huggingface_hub import hf_hub_download
REPO_ID = "c1tr0n75/VoxelPathFinder"
# Download files from the Hub
pth_path = hf_hub_download(repo_id=REPO_ID, filename="final_model.pth")
py_path = hf_hub_download(repo_id=REPO_ID, filename="pathfinding_nn.py")
# Dynamically import the model code
spec = importlib.util.spec_from_file_location("pathfinding_nn", py_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
PathfindingNetwork = mod.PathfindingNetwork
create_voxel_input = mod.create_voxel_input
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = PathfindingNetwork().to(device).eval()
# Load weights (supports either a plain state_dict or {'model_state_dict': ...})
ckpt = torch.load(pth_path, map_location=device)
state = ckpt["model_state_dict"] if isinstance(ckpt, dict) and "model_state_dict" in ckpt else ckpt
model.load_state_dict(state)
# Build a random test environment
voxel_dim = model.voxel_dim # (32, 32, 32)
D, H, W = voxel_dim
obstacle_prob = 0.2
obstacles = (np.random.rand(D, H, W) < obstacle_prob).astype(np.float32)
free = np.argwhere(obstacles == 0)
assert len(free) >= 2, "Not enough free cells; lower obstacle_prob"
s_idx, g_idx = np.random.choice(len(free), size=2, replace=False)
start = tuple(int(v) for v in free[s_idx])  # plain Python ints for clean printing/indexing
goal = tuple(int(v) for v in free[g_idx])
voxel_np = create_voxel_input(obstacles, start, goal, voxel_dim=voxel_dim) # (3,32,32,32)
voxel = torch.from_numpy(voxel_np).float().unsqueeze(0).to(device) # (1,3,32,32,32)
pos = torch.tensor([[start, goal]], dtype=torch.long, device=device) # (1,2,3)
with torch.no_grad():
    actions = model(voxel, pos)[0].tolist()
ACTION_NAMES = ['FORWARD', 'BACK', 'LEFT', 'RIGHT', 'UP', 'DOWN']
decoded = [ACTION_NAMES[a] for a in actions if 0 <= a < 6]
print(f"Start: {start} | Goal: {goal}")
print(f"Generated {len(decoded)} steps (first 30): {decoded[:30]}")
```
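The final list comprehension drops the START/END special tokens (any ID outside 0..5), so `decoded` contains only movement actions.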
### Intended uses and limitations
- **Intended**: Research and demo of 3D voxel path planning; educational examples; quick inference in CPU/GPU environments.
- **Not intended**: Safety-critical navigation without additional validation; large scenes beyond 32³ without retraining; Blender-based generation on hosted environments.
- The generated actions may not yield collision-free paths in complex scenes; downstream validation is recommended.
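As a starting point for such validation, here is a hedged sketch that replays a predicted action sequence against the obstacle grid. The axis convention (FORWARD/BACK on x, LEFT/RIGHT on y, UP/DOWN on z) is an assumption; confirm it against `pathfinding_nn.py` before relying on it.
```python
# Assumed action-to-displacement mapping -- verify against pathfinding_nn.py.
DELTAS = {
    0: (1, 0, 0),   # FORWARD (+x, assumed)
    1: (-1, 0, 0),  # BACK    (-x, assumed)
    2: (0, -1, 0),  # LEFT    (-y, assumed)
    3: (0, 1, 0),   # RIGHT   (+y, assumed)
    4: (0, 0, 1),   # UP      (+z, assumed)
    5: (0, 0, -1),  # DOWN    (-z, assumed)
}

def validate_path(obstacles, start, goal, actions):
    """Replay action IDs (0..5, specials already stripped) from start.

    Returns (reached_goal, collision_free, visited), where visited is the
    list of cells traversed up to the first invalid move, if any.
    """
    pos = tuple(start)
    visited = [pos]
    for a in actions:
        dx, dy, dz = DELTAS[a]
        nxt = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
        # Reject moves that leave the grid or enter an obstacle cell.
        if not all(0 <= c < s for c, s in zip(nxt, obstacles.shape)) or obstacles[nxt] > 0:
            return pos == tuple(goal), False, visited
        pos = nxt
        visited.append(pos)
    return pos == tuple(goal), True, visited
```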
### Training data and procedure
- Synthetic voxel environments were generated (in-project tools leverage Blender for dataset creation and visualization).
- The model is trained to predict action sequences from start to goal; the loss combines cross-entropy over actions with auxiliary turn/collision components (sketched below).
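A hedged sketch of that objective, assuming token-level cross-entropy with an ignore index for padding; the auxiliary terms and the weights `lambda_turn` and `lambda_coll` are placeholders, not the repo's actual values.
```python
import torch.nn.functional as F

PAD_ID = -100  # conventional ignore_index for F.cross_entropy

def planning_loss(logits, targets, turn_penalty=None, collision_penalty=None,
                  lambda_turn=0.1, lambda_coll=1.0):
    """logits: (B, T, num_tokens); targets: (B, T) with PAD_ID on padded steps."""
    loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten(), ignore_index=PAD_ID)
    if turn_penalty is not None:        # scalar auxiliary term, if computed
        loss = loss + lambda_turn * turn_penalty
    if collision_penalty is not None:   # scalar auxiliary term, if computed
        loss = loss + lambda_coll * collision_penalty
    return loss
```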
### Ethical considerations
- This is a research model for toy 3D grids. It is not validated for real-world navigation where safety, environment dynamics, and constraints apply.
### Citation
If you use this model, please cite the GitHub repository: https://github.com/c1tr0n75/VoxelPathFinder