---
license: mit
library_name: pytorch
pipeline_tag: other
tags:
- path-planning
- 3d
- voxels
- cnn
- transformer
- robotics
- pytorch
- inference
- Blender
---

### Voxel Path Finder (3D Voxel Path Planning with CNN+Transformer)

This repository hosts the weights and code for a neural network that plans paths in a 3D voxel grid (32×32×32). The model encodes the voxelized environment (obstacles + start + goal) with a 3D CNN, fuses learned position embeddings, and autoregressively generates a sequence of movement actions with a Transformer decoder.

- **Task**: 3D voxel path planning (generate action steps from start to goal)
- **Actions**: 0..5 → [FORWARD, BACK, LEFT, RIGHT, UP, DOWN]
- **Framework**: PyTorch
- **License**: MIT

GitHub: https://github.com/c1tr0n75/VoxelPathFinder

### Model architecture (high level)

- Voxel encoder: 3D CNN with 3 conv blocks → 512-d environment feature
- Position encoder: learned embeddings over (x, y, z) → 64-d position feature
- Planner: Transformer decoder over action tokens with START/END special tokens
- Output: action token sequence; special tokens are excluded from the final path

An illustrative wiring sketch of this pipeline appears after the Quickstart below.

### Inputs and outputs

- **Input tensors**
  - `voxel_data`: float tensor of shape `[1, 3, 32, 32, 32]`, with channels `[obstacles, start_mask, goal_mask]`
  - `positions`: long tensor of shape `[1, 2, 3]`, formatted as `[[start_xyz, goal_xyz]]` with each coordinate in `[0, 31]`
- **Output**
  - Long tensor `[1, T]` of action IDs (0..5), padded internally with END if needed

A sketch that builds these tensors by hand follows the Quickstart.

### Quickstart (inference)

Make sure this repo includes both `final_model.pth` (either a plain state dict or a checkpoint dict containing `model_state_dict`) and `pathfinding_nn.py`.

```python
import importlib.util

import numpy as np
import torch
from huggingface_hub import hf_hub_download

REPO_ID = "c1tr0n75/VoxelPathFinder"

# Download the weights and the model code from the Hub
pth_path = hf_hub_download(repo_id=REPO_ID, filename="final_model.pth")
py_path = hf_hub_download(repo_id=REPO_ID, filename="pathfinding_nn.py")

# Dynamically import the model code
spec = importlib.util.spec_from_file_location("pathfinding_nn", py_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
PathfindingNetwork = mod.PathfindingNetwork
create_voxel_input = mod.create_voxel_input

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = PathfindingNetwork().to(device).eval()

# Load weights (supports either a plain state_dict or {'model_state_dict': ...})
ckpt = torch.load(pth_path, map_location=device)
state = ckpt["model_state_dict"] if isinstance(ckpt, dict) and "model_state_dict" in ckpt else ckpt
model.load_state_dict(state)

# Build a random test environment
voxel_dim = model.voxel_dim  # (32, 32, 32)
D, H, W = voxel_dim
obstacle_prob = 0.2
obstacles = (np.random.rand(D, H, W) < obstacle_prob).astype(np.float32)

# Pick two distinct free cells as start and goal
free = np.argwhere(obstacles == 0)
assert len(free) >= 2, "Not enough free cells; lower obstacle_prob"
s_idx, g_idx = np.random.choice(len(free), size=2, replace=False)
start = tuple(int(v) for v in free[s_idx])
goal = tuple(int(v) for v in free[g_idx])

voxel_np = create_voxel_input(obstacles, start, goal, voxel_dim=voxel_dim)  # (3, 32, 32, 32)
voxel = torch.from_numpy(voxel_np).float().unsqueeze(0).to(device)          # (1, 3, 32, 32, 32)
pos = torch.tensor([[start, goal]], dtype=torch.long, device=device)        # (1, 2, 3)

with torch.no_grad():
    actions = model(voxel, pos)[0].tolist()

ACTION_NAMES = ['FORWARD', 'BACK', 'LEFT', 'RIGHT', 'UP', 'DOWN']
decoded = [ACTION_NAMES[a] for a in actions if 0 <= a < 6]  # drop special tokens
print(f"Start: {start} | Goal: {goal}")
print(f"Generated {len(decoded)} steps (first 30): {decoded[:30]}")
```
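### Decoding and validating the output (sketch)

The Quickstart prints action names but does not check feasibility. The sketch below replays the generated IDs from the start cell and verifies bounds, collisions, and goal arrival, reusing `actions`, `start`, `goal`, and `obstacles` from the Quickstart. The axis convention in `DELTAS` is an assumption made for illustration; consult the code in `pathfinding_nn.py` for the actual mapping.

```python
# Assumed axis convention; verify against pathfinding_nn.py before relying on it.
DELTAS = {
    0: (1, 0, 0),   # FORWARD
    1: (-1, 0, 0),  # BACK
    2: (0, -1, 0),  # LEFT
    3: (0, 1, 0),   # RIGHT
    4: (0, 0, 1),   # UP
    5: (0, 0, -1),  # DOWN
}

def walk_and_validate(actions, start, goal, obstacles):
    """Replay action IDs from `start`; return (path, ok) where ok means every
    step stayed in bounds, avoided obstacles, and the walk ended at `goal`."""
    path = [tuple(start)]
    for a in actions:
        if a not in DELTAS:  # skip START/END/padding tokens
            continue
        nxt = tuple(p + d for p, d in zip(path[-1], DELTAS[a]))
        if not all(0 <= c < s for c, s in zip(nxt, obstacles.shape)):
            return path, False  # stepped outside the grid
        if obstacles[nxt] > 0:
            return path, False  # hit an obstacle
        path.append(nxt)
    return path, path[-1] == tuple(goal)

path, ok = walk_and_validate(actions, start, goal, obstacles)
print(f"{len(path) - 1} steps, reached goal collision-free: {ok}")
```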
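### Building the inputs by hand (optional)

`create_voxel_input` ships with the repo, but the documented layout is simple enough to assemble directly. This is a minimal sketch based only on the shapes and channel order listed under "Inputs and outputs"; the helper in `pathfinding_nn.py` remains authoritative, and the `build_voxel_input` name here is hypothetical.

```python
import numpy as np
import torch

def build_voxel_input(obstacles, start, goal):
    """Assemble the documented (3, 32, 32, 32) input by hand.

    Channel order per this card: [obstacles, start_mask, goal_mask].
    """
    grid = np.zeros((3,) + obstacles.shape, dtype=np.float32)
    grid[0] = obstacles   # channel 0: occupancy (1 = blocked)
    grid[1][start] = 1.0  # channel 1: one-hot start cell
    grid[2][goal] = 1.0   # channel 2: one-hot goal cell
    return grid

obstacles = np.zeros((32, 32, 32), dtype=np.float32)
start, goal = (1, 1, 1), (30, 30, 30)

voxel = torch.from_numpy(build_voxel_input(obstacles, start, goal)).unsqueeze(0)  # (1, 3, 32, 32, 32)
pos = torch.tensor([[start, goal]], dtype=torch.long)                             # (1, 2, 3)
```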
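### Architecture sketch (illustrative)

To relate the architecture bullets above to code, here is a compact, illustrative wiring of the described pipeline. Only the 512-d/64-d feature sizes, the three conv blocks, and the START/END vocabulary come from this card; every other name, layer size, and hyperparameter is an assumption. The real model is `PathfindingNetwork` in `pathfinding_nn.py`.

```python
import torch
import torch.nn as nn

class PlannerSketch(nn.Module):
    """Illustrative only: 3D CNN encoder + position embeddings + Transformer decoder."""

    def __init__(self, grid=32, env_dim=512, pos_dim=64, n_actions=6):
        super().__init__()
        d_model = env_dim + pos_dim  # 576
        # Voxel encoder: three conv blocks, flattened to a 512-d environment feature.
        self.voxel_enc = nn.Sequential(
            nn.Conv3d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(env_dim),
        )
        # Learned per-axis coordinate embeddings -> 64-d position feature.
        self.coord_emb = nn.Embedding(grid, pos_dim)
        # Action vocabulary: 6 moves plus START and END special tokens.
        self.tok_emb = nn.Embedding(n_actions + 2, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, n_actions + 2)

    def forward(self, voxels, positions, action_tokens):
        # voxels: (B, 3, 32, 32, 32); positions: (B, 2, 3); action_tokens: (B, T)
        env = self.voxel_enc(voxels)                         # (B, 512)
        pos = self.coord_emb(positions).sum(dim=(1, 2))      # (B, 64)
        memory = torch.cat([env, pos], dim=-1).unsqueeze(1)  # (B, 1, 576)
        tgt = self.tok_emb(action_tokens)                    # (B, T, 576)
        T = action_tokens.size(1)
        mask = torch.triu(torch.full((T, T), float("-inf"), device=tgt.device), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=mask)       # causal decoding
        return self.head(out)                                # (B, T, 8) logits
```

At inference time, a model like this would be run autoregressively: feed START, sample the next action from the logits, append it, and repeat until END, which matches the autoregressive generation described above.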
### Intended uses and limitations

- **Intended**: research and demos of 3D voxel path planning; educational examples; quick inference in CPU/GPU environments.
- **Not intended**: safety-critical navigation without additional validation; large scenes beyond 32³ without retraining; Blender-based generation on hosted environments.
- The generated actions may not yield collision-free paths in complex scenes; downstream validation is recommended (see the validation sketch after the Quickstart).

### Training data and procedure

- Synthetic voxel environments were generated (in-project tools leverage Blender for dataset creation and visualization).
- The model is trained to predict action sequences from start to goal; the loss combines cross-entropy over actions with auxiliary turn/collision terms.

### Ethical considerations

- This is a research model for toy 3D grids. It is not validated for real-world navigation, where safety, environment dynamics, and physical constraints apply.

### Citation

If you use this model, please cite this repository: