DiffSemanticFusion: Semantic Raster BEV Fusion for Autonomous Driving via Online HD Map Diffusion
Abstract
DiffSemanticFusion enhances autonomous driving by fusing semantic raster and graph-based representations using a map diffusion module, improving trajectory prediction and end-to-end driving performance.
Autonomous driving requires accurate scene understanding, including road geometry, traffic agents, and their semantic relationships. In online HD map generation scenarios, raster-based representations are well-suited to vision models but lack geometric precision, while graph-based representations retain structural detail but become unstable without precise maps. To harness the complementary strengths of both, we propose DiffSemanticFusion, a fusion framework for multimodal trajectory prediction and planning. Our approach reasons over a semantic raster-fused BEV space, enhanced by a map diffusion module that improves both the stability and expressiveness of online HD map representations. We validate our framework on two downstream tasks: trajectory prediction and planning-oriented end-to-end autonomous driving. Experiments on two real-world autonomous driving benchmarks, nuScenes and NAVSIM, demonstrate improved performance over several state-of-the-art methods. For the prediction task on nuScenes, we integrate DiffSemanticFusion with the online-HD-map-informed QCNet, achieving a 5.1% performance improvement. For end-to-end autonomous driving in NAVSIM, DiffSemanticFusion achieves state-of-the-art results, with a 15% performance gain in NavHard scenarios. In addition, extensive ablation and sensitivity studies show that our map diffusion module can be seamlessly integrated into other vector-based approaches to enhance performance. All artifacts are available at https://github.com/SunZhigang7/DiffSemanticFusion.
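To make the raster-versus-vector trade-off mentioned in the abstract concrete, here is a minimal sketch, not taken from the paper or its repository. It assumes a toy 8 x 8 BEV grid with 1 m cells, and the helper name `rasterize_polyline` and the two example lanes are hypothetical. It shows that two lane polylines 0.5 m apart stay distinct as vectors but collapse to the same raster cells.

```python
import math

def rasterize_polyline(polyline, grid_size=8, cell_m=1.0):
    """Mark every BEV cell touched by a polyline given as (x, y) points in metres.

    Hypothetical helper for illustration only; not the paper's rasterizer.
    """
    grid = [[0] * grid_size for _ in range(grid_size)]
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        # Sample at half-cell spacing so no cell along the segment is skipped.
        steps = max(1, int(math.ceil(length / (cell_m * 0.5))))
        for i in range(steps + 1):
            t = i / steps
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            col = int(x // cell_m)
            row = int(y // cell_m)
            if 0 <= row < grid_size and 0 <= col < grid_size:
                grid[row][col] = 1
    return grid

# Two vectorized lane centerlines, 0.5 m apart (assumed toy data).
lane_a = [(0.2, 2.2), (7.8, 2.2)]
lane_b = [(0.2, 2.7), (7.8, 2.7)]

raster_a = rasterize_polyline(lane_a)
raster_b = rasterize_polyline(lane_b)

# The vector forms differ, but both fall into the same row of 1 m cells.
print(lane_a != lane_b)      # vectors keep the 0.5 m offset
print(raster_a == raster_b)  # rasters cannot tell the lanes apart
```

A finer grid would separate the two lanes at the cost of a larger raster, which is one way to read the abstract's motivation for combining both representations.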
Community
DiffSemanticFusion (including Mapless QCNet) achieves state-of-the-art results on both nuScenes and NAVSIM.