Pandora3D: A Comprehensive Framework for High-Quality 3D Shape and Texture Generation
Abstract
This report presents a comprehensive framework for generating high-quality 3D shapes and textures from diverse input prompts, including single images, multi-view images, and text descriptions. The framework consists of two components: 3D shape generation and texture generation. (1) The 3D shape generation pipeline employs a Variational Autoencoder (VAE) to encode implicit 3D geometries into a latent space and a diffusion network to generate latents conditioned on the input prompts, with modifications to enhance model capacity. An alternative Artist-Created Mesh (AM) generation approach is also explored, yielding promising results for simpler geometries. (2) Texture generation involves a multi-stage process starting with frontal image generation, followed by multi-view image generation, RGB-to-PBR texture conversion, and high-resolution multi-view texture refinement. A consistency scheduler is plugged into every stage to enforce pixel-wise consistency among multi-view textures during inference, ensuring seamless integration. The pipeline demonstrates effective handling of diverse input formats, leveraging advanced neural architectures and novel methodologies to produce high-quality 3D content. This report details the system architecture, experimental results, and potential future directions to improve and expand the framework. The source code and pretrained weights are released at: https://github.com/Tencent/Tencent-XR-3DGen.
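To make the shape-generation path described in the abstract more concrete, the sketch below illustrates the encode-diffuse-decode flow in PyTorch: a diffusion model samples shape latents conditioned on a prompt embedding, and a VAE decoder maps those latents back to an implicit field from which a mesh can later be extracted. All class names, tensor sizes, and the toy denoising update are hypothetical placeholders chosen for illustration; they are not the API or architecture of the released Tencent-XR-3DGen code.

```python
# Minimal, self-contained sketch of the shape-generation flow: a VAE maps
# implicit 3D geometry to a latent space, and a diffusion-style denoiser
# produces latents conditioned on a prompt embedding (image or text features).
# Every class, layer size, and update rule here is a simplified placeholder.

import torch
import torch.nn as nn

LATENT_DIM, COND_DIM = 64, 128  # toy sizes for illustration only

class ShapeVAE(nn.Module):
    """Toy stand-in for the VAE that encodes/decodes implicit geometry."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(1024, LATENT_DIM)   # geometry samples -> latent
        self.dec = nn.Linear(LATENT_DIM, 1024)   # latent -> implicit field values

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        return self.dec(z)

class LatentDenoiser(nn.Module):
    """Toy denoiser conditioned on a prompt embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(LATENT_DIM + COND_DIM, LATENT_DIM)

    def forward(self, z: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z, cond], dim=-1))

@torch.no_grad()
def generate_shape_latent(cond: torch.Tensor, denoiser: LatentDenoiser,
                          steps: int = 50) -> torch.Tensor:
    """Very rough denoising loop: start from noise and iteratively refine."""
    z = torch.randn(cond.shape[0], LATENT_DIM)
    for _ in range(steps):
        pred = denoiser(z, cond)
        z = z + 0.1 * (pred - z)  # simplistic update, for illustration only
    return z

if __name__ == "__main__":
    vae, denoiser = ShapeVAE(), LatentDenoiser()
    prompt_embedding = torch.randn(1, COND_DIM)  # e.g. from an image/text encoder
    z = generate_shape_latent(prompt_embedding, denoiser)
    field = vae.decode(z)                        # implicit field; mesh extraction (e.g. marching cubes) follows
    print(field.shape)                           # torch.Size([1, 1024])
```

The texture stages follow the same conditioning pattern, with the consistency scheduler additionally synchronizing overlapping pixels across views at each inference step, as described in the abstract.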