# [ICLR24] Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

Official PyTorch implementation of *Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors*.

arXiv | webpage

Guocheng Qian 1,2, Jinjie Mai 1, Abdullah Hamdi 3, Jian Ren 2, Aliaksandr Siarohin 2, Bing Li 1, Hsin-Ying Lee 2, Ivan Skorokhodov 1,2, Peter Wonka 1, Sergey Tulyakov 2, Bernard Ghanem 1

1 King Abdullah University of Science and Technology (KAUST), 2 Snap Inc., 3 Visual Geometry Group, University of Oxford

*Training convergence of a demo example: Magic123 without textual inversion, compared against ablations that use only the 2D prior (SDS) or only the 3D prior (Zero123).*

*Effects of the joint prior: increasing the strength of the 2D prior leads to more imagination and more details, but less 3D consistency.*

The code is built upon the Stable-DreamFusion repo.

## Installation

We have only tested on Ubuntu. Make sure git, wget, and Eigen are installed.

Note: install.sh uses a Python venv by default. If you prefer conda, uncomment the conda lines and comment out the venv lines in that file, then run the same command.

## Pretrained weights

- Zero-1-to-3 for the 3D diffusion prior. We use 105000.ckpt by default; the reimplementation, borrowed from the Stable Diffusion repo, is in guidance/zero123_utils.py.
- MiDaS for depth estimation. We use dpt_beit_large_512.pt; put it in the folder pretrained/midas/.

## Preprocessing

All preprocessed files are included in the ./data directory. Preprocessing is only necessary if you want to test your own examples; it takes seconds.

## Textual inversion

Magic123 uses the default textual inversion from diffusers, which takes around 2 hours on a 32 GB V100. If you do not want to spend this time, you can either (1) look into faster textual inversion methods, or (2) skip textual inversion, at the cost of texture and shape consistency.

To run textual inversion:

- $token_name is the special token, usually named after the example.
- $init_token is a single token that describes the image in natural language.

Don't forget to move the final learned_embeds.bin under data/demo/a-full-body-ironman/.

## Run Magic123 (both stages)

Training takes ~40 minutes for the coarse stage and ~20 minutes for the fine stage on a 32 GB V100. As an example, run Magic123 on the dragon example using both stages on GPU 0, setting the job name for the first stage to nerf and the job name for the second stage to dmtet. More arguments (e.g. --lambda_guidance 1 40) can be appended to the command line.

## Run Magic123 without textual inversion

Textual inversion is tedious (it requires ~2.5 hours of optimization). If you want to test Magic123 quickly on your own example without textual inversion (which might degrade performance):

1. Run foreground segmentation and depth estimation.
2. Run the Magic123 coarse stage without textual inversion (~40 minutes).
3. Run the Magic123 fine stage without textual inversion (~20 minutes).

## Ablations

- Run Magic123 with only the 2D prior, with textual inversion (like RealFusion, but we achieve much better performance through our training strategies and the coarse-to-fine pipeline).
- Run Magic123 with only the 2D prior, without textual inversion.

Note: change the path and the text prompt inside the script if you want to test another example. A sketch of how the two priors are weighted is shown below.
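For intuition about the two --lambda_guidance values, here is a minimal, illustrative PyTorch-style sketch of how a joint 2D + 3D prior loss can be weighted. The function name and the guidance_2d.train_step / guidance_3d.train_step interfaces are hypothetical stand-ins, not this repo's actual API; setting either weight to zero recovers the corresponding single-prior ablation.

```python
# Illustrative sketch only -- the guidance objects and their train_step
# signatures are hypothetical stand-ins, not this repo's actual API.

def joint_prior_loss(rendered_views, text_embeds, ref_image, rel_poses,
                     guidance_2d, guidance_3d,
                     lambda_2d=1.0, lambda_3d=40.0):
    """Weighted sum of the 2D and 3D diffusion priors.

    lambda_2d / lambda_3d mirror the two values of --lambda_guidance
    (e.g. `--lambda_guidance 1 40`). A larger 2D weight gives more
    imagination and detail but less 3D consistency; lambda_2d = 0 is the
    Zero-1-to-3-only ablation, lambda_3d = 0 is the SDS-only ablation.
    """
    loss = 0.0
    if lambda_2d > 0:
        # Score distillation (SDS) against the text-conditioned 2D prior.
        loss = loss + lambda_2d * guidance_2d.train_step(text_embeds, rendered_views)
    if lambda_3d > 0:
        # Novel-view score distillation against the image-conditioned
        # 3D prior (Zero-1-to-3), given poses relative to the reference.
        loss = loss + lambda_3d * guidance_3d.train_step(ref_image, rendered_views, rel_poses)
    return loss
```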
Likewise, run Magic123 with only the 3D prior (like Zero-1-to-3, but we achieve much better performance through our training strategies and the coarse-to-fine pipeline).

## Acknowledgement

This work is built upon Stable-DreamFusion; many thanks to the author Kiui (Jiaxiang Tang) and the many other contributors. We also drew inspiration from a list of amazing research works and open-source projects; thanks a lot to all the authors for sharing!

- DreamFusion: Text-to-3D using 2D Diffusion
- Magic3D: High-Resolution Text-to-3D Content Creation
- Zero-1-to-3: Zero-shot One Image to 3D Object
- RealFusion: 360° Reconstruction of Any Object from a Single Image
- Make-It-3D: High-Fidelity 3D Creation from a Single Image with Diffusion Prior
- Stable Diffusion and the diffusers library

## Citation

If you find this work useful, a citation will be appreciated:
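The BibTeX entry below is reconstructed from the title, author list, and venue given above; the entry key and venue formatting are illustrative.

```bibtex
@inproceedings{qian2024magic123,
  title     = {Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors},
  author    = {Qian, Guocheng and Mai, Jinjie and Hamdi, Abdullah and Ren, Jian and Siarohin, Aliaksandr and Li, Bing and Lee, Hsin-Ying and Skorokhodov, Ivan and Wonka, Peter and Tulyakov, Sergey and Ghanem, Bernard},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2024}
}
```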