👩‍🎓 Generative AI for 2.5D content creation with depth-guided object placement

Implementation of the "Generative AI for 2.5D content creation with depth-guided object placement" pipeline, developed as part of the bachelor thesis conducted by Viktoriia Maksymiuk under the supervision of Dr. Mikołaj Jankowski. It was submitted in fulfilment of the requirements for the Bachelor of Science degree in the Department of Computer Science and Information Technologies at the Faculty of Applied Sciences.

🦿 Launch instructions

Follow the steps below to set up the pipeline for creating 2.5D content with depth-guided object placement.

🧌 Step 1. Setting Up Blender as a Command Line Tool

This guide will walk you through the process of setting up Blender as a command-line tool on various operating systems. Once set up, you'll be able to run Blender from the command line by simply typing blender.

Prerequisites

Before you begin, ensure that you have Blender installed on your system. You can download the latest version of Blender from the official website: Blender Download

Adding Blender to the System Path Permanently

Follow the steps outlined in the tutorial for your OS to add Blender to the system path permanently.

To ensure that Blender has been set up correctly, open a new command prompt/terminal window and type blender. Blender should launch without any errors.
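For reference, on Linux or macOS this typically means appending the directory that contains the blender executable to PATH in your shell profile, while on Windows the Blender folder is usually added through the Environment Variables dialog. The paths below are only examples and depend on where Blender is installed on your machine:

# Linux: add to ~/.bashrc or ~/.zshrc, then open a new terminal
export PATH="$PATH:/opt/blender"

# macOS: the blender binary lives inside the app bundle
export PATH="$PATH:/Applications/Blender.app/Contents/MacOS"

# verify the setup
blender --version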

📦 Step 2. Access Repository Code

Clone the repository using the following command:

git clone https://github.com/Vihtoriaaa/GenAI-2.5D-Content-Creation

Move into the project folder:

cd GenAI-2.5D-Content-Creation/

🐍 Step 3. Set Up Conda Environment

To simplify dependency management and keep the setup reproducible, we recommend using the Anaconda package manager.

Install Conda

If you haven't already, install Conda for your OS by following the instructions provided in the official Conda Documentation.

Create a Conda Environment with dependencies

Note

Before setting up the Conda environment, note that the pipeline run differs depending on your chosen configuration (CPU or GPU): the CPU pipeline relies on external services for specific steps, while the GPU pipeline runs every step directly on your local machine. Some computations are accelerated on a GPU and run more slowly on a CPU. Make sure to select the setup that matches your system resources and requirements.

  • If you have a GPU (CUDA) available, create a new Conda environment by running the following commands one by one:
conda env create -f gpu_environment.yaml
conda activate genai-env
  • If you have a CPU only, use the following commands:
conda env create -f cpu_environment.yaml
conda activate genai-env

These commands create a new Conda environment named genai-env, install all the packages listed in the environment file for your selected configuration, and activate the environment.
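If you set up the GPU configuration, a quick sanity check is to confirm that CUDA is visible from inside the activated environment. This assumes the GPU environment file installs PyTorch (which local depth estimation with Marigold relies on); if the import fails, revisit the environment setup:

python -c "import torch; print('CUDA available:', torch.cuda.is_available())"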

๐Ÿƒโ€โ™€๏ธ Step 4. Pipeline Run

CPU configuration set up and run

The CPU-based version is intended for users with limited computational resources; certain pipeline steps, such as scene image and depth map generation, therefore rely on external services. This approach ensures all users can test and use the project regardless of their system's capabilities. To run the pipeline, follow the steps below.

🧍‍♀️ Scene image generation with Stable Diffusion XL (SDXL)

For our pipeline, we decided to use the Juggernaut v7 model, a variant of the Stable Diffusion XL (SDXL) model. SDXL is an improved version of the original SD that produces more realistic and detailed images. Juggernaut v7 is widely recognized and frequently chosen by the GenAI community on CivitAI, a platform for accessing and collaborating on generative AI models and research. To generate a scene image, you can use Hugging Face Spaces for SDXL. At least two Hugging Face Spaces are available for scene image generation with the Juggernaut v7 model: Option A and Option B. Generate the scene image you need by providing a text prompt describing it, then download the generated image and place it in the project folder.

🦆 Depth map estimation with Marigold

For our pipeline, we decided to use the Marigold model for depth map estimation because it represents a significant advance in Monocular Depth Estimation (MDE) within computer vision. Moreover, it is fast and easy to use for capturing the depth information needed for realistic object placement. To generate a depth map for the scene image, you can use the following Hugging Face Space. Provide the previously generated and saved scene image as input and wait for the output results. Download the image with "_depth_16bit.png" in its name; this is the file our pipeline needs.

🎀 Pipeline Run

Now, to run the 2.5D content creation with depth-guided object placement pipeline, follow these steps in the terminal:

  1. Place the generated scene image, its depth map, and the selected 3D object in the appropriate folders.

  2. Run cd pipeline/ to move to the folder with the pipeline code.

  3. Run python cpu_pipeline.py to launch the pipeline.

  4. You will be asked to provide the 3D object you want to place within the generated scene; please choose an appropriate one. The object must have the ".fbx" extension. If you don't have one, you can download one from websites that offer existing 3D models, for instance, TurboSquid.

  5. When the object is selected, you will be asked to choose where to place it. The scene image is displayed; simply click on any location within it where you wish to place your 3D object. When the desired location is selected, press 'Enter' to continue or 'R' to reselect the location.

  6. You're done 🎉 Wait until the pipeline finishes its execution. The generated 2.5D content is saved under the rendered_results folder, named after the pipeline execution date; check it out! 🧍‍♀️
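For reference, once the inputs are in place, a full CPU run comes down to the two commands below. The file names here are placeholders, and the input folders are only a guess based on the Repository Organization section further down; follow the prompts printed by the script if it expects a different layout:

# example inputs (placeholder names):
#   to_depth/scene.png               - SDXL-generated scene image
#   depth_maps/scene_depth_16bit.png - Marigold depth map
#   3d_objects/chair.fbx             - 3D object to place
cd pipeline/
python cpu_pipeline.py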

GPU configuration set up and run

The GPU-accelerated version is designed for users with local GPU resources who can run the entire pipeline workflow locally. For optimal performance, it is recommended to use an Nvidia GPU with CUDA support and at least 6–8 GB of VRAM; this configuration ensures efficient processing and sufficient memory for running the pipeline locally.

💍 Installation of automatic1111

The GPU pipeline executes the entire workflow locally, from scene image generation with Stable Diffusion (SD) to content rendering in Blender. To set everything up for such a run, you need automatic1111, a web-based interface for the SD model, which simplifies and speeds up scene creation through its API. Please follow the installation instructions from the official automatic1111 repository.
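For reference, on most systems this amounts to cloning the official repository and then using its OS-specific install and launch scripts as described in its README (the clone URL below is the official AUTOMATIC1111 repository):

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui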

🦜 Downloading Stable Diffusion Models

As in the CPU setup, our pipeline uses the Juggernaut v7 model, a variant of the Stable Diffusion XL (SDXL) model that produces more realistic and detailed images and is widely used by the GenAI community on CivitAI. The model can be downloaded from CivitAI (link): click the 1 File drop-down list on the right and download the model with the ".safetensors" extension.

When the model is downloaded, go to the stable-diffusion-webui folder and navigate to the models/Stable-diffusion folder, where you should see a file named "Put Stable Diffusion checkpoints here.txt". Put the downloaded Juggernaut v7 model checkpoint file in this folder. You can also download other models, for instance the Stable Diffusion v1.5 model checkpoint file download link, which is also supported by our pipeline.
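For example, on Linux or macOS, moving the downloaded checkpoint into place could look like the following (the source path is an assumption; adjust it to wherever your browser saved the file):

mv ~/Downloads/juggernautXL_v7Rundiffusion.safetensors stable-diffusion-webui/models/Stable-diffusion/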

The last step is to enable access to automatic1111 through its API. To do this, go to the stable-diffusion-webui folder, right-click the webui-user.bat file, and select Edit. Replace the line

set COMMANDLINE_ARGS=

with

set COMMANDLINE_ARGS=--api

Each individual argument needs to be separated by a space.

Additionally, if your GPU has less than 8 GB of VRAM, it is a good idea to add the --medvram argument to reduce memory usage; add it after the --api argument. Finally, save the changes and double-click the webui-user.bat file to run Stable Diffusion.
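With both flags enabled, the line in webui-user.bat becomes:

set COMMANDLINE_ARGS=--api --medvram

Once the web UI is running, you can confirm the API is reachable; the check below assumes the default local address and the standard automatic1111 API route that lists available checkpoints:

curl http://127.0.0.1:7860/sdapi/v1/sd-models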

🎀 Pipeline Run (finally :D)

Now, to run the 2.5D content creation with depth-guided object placement pipeline, follow these steps in the terminal:

  1. Navigate to the folder where the GenAI-2.5D-Content-Creation project was cloned.

  2. Run cd pipeline/ to move to the folder with the pipeline code.

  3. To launch the pipeline, run python gpu_pipeline.py --prompt "{your scene description}". Provide the scene description you want to generate for your content.

  4. Wait for the pipeline to generate the scene image. Note that you can regenerate the image if needed; during generation you will be asked whether to proceed with the generated image.

  5. After the scene image is generated, you will be asked to provide the 3D object you want to place within the generated scene; please choose an appropriate one. The object must have the ".fbx" extension. If you don't have one, you can download one from websites that offer existing 3D models, for instance, TurboSquid.

  6. When the object is selected, you will be asked to choose where to place it. The scene image is displayed; simply click on any location within it where you wish to place your 3D object. When the desired location is selected, press 'Enter' to continue or 'R' to reselect the location.

  7. You're done 🎉 Wait until the pipeline finishes its execution. The generated 2.5D content is saved under the rendered_results folder, named after the pipeline execution date; check it out! 🧍‍♀️

Other command line arguments that can be provided to configure the pipeline run are listed in the table below:

| Name | Description | Type | Default Value |
| --- | --- | --- | --- |
| negative_prompt | Negative text prompt | str | "" (empty string) |
| width | Generated image width in pixels | int | 1024 |
| height | Generated image height in pixels | int | 1024 |
| steps | Number of steps to run the generation process | int | 30 |
| sampler_name | Name of the sampler to use | str | "DPM++ 2M Karras" |
| cfg_scale | CFG scale value | int | 7 |
| seed | Seed for reproducibility (-1 for random) | int | -1 |
| checkpoint | Stable Diffusion checkpoint | str | "juggernautXL_v7Rundiffusion.safetensors [0724518c6b]" |
| marigold_checkpoint | Marigold checkpoint path or hub name | str | "prs-eth/marigold-lcm-v1-0" |

To use any of the arguments shown in the table, include them in the command along with --prompt. Here's the usage example with all available options:

python gpu_pipeline.py [-h] --prompt PROMPT [--negative_prompt NEGATIVE_PROMPT] [--width WIDTH] [--height HEIGHT] [--steps STEPS]
                [--sampler_name {DPM++ 2M Karras,Euler a,DPM++ SDE Karras}] [--cfg_scale CFG_SCALE] [--seed SEED]
                [--checkpoint {juggernautXL_v7Rundiffusion.safetensors [0724518c6b],v1-5-pruned-emaonly.safetensors [6ce0161689]}]
                [--marigold_checkpoint {prs-eth/marigold-lcm-v1-0,prs-eth/marigold-v1-0,Bingxin/Marigold}]
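As an illustration, a run that fixes the seed for reproducibility and uses the faster Marigold LCM checkpoint could look like this (the prompt text is just an example):

python gpu_pipeline.py --prompt "a cozy living room with large windows and a wooden floor" --steps 30 --seed 42 --sampler_name "DPM++ 2M Karras" --marigold_checkpoint prs-eth/marigold-lcm-v1-0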

Additional options for certain arguments:

  • sampler_name:

    • Choices: "DPM++ 2M Karras", "Euler a", "DPM++ SDE Karras"
  • checkpoint:

    • Choices:
      • "juggernautXL_v7Rundiffusion.safetensors [0724518c6b]"
      • "v1-5-pruned-emaonly.safetensors [6ce0161689]"
  • marigold_checkpoint:

    • Choices:
      • "prs-eth/marigold-lcm-v1-0" (LCM version - faster speed)
      • "prs-eth/marigold-v1-0"
      • "Bingxin/Marigold"

🗃️ Repository Organization

This repository is organized into several directories, each serving a specific function. Below is a description of each directory:

  • 3d_objects: a 3D object folder with an example of an object used for pipeline 2.5D content creation.
  • depth_maps: an image folder with examples of depth maps generated with the Marigold model.
  • colored_depth_maps: an image folder with examples of colored depth maps generated with the Marigold model.
  • rendered_results: an image folder with examples of the pipeline's final results of 2.5D content.
  • to_depth: an image folder with examples of scene images generated with the Stable Diffusion model.
  • pipeline: a folder containing pipeline implementation code. Here is a general overview of the files contained in this folder:
| Name | Description |
| --- | --- |
| background_enhancement.py | Contains the code for High Dynamic Range Imaging (HDRI) image generation, used to provide a realistic and natural lighting source for the 2.5D scene. |
| blender.py | Contains the code for content creation using the Blender API. |
| cpu_pipeline.py | Contains the CPU-based pipeline version, used by users with limited computational resources. |
| gpu_pipeline.py | Contains the GPU-accelerated pipeline version, used by users with local GPU resources. |
| depthToNormal.py | Contains the code for surface normal map estimation from a depth map. |
| depth_estimation_marigold.py | Contains the code for local depth map estimation with the Marigold model. Used only for the GPU pipeline version. |
| extract_clicked_points.py | Contains the code to extract the points clicked on the image. Saves the points' coordinates to the "clicked_points.txt" file, which can be used with the DepthToNormalMap file to visualize extracted surface normals for clicked points. |
| payload_base.json | Contains the default JSON configuration used for calls to the automatic1111 API to generate scene images with Stable Diffusion. Used only for the GPU pipeline version. |
| diode_metrics.ipynb | Contains the code used to process the DIODE Indoor validation dataset and extract surface normal estimation metrics. |
| results/ | Folder containing intermediate images generated during the pipeline run: for the CPU version, HDRI images; for the GPU version, generated scene images from Stable Diffusion, their depth maps (including colored versions), and HDRI images. |

👩‍🌾 Contributors

🎫 License

Distributed under the MIT license.
