

3D Photo Inpainting - Turn Any Picture Into 3D Photo with Deep Learning and Python

Deep Learning, Computer Vision, Machine Learning, Neural Network, Python · 3 min read


TL;DR Learn how to create a 3D photo from a regular image using Machine Learning

Have you seen those amazing 3D photos on Facebook and Instagram? How can you create your own from regular photos? We’re going to do that with the help of a project called 3D Photography using Context-aware Layered Depth Inpainting. We’ll try out different photos and have a look at how it all works!

Here’s what we’ll go over:

  • Install the prerequisites for the 3D photo inpainting project
  • Look at a demo
  • Convert some images into 3D photos
  • Dive deeper into how it works
  • Look into what training data was used

Let’s make some 3D photos!


The 3D inpainting project requires a few libraries to be preinstalled. Let’s get those:

!pip install -q vispy==0.6.4
!pip install -q moviepy==1.0.2
!pip install -q transforms3d==0.3.1
!pip install -q networkx==2.3

We’ll also define two helper functions that’ll help us visualize depth estimations and final results:

from IPython.display import HTML
from base64 import b64encode

def show_inpainting(image_file, video_file):
  image_content = open(image_file, 'rb').read()
  video_content = open(video_file, 'rb').read()
  image_data = "data:image/jpg;base64," + b64encode(image_content).decode()
  video_data = "data:video/mp4;base64," + b64encode(video_content).decode()
  html = HTML(f"""
  <img height=756 src="{image_data}" />
  <video height=756 controls loop>
    <source src="{video_data}" type='video/mp4'>
  </video>
  """)
  return html

def show_depth_estimation(image_file, depth_file):
  image_content = open(image_file, 'rb').read()
  depth_content = open(depth_file, 'rb').read()
  image_data = "data:image/jpg;base64," + b64encode(image_content).decode()
  depth_data = "data:image/png;base64," + b64encode(depth_content).decode()
  html = HTML(f"""
  <img height=756 src="{image_data}" />
  <img height=756 src="{depth_data}" />
  """)
  return html

The show_inpainting() function shows the inpainted video along with the original photo. show_depth_estimation() shows the estimated depth of each pixel of the image (more on that later).


Let’s see what we’re going to achieve:

!mkdir demo
!gdown -q --id 1VDT5YhANPJczevyhTdasJO5Zexl2l_fd -O demo/dog.jpg
!gdown -q --id 1CAsRBub83ptC_zPWFRZIDQDU47tFy_ST -O demo/dog-inpainting.mp4

show_inpainting('demo/dog.jpg', 'demo/dog-inpainting.mp4')

Original image


3D photo

On the left, we have a photo of Ahil that I’ve taken with my phone. On the right is the result of the 3D inpainting that you’re going to learn how to do.

Making 3D photos

Inpainting refers to the process of recovering parts of images and videos that were lost or purposefully removed.
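To get an intuition for inpainting before we add the 3D part, here’s a toy, from-scratch diffusion fill (not the learned inpainting the paper uses): masked pixels are repeatedly replaced by the average of their four neighbors until the hole blends into the surrounding context.

```python
import numpy as np

def diffuse_inpaint(img, mask, iters=200):
    # Naive diffusion inpainting: keep replacing every masked pixel
    # with the mean of its 4 neighbors until the hole blends in
    out = img.astype(float).copy()
    ys, xs = np.where(mask)
    for _ in range(iters):
        out[ys, xs] = (out[ys - 1, xs] + out[ys + 1, xs] +
                       out[ys, xs - 1] + out[ys, xs + 1]) / 4.0
    return out

img = np.full((32, 32), 200.0)   # a flat gray image
img[10:20, 10:20] = 0.0          # the "missing" (damaged) region
mask = np.zeros_like(img, dtype=bool)
mask[10:20, 10:20] = True

restored = diffuse_inpaint(img, mask)
```

On this flat image the hole converges right back to the surrounding gray; learned models like the one in this project do far better on real, textured content.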

The paper 3D Photography using Context-aware Layered Depth Inpainting introduces a method to convert 2D photos into 3D using inpainting techniques.

The full source code of the project is available on GitHub. Let’s clone the repo and download some pre-trained models:

%cd /content/
!git clone https://github.com/vt-vl-lab/3d-photo-inpainting.git
%cd 3d-photo-inpainting
!git checkout e804c1cb2fd695be50946db2f1eb17134f6d1b38

Let’s clear out the demo files provided by the project and download our own content:

!rm depth/*
!rm image/*
!rm video/*

!gdown --id 1b4MjYo_D5sps8F6JmYnomandLyQhjo6Z -O config.yml
!gdown --id 1TYmKRP4387hjDMFfWaeqcOVY7do-m0LE -O image/castle.jpg
!gdown --id 1VDT5YhANPJczevyhTdasJO5Zexl2l_fd -O image/dog.jpg

The images you want to convert into 3D photos need to go into the image directory. For our example, I’m adding two from my personal collection.

We’re going to use (mostly) the default config and make sure that offscreen rendering is disabled:

depth_edge_model_ckpt: checkpoints/edge-model.pth
depth_feat_model_ckpt: checkpoints/depth-model.pth
rgb_feat_model_ckpt: checkpoints/color-model.pth
MiDaS_model_ckpt: MiDaS/
fps: 40
num_frames: 240
x_shift_range: [0.00, 0.00, -0.02, -0.02]
y_shift_range: [0.00, 0.00, -0.02, -0.00]
z_shift_range: [-0.05, -0.05, -0.07, -0.07]
traj_types: ["double-straight-line", "double-straight-line", "circle", "circle"]
video_postfix: ["dolly-zoom-in", "zoom-in", "circle", "swing"]
specific: ""
longer_side_len: 960
src_folder: image
depth_folder: depth
mesh_folder: mesh
video_folder: video
load_ply: False
save_ply: True
inference_video: True
gpu_ids: 0
offscreen_rendering: False
img_format: ".jpg"
depth_format: ".npy"
require_midas: True
depth_threshold: 0.04
ext_edge_threshold: 0.002
sparse_iter: 5
filter_size: [7, 7, 5, 5, 5]
sigma_s: 4.0
sigma_r: 0.5
redundant_number: 12
background_thickness: 70
context_thickness: 140
background_thickness_2: 70
context_thickness_2: 70
discount_factor: 1.00
log_depth: True
largest_size: 512
depth_edge_dilate: 10
depth_edge_dilate_2: 5
extrapolate_border: True
extrapolation_thickness: 60
repeat_inpaint_edge: True
crop_border: [0.03, 0.03, 0.05, 0.03]
anti_flickering: True
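Since the config is plain YAML, you can also tweak it programmatically with PyYAML (which the project itself uses to read it). A small sketch, operating on an inline snippet of the keys above rather than the file itself:

```python
import yaml

# A subset of the config keys from above, as an inline YAML snippet;
# in the notebook you'd load and dump config.yml itself instead
config_text = """
fps: 40
num_frames: 240
longer_side_len: 960
"""

config = yaml.safe_load(config_text)
config["longer_side_len"] = 1280  # higher-resolution output (slower to render)
config["num_frames"] = 120        # shorter videos render faster

updated = yaml.safe_dump(config)
```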

To start the inpainting process, we need to execute the main.py script and pass our config:

!python main.py --config config.yml

This might take some time, depending on the GPU that you have.
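Render time grows with num_frames and longer_side_len, and a GPU makes a big difference. A quick way to check what you’re running on (the torch import is guarded in case it isn’t installed in your environment):

```python
# Check whether a CUDA-capable GPU is visible to PyTorch
try:
    import torch
    has_gpu = torch.cuda.is_available()
    device_name = torch.cuda.get_device_name(0) if has_gpu else "CPU"
except ImportError:
    has_gpu, device_name = False, "CPU (torch not installed)"

print(f"GPU available: {has_gpu} ({device_name})")
```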

Estimated depth

I’ve promised you that we’re going to look at the estimated depth later. The time has come, let’s look at some depth estimations:

show_depth_estimation('image/dog.jpg', 'depth/dog.png')


show_depth_estimation('image/castle.jpg', 'depth/castle.png')


Lighter pixels represent shorter distances to the camera. I would say that it’s doing a great job!
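The pipeline also saves each raw depth map as a .npy array (see depth_format in the config). If you want to visualize one yourself, normalize it and invert it so that closer pixels come out lighter, like the previews above. A tiny stand-in array replaces a real np.load("depth/dog.npy") here:

```python
import numpy as np

# Stand-in for np.load("depth/dog.npy"); larger value = farther from the camera
depth = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

# Invert so near pixels map to large values, then scale to 0-255 grayscale
# (assumes the depth map is not constant, so np.ptp() is non-zero)
inverted = depth.max() - depth
gray = np.round(255 * (inverted - inverted.min()) / np.ptp(inverted)).astype(np.uint8)
```

The result can be saved as a grayscale PNG with any image library to get a preview like the ones shown above.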


Here are the 3D inpainting of the two images:

show_inpainting('image/dog.jpg', 'video/dog_swing.mp4')

Original image


3D photo

show_inpainting('image/castle.jpg', 'video/castle_circle.mp4')

Original image


3D photo

Amazing, right?

How does it work?

Here is a high-level overview:

  • Get the depth of each pixel (how far it is from the camera), either from:
      • An RGB-D image taken with a dual-camera device (phone)
      • Depth estimation with MiDaS on a regular photo
  • Create an LDI (layered depth image) representation
  • Detect regions with large depth discontinuities (context/synthesis regions)
  • Cut out those regions (this roughly resembles cutting objects out of the image)
  • Synthesize the background behind the cut-out objects
  • Merge the background and the cut-out objects into a new LDI
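The key data structure is the LDI: unlike a plain image, every pixel can hold several (depth, color) samples, so the synthesized background survives behind a cut-out foreground object. Here’s a minimal, purely illustrative sketch (all names are mine, and parallax is omitted):

```python
import numpy as np

H, W = 2, 2
# Each pixel stores a *list* of (depth, color) samples instead of one color;
# start with a far background layer everywhere
ldi = [[[(5.0, 200)] for _ in range(W)] for _ in range(H)]
ldi[0][0].append((1.0, 10))  # a near foreground object covers pixel (0, 0)

def render_front(ldi):
    # Render the frontmost (smallest-depth) sample of every pixel
    return np.array([[min(samples)[1] for samples in row] for row in ldi])

frame = render_front(ldi)
```

When the virtual camera moves, samples that were occluded become visible, which is exactly why the inpainted background layers matter.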

The process is a lot more involved (including heavy image preprocessing), but you’ll need to read the paper and the code to get into the details.

What was the training data?

The authors didn’t create a special dataset for their task. Instead, they generate the training data.

First, the depth of images from the MSCOCO dataset is estimated using a pre-trained MegaDepth model, and context/synthesis regions are extracted from the results. A random sample of those regions is then merged with another set of MSCOCO images, so the untouched images provide the ground truth for the backgrounds.
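A rough sketch of that data-generation idea, with random arrays standing in for MSCOCO images (everything here is illustrative, not the authors’ code):

```python
import numpy as np

rng = np.random.default_rng(42)

# Random arrays stand in for MSCOCO images
background = rng.integers(0, 255, size=(32, 32), dtype=np.uint8)  # ground truth
foreground_region = np.zeros((8, 8), dtype=np.uint8)              # extracted region

# Paste the region onto the image: this occluded version is the network input...
network_input = background.copy()
y, x = 12, 8
network_input[y:y + 8, x:x + 8] = foreground_region

# ...while the untouched image supplies the ground-truth background to recover
target = background
```

This way the model always knows what was really behind the pasted region, with no manual labeling required.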


You can now convert any image into a 3D photo! Pretty amazing, right?

Here’s what we went over:

  • Install the prerequisites for the 3D photo inpainting project
  • Look at a demo
  • Convert some images into 3D photos
  • Dive deeper into how it works
  • Look into what training data was used

Go on, try it on your own photos and show me the results in the comments!


