Final Project: Neural Radiance Fields (NeRF)

In this project, we first implement a neural field in 2D, which learns to map pixel coordinates to RGB values. Then, we implement NeRF in 3D, where the model learns a 3D representation of the scene: it is trained to predict an RGB color and a density for any 3D point, and volumetric rendering turns these predictions into an image from any camera viewpoint. The rendered images are compared against the training camera images to train the model, and the same pipeline is then used to generate novel views of the scene.

Part 1: Fit a Neural Field to a 2D Image

In this part, I create a neural field F: (u, v) -> (r, g, b) and optimize it to fit a particular image. Here is an overview of my implementation.
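
The core of the model looks roughly like the sketch below, assuming a PyTorch setup; the class and argument names (NeuralField2D, L, num_layers, hidden_dim) are illustrative rather than the exact ones in my code. The (u, v) coordinates are passed through a sinusoidal positional encoding and then a small MLP that outputs RGB.

```python
import torch
import torch.nn as nn

def positional_encoding(x, L):
    """Append sin/cos of 2^i * pi * x for i = 0 .. L-1 to the raw coordinates."""
    out = [x]
    for i in range(L):
        freq = (2.0 ** i) * torch.pi
        out.append(torch.sin(freq * x))
        out.append(torch.cos(freq * x))
    return torch.cat(out, dim=-1)

class NeuralField2D(nn.Module):
    def __init__(self, L=10, num_layers=4, hidden_dim=256):
        super().__init__()
        in_dim = 2 + 2 * 2 * L  # (u, v) plus one sin and one cos term per coordinate per frequency
        layers = [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
        for _ in range(num_layers - 2):
            layers += [nn.Linear(hidden_dim, hidden_dim), nn.ReLU()]
        layers += [nn.Linear(hidden_dim, 3), nn.Sigmoid()]  # RGB in [0, 1]
        self.mlp = nn.Sequential(*layers)
        self.L = L

    def forward(self, uv):  # uv: (N, 2) pixel coordinates normalized to [0, 1]
        return self.mlp(positional_encoding(uv, self.L))
```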

I tried out two images. For each, intermediate results during training and PSNR plots are shown below for three different hyperparameter settings.
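
For reference, the PSNR in these plots is computed from the MSE between the reconstruction and the ground-truth image. A minimal sketch, assuming image values normalized to [0, 1]:

```python
import torch

def psnr(pred, target):
    """Peak signal-to-noise ratio for images with values in [0, 1]."""
    mse = torch.mean((pred - target) ** 2)
    return -10.0 * torch.log10(mse)
```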

For hyperparameter tuning, I tried varying L (which controls the number of frequencies, and therefore the length, of the sinusoidal positional encoding) and num_MLP_layers.

Fox Image
Default: lr=1e-2, num_MLP_layers=4, PE_L=10
Tune 1: PE_L = 30, everything else same
Tune 2: num_MLP_layers = 10, everything else same
Statue of Liberty Image
Default: lr=1e-2, num_MLP_layers=4, PE_L=20
Tune 1: lr=1e-3, everything else same
Tune 2: PE_L = 5, everything else same

Part 2: Fit a Neural Radiance Field from Multi-view Images

In this part, we implement the actual 3D NeRF, using a neural radiance field to represent a 3D scene. We use the Lego scene, which consists of images of shape (200, 200, 3) along with pre-processed camera poses. Here is an overview of everything I implemented.
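
A central piece is converting pixel coordinates into world-space rays using the camera intrinsics K and the camera-to-world matrices from the provided poses. A minimal sketch, assuming a pinhole camera model; the function and variable names are illustrative, and the exact axis sign conventions depend on the dataset:

```python
import torch

def pixel_to_ray(K, c2w, uv):
    """uv: (N, 2) pixel coordinates. Returns ray origins and unit directions, each (N, 3)."""
    # Back-project pixel centers (hence the +0.5 offset) to camera-space directions at depth 1.
    # (This assumes +z points into the scene; some datasets flip the y/z axes.)
    x = (uv[:, 0] + 0.5 - K[0, 2]) / K[0, 0]
    y = (uv[:, 1] + 0.5 - K[1, 2]) / K[1, 1]
    dirs_cam = torch.stack([x, y, torch.ones_like(x)], dim=-1)
    # Rotate directions into world space; every ray starts at the camera center.
    dirs_world = dirs_cam @ c2w[:3, :3].T
    rays_d = dirs_world / dirs_world.norm(dim=-1, keepdim=True)
    rays_o = c2w[:3, 3].expand_as(rays_d)
    return rays_o, rays_d
```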

Part 2 Results

Below I highlight the ray and sample-point visualizations, the training progress, and the rendered videos.

Here, I visualize: 1) 100 rays (one randomly chosen from each camera), together with the cameras and the sample points along those rays, and 2) 100 randomly chosen rays and their sample points for a single camera view. A sketch of the point-sampling step follows the two visualizations.

100 Rays/Cameras/Points Visualization

1 Camera Points Visualization
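
The point-sampling step referenced above can be sketched as follows, assuming uniform depths between near and far bounds with random jitter during training; the bounds and sample count shown are illustrative, not necessarily the ones I used:

```python
import torch

def sample_along_rays(rays_o, rays_d, near=2.0, far=6.0, n_samples=64, perturb=True):
    """Returns 3D sample points (N, n_samples, 3) and their depths t (N, n_samples)."""
    t = torch.linspace(near, far, n_samples)           # uniform depths along each ray
    t = t.expand(rays_o.shape[0], n_samples).clone()   # one set of depths per ray
    if perturb:
        # Jitter each depth within its bin so training sees a dense range of depths.
        t = t + torch.rand_like(t) * (far - near) / n_samples
    points = rays_o[:, None, :] + t[..., None] * rays_d[:, None, :]
    return points, t
```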

Training Progress

Train PSNR

Validation PSNR

Video after 1 hour (2500 iterations) of training

GIF Iteration 2500

Video after 2 hours (5000 iterations) of training

GIF Iteration 5000

Bells and Whistles

By modifying the volrend function to take an additional background_color tensor, we can render the above video with a different background color. Specifically, I add background_color, weighted by the "final" transmittance of each ray (the product of all the per-sample transmittance terms), into rendered_colors. Only the background changes color, since the final transmittance for rays that hit the Lego object is very small. Below, I use the 5000-iteration checkpoint to render the Lego video with red, green, and blue backgrounds.
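
A sketch of what this modification can look like, assuming per-ray densities, colors, and step sizes as inputs; the tensor names and shapes here are illustrative and not necessarily those of my volrend:

```python
import torch

def volrend(sigmas, rgbs, deltas, background_color=None):
    """sigmas, deltas: (N, S); rgbs: (N, S, 3). Returns composited colors (N, 3)."""
    alphas = 1.0 - torch.exp(-sigmas * deltas)                      # per-sample opacity
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=-1)             # running transmittance
    # T_i: probability the ray reaches sample i without being absorbed earlier.
    T = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)
    weights = T * alphas                                            # (N, S)
    colors = (weights[..., None] * rgbs).sum(dim=-2)                # (N, 3)
    if background_color is not None:
        # "Final" transmittance: the product of all per-sample transmittance terms,
        # i.e. the fraction of light that passes through every sample unabsorbed.
        T_final = trans[:, -1:]
        colors = colors + T_final * background_color
    return colors
```

With background_color set to, e.g., torch.tensor([1.0, 0.0, 0.0]), rays that miss the object keep most of their transmittance and are colored red, while rays hitting the Lego model are barely affected.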

Red Background

Green Background

Blue Background