Images of the Russian Empire:

Colorizing the Prokudin-Gorskii Photo Collection

Yueheng Zeng @ Project 1

Overview

The goal of this project is to colorize the Prokudin-Gorskii photo collection, a series of photographs taken by the Russian photographer Sergei Mikhailovich Prokudin-Gorskii in the early 20th century. Each photograph was recorded as three separate black-and-white exposures, each taken through a different color filter (red, green, or blue). By aligning and combining these three exposures, we can reconstruct the original color image.

The image on the left is the original black and white image, and the image on the right is the colorized image produced by this project.

Aligning the Low-resolution Images
Aligning the High-resolution Images
Automatic Cropping
Automatic Contrasting
Better Color Mapping
All Results

Aligning the Low-resolution Images

The alignment process for low-resolution images involves loading the images, splitting them into separate channels, detecting edges, and aligning the channels to create colorized images. The process is as follows:

Loading and Splitting Channels: Load the image and split it into three separate channels—blue (B), green (G), and red (R)—from a single image where these channels are stacked vertically. This is achieved by dividing the height of the image by three and slicing it into equal parts for each channel.
Edge Detection: Use the Canny edge detector to detect edges in the images. This technique helps focus on key features of the image (e.g., edges and transitions) rather than pixel intensity values, which can be less effective for alignment due to variations in brightness.
Alignment Process:
1. The alignment of the green (G) and red (R) channels to the blue (B) channel is performed using a brute-force search over a specified range of pixel displacements (I used a range of -15 to 15 pixels) along both the horizontal and vertical axes.
2. A cropping step is included in the alignment to avoid border artifacts. I cut off 10% of the image from the edges before calculating the distance.
3. For each possible combination of vertical and horizontal displacements, the function shifts the green or red channel using np.roll and calculates the squared Euclidean distance between the shifted channel and the reference channel (the blue channel). The displacement with the smallest distance is kept as the alignment for that channel.
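
The steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the project's actual code: the function names `split_channels`, `crop_border`, and `align` are hypothetical, and the Canny step is left out (edge maps could be passed in place of the raw channels). Note that `np.roll` wraps pixels around the image border, which is one reason the 10% crop is applied before scoring.

```python
import numpy as np

def split_channels(plate):
    """Split a vertically stacked plate into B, G, R channels (top to bottom)."""
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]

def crop_border(img, frac=0.10):
    """Drop `frac` of the image from every edge so borders do not affect the score."""
    dy, dx = int(img.shape[0] * frac), int(img.shape[1] * frac)
    return img[dy:img.shape[0] - dy, dx:img.shape[1] - dx]

def align(channel, reference, search=15):
    """Return the (dy, dx) shift of `channel` that best matches `reference`."""
    channel = channel.astype(np.float64)
    ref = crop_border(reference.astype(np.float64))
    best, best_score = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            # Squared Euclidean distance on the cropped interior.
            score = np.sum((crop_border(shifted) - ref) ** 2)
            if score < best_score:
                best, best_score = (dy, dx), score
    return best
```

The returned shift can then be applied with `np.roll(channel, shift, axis=(0, 1))` before stacking the three channels into a color image.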

The image above is the cathedral image before and after alignment.

Aligning the High-resolution Images

The alignment process for high-resolution images is similar to the process for low-resolution images, but it will take significantly longer to run due to the increased image size. To speed up the process, it is common to use a pyramid approach, where the image is progressively downsampled to create a series of smaller images that are easier to align. The process is as follows:

Create Pyramid: Generate progressively smaller versions of the image at each level by downscaling the original channel. I downscaled the image by a factor of 2 at each level, stopping once the smallest image has a height of around 400 pixels.
Recursive Alignment:
1. Start aligning at the lowest resolution (coarsest level).
2. Recursively align each higher level by first aligning the next coarser level.
3. Scale up the offset from the lower level and apply it to the current level before refining the alignment.
Refine Alignment: At each level, combine the new alignment offset with the scaled offset from the previous level to progressively improve accuracy as the resolution increases.
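
A minimal recursive sketch of this coarse-to-fine search is shown below. Several details are assumptions made for illustration and are not stated above: the names `downscale2`, `search`, and `pyramid_align`, the 2x2 block averaging used for downscaling, the search radius of 15 at the coarsest level, and the refinement radius of 2 at every finer level.

```python
import numpy as np

def downscale2(img):
    """Halve the resolution by averaging 2x2 blocks (an assumed filter choice)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w].astype(np.float64)
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def search(channel, reference, center, radius):
    """Brute-force search for the best (dy, dx) within `radius` of `center`."""
    channel = channel.astype(np.float64)
    reference = reference.astype(np.float64)
    fy, fx = reference.shape[0] // 10, reference.shape[1] // 10
    inner = (slice(fy, reference.shape[0] - fy), slice(fx, reference.shape[1] - fx))
    best, best_score = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((shifted[inner] - reference[inner]) ** 2)
            if score < best_score:
                best, best_score = (dy, dx), score
    return best

def pyramid_align(channel, reference, min_height=400, coarse_radius=15, refine_radius=2):
    """Align at the coarsest level first, then refine the doubled offset at each finer level."""
    if reference.shape[0] <= min_height:
        return search(channel, reference, (0, 0), coarse_radius)
    coarse = pyramid_align(downscale2(channel), downscale2(reference),
                           min_height, coarse_radius, refine_radius)
    # An offset found at half resolution corresponds to twice as many pixels here.
    guess = (2 * coarse[0], 2 * coarse[1])
    return search(channel, reference, guess, refine_radius)
```

Because each finer level only searches a few pixels around the scaled-up guess, the cost is dominated by the small coarsest-level search rather than by a wide search at full resolution.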

The image above is the lady image before and after alignment.

Automatic Cropping

The automatic cropping process removes the border artifacts that remain around the edges of the aligned images. The process is as follows:

Rescale Image: If the image height is larger than 500 pixels, it is downscaled; otherwise, it remains at the same resolution.
Crop Central Region: A central 60% region of the image is extracted both horizontally and vertically for line detection.
Line Detection: Vertical (left, right) and horizontal (top, bottom) lines are detected in each color channel using Canny edge detection and Hough transforms.
Determine Innermost Lines: The coordinates of the innermost lines (left, right, top, bottom) are determined.
Adjust Crop: A small margin (1% of the image) is removed from each side to avoid edge artifacts.
Crop Image: The image is cropped using the coordinates of the innermost lines.
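
The last three steps can be sketched as a small helper. It assumes the line positions have already been found by the Canny and Hough steps, which are not shown, and that they are expressed in the coordinates of the image being cropped. The name `crop_box` and its argument layout are hypothetical.

```python
def crop_box(height, width, left, right, top, bottom, margin=0.01):
    """Return (y0, y1, x0, x1) from detected border-line positions.

    `left`/`right` hold x positions of vertical lines and `top`/`bottom` hold
    y positions of horizontal lines; an empty list means no line was found
    on that side, in which case the image edge is used.
    """
    # The innermost line on each side is the one closest to the image center.
    x0 = max(left, default=0)
    x1 = min(right, default=width)
    y0 = max(top, default=0)
    y1 = min(bottom, default=height)
    # Remove an extra 1% margin on every side.
    mx, my = int(round(width * margin)), int(round(height * margin))
    return y0 + my, y1 - my, x0 + mx, x1 - mx
```

The result can be used directly as a slice, for example `image[y0:y1, x0:x1]`.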

The image above is the lady image before and after cropping.

Automatic Contrasting

The automatic contrasting process is used to enhance the contrast of the images. The process is as follows:

Crop the Image: The image is cropped to avoid border artifacts.
Determine Min and Max Values: The minimum and maximum pixel values within the cropped region are calculated. These values represent the darkest and brightest parts of the image within the selected region.
Scale Pixel Values: The pixel values in the entire image are linearly scaled so that the minimum value becomes 0 and the maximum value becomes 1. This improves contrast by stretching the image's intensity values across the full range. Pixels outside the cropped region may fall outside this range after scaling, so they are clipped to 0 or 1.
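
A minimal NumPy sketch of this contrast stretch follows. The function name `auto_contrast` and the 10% border fraction are assumptions for illustration, since the size of the crop is not specified above.

```python
import numpy as np

def auto_contrast(img, border=0.10):
    """Stretch intensities so the interior's min/max map to 0 and 1."""
    img = img.astype(np.float64)
    h, w = img.shape[:2]
    dy, dx = int(h * border), int(w * border)
    inner = img[dy:h - dy, dx:w - dx]   # region used to measure min and max
    lo, hi = inner.min(), inner.max()
    if hi == lo:                        # flat interior: nothing to stretch
        return np.zeros_like(img)
    # Scale the whole image, then clip border pixels that fall outside [0, 1].
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)
```

Measuring the range on the interior keeps very dark or very bright border pixels from compressing the contrast of the actual photograph.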

The upper-left image is the lady image before automatic contrasting, and the upper-right image is the lady image after automatic contrasting.

Better Color Mapping

The better color mapping process is used to map the images to a more realistic color space. The process is as follows:

Convert to LAB Color Space: Convert the image from the RGB color space to the LAB color space. The LAB color space is designed to mimic human vision and is more perceptually uniform than the RGB color space.
Adjust the AB Channels: I wrote a program that creates a simple GUI using tkinter that allows real-time adjustment of the L (lightness), A (green-red), and B (blue-yellow) channels in the LAB color space of an image. Then I performed some experiments to find the best scaling factors for the A and B channels to achieve a more realistic color mapping.
Some results:
- emir: L: 1, A: 0.4, B: 0.95
- church: L: 1, A: 0.31, B: 0.91
- lady: L: 1, A: 0.36, B: 0.87
- three_generations: L: 1, A: 0.36, B: 0.95
- tobolsk: L: 1, A: 0.31, B: 0.95
Conclusion:
According to the experiments, it is reasonable to leave the L channel unchanged and scale the A and B channels by about 0.35 and 0.93, respectively, which are close to the averages of the per-image values above.
Convert Back to RGB: Convert the image back to the RGB color space for display.
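
The conversion and scaling can be sketched as below. To keep the example self-contained it uses hand-written sRGB/LAB formulas with a D65 white point, which is an assumption; in practice a library routine such as scikit-image's `rgb2lab` and `lab2rgb` would normally be used. The function names are hypothetical, and the input is assumed to be a float RGB image in the range 0 to 1.

```python
import numpy as np

# Linear sRGB -> XYZ matrix and D65 reference white.
M = np.array([[0.4124564, 0.3575761, 0.1804375],
              [0.2126729, 0.7151522, 0.0721750],
              [0.0193339, 0.1191920, 0.9503041]])
WHITE = np.array([0.95047, 1.0, 1.08883])
DELTA = 6 / 29

def rgb_to_lab(rgb):
    rgb = np.asarray(rgb, dtype=np.float64)
    # Undo the sRGB gamma curve, then convert to white-normalized XYZ.
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    t = (lin @ M.T) / WHITE
    f = np.where(t > DELTA ** 3, np.cbrt(t), t / (3 * DELTA ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def lab_to_rgb(lab):
    lab = np.asarray(lab, dtype=np.float64)
    fy = (lab[..., 0] + 16) / 116
    f = np.stack([fy + lab[..., 1] / 500, fy, fy - lab[..., 2] / 200], axis=-1)
    t = np.where(f > DELTA, f ** 3, 3 * DELTA ** 2 * (f - 4 / 29))
    # Back to linear sRGB, clipped to the displayable range before gamma encoding.
    lin = np.clip((t * WHITE) @ np.linalg.inv(M).T, 0.0, 1.0)
    return np.where(lin <= 0.0031308, 12.92 * lin, 1.055 * lin ** (1 / 2.4) - 0.055)

def scale_ab(rgb, l_scale=1.0, a_scale=0.35, b_scale=0.93):
    """Scale the L, A, and B channels and convert back to RGB."""
    lab = rgb_to_lab(rgb) * np.array([l_scale, a_scale, b_scale])
    return lab_to_rgb(lab)
```

With the default factors, neutral grays are left essentially unchanged, while saturated colors are pulled toward gray along the green-red axis more strongly than along the blue-yellow axis.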