Human Image Segmentation with Python — How to prepare an Image Data Set for Deep Learning

sokacoding
6 min read · Jan 23, 2023

Images are widely used in the field of deep learning: image classification (cat or dog?), object detection, and segmentation are typical tasks. In this article we will explore and prepare a data set for human image segmentation. That is the technique used in video call apps like Microsoft Teams or Zoom, where the person in the video feed is segmented in order to apply a filter to the background or to replace the background with a new image.

Prerequisites and info

  • Python 3.8, keras, opencv-python (I am using a conda environment)
  • Basic python knowledge
  • Basic knowledge about convolutional neural networks (CNNs)
  • Basic knowledge about the usage of opencv in python
  • CNNs for segmentation take an image as input and get an image mask (also referred to as alpha channel or matte) as target
[Image: Louis Miguel] Left: Original Image (Input), Right: Corresponding Mask (Target)

Getting the data set

The data set we will be working with can be downloaded from Kaggle:

It contains 34,424 unique images of persons and the corresponding masks. Its size is approximately 14 GB. So now that we’ve downloaded and extracted the data, let us have a closer look.

Investigating the data

To begin, we will examine the folder structure of the data set to see whether the creators used a naming convention. We find two top-level folders named clip_img and matting, whose sub-folders carry identical numerical labels. These in turn contain sub-folders named either clip_xxxxxxxx or matting_xxxxxxxx with consecutive numbering. For example, we have these two folders:

“..\segmentation_dataset\clip_img\1803151818\clip_00000000\”

“..\segmentation_dataset\matting\1803151818\matting_00000000\”

These are the lowest-level folders, containing images with identical names. We can later use this information to map the input data (image) to the target data (mask). Also note that the images in the clip_img folder are jpg-files and the images in the matting folder are png-files. Opening one of the jpg-files shows a normal picture of a person. But we don't get to see a mask when opening one of the png-files; instead, we see a cut-out person. So is there really a mask? How can we make it visible?

A digital picture usually contains three channels called RGB (red, green, blue); OpenCV stores them in BGR order. A png-file can additionally contain a 4th channel, the alpha channel, and we can extract the mask from it. By default the cv2.imread() -function reads in 3 channels even if a 4th one is present. To keep the alpha channel we have to call the function with a specific flag: cv2.imread("some_image_with_mask.png", cv2.IMREAD_UNCHANGED).

Now we want to have a look at the mask. To accomplish that, we have to read the file that contains a mask as described and then use slicing to read the 4th channel only (remember: Python uses zero-indexing, so index 3 selects the 4th channel). The first two indices stand for the height and the width of the image, so our arrays have a HWC-format (height, width, channel).

import cv2

# IMREAD_UNCHANGED keeps the 4th (alpha) channel instead of dropping it
image = cv2.imread("some_image_with_mask.png", cv2.IMREAD_UNCHANGED)
mask = image[:, :, 3]  # slice out the alpha channel only
cv2.imshow("Mask Only", mask)
cv2.waitKey(0)
cv2.destroyAllWindows()

Cleaning the data set

When I first downloaded the data set, a corrupted file in one of the folders gave me a hard time because I did not check the data properly at the time. So even if you have downloaded the clean data set from Kaggle, let us corrupt the data set ourselves and then find the self-made errors. We will add

  • One txt-file
  • One corrupted jpg-file

Choose a random folder in the data set and add a text file called dummy.txt. Create another text file called corrupted_image, open “File” → “Save as…”, select “All files” as the file type, and change the extension from “.txt” to “.jpg”. Now we have a txt-file and an unreadable jpg-file hidden in the depths of our data set. We will find those evil-doers with the help of Python.
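If you would rather plant these test files from Python than from the file dialog, a small helper could look like this (the function name and the file contents are my own):

```python
import os

def plant_test_files(folder):
    """Drop a text file and a fake jpg into `folder` to simulate a dirty data set."""
    txt_path = os.path.join(folder, "dummy.txt")
    fake_jpg_path = os.path.join(folder, "corrupted_image.jpg")
    with open(txt_path, "w") as f:
        f.write("I am not an image.")
    # The .jpg extension alone does not make this an image:
    # cv2.imread() will return None for this file.
    with open(fake_jpg_path, "w") as f:
        f.write("I only pretend to be an image.")
    return txt_path, fake_jpg_path
```
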

import os
import cv2

root_folder = r"C:\your_folder_structure\segmentation_dataset"  # a raw string must not end with a backslash

We need to import the modules os and cv2, then set up the root folder containing the data set. Now we can go through all the files with os.walk().

for subdir, dirs, files in os.walk(root_folder):
    for filename in files:
        filepath = subdir + os.sep + filename

Now we could just check for the right file-endings.

        if not filepath.endswith(".jpg") and not filepath.endswith(".png"):
            print("Not an image-file at: {}".format(filepath))

By doing so, we get the following output depending on the folder where you put your txt-file.

Not an image-file at: 
C:\your_folder_structure\segmentation_dataset\clip_img\1803151818\clip_00000000\dummy.txt

Now we have a clean data set. Don’t we? Not quite: the string check does not catch the corrupted image file, and running into it hours into a deep learning training would be pretty frustrating. So let us take a look at a better approach. When reading a file with OpenCV (the cv2 module we imported) using the cv2.imread()-function, we pass in a path as an argument. If the file can’t be read as an image, the function returns None. Thus we can improve our code like so.

        if cv2.imread(filepath) is None:
            print("\nNot an image file or corrupted file:")
            print(filepath)

This results in the following output:

Not an image file or corrupted file:
C:\your_folder_structure\segmentation_dataset\clip_img\1803151818\clip_00000000\corrupted_image.jpg
Not an image file or corrupted file:
C:\your_folder_structure\segmentation_dataset\clip_img\1803151818\clip_00000000\dummy.txt

Note that reading every image file takes some time; checking the string values only is a lot faster, but with cv2.imread() we are trading speed for safety. Finally, you could use os.remove(filepath) to delete these files.

We might have deleted the corrupted files now, but we are still not good to go: we could have destroyed an input-target-pair of images. In segmentation, the input for the neural net is the original image and the target is the corresponding mask. If the corresponding mask is missing because we deleted it during cleaning, we will run into an error during training. Therefore, we want to delete both files. To accomplish that automatically, we will work with the file path strings in order to map the input to the target and vice versa. Let us check the difference in the file paths of an input-target-pair.

Original Image:

“..\clip_img\1803151818\clip_00000000\1803151818-00000023.jpg”

Corresponding Mask:

“..\matting\1803151818\matting_00000000\1803151818-00000023.png”

Assume the original image was a corrupted file and we deleted it; we then also need to delete the corresponding mask. If the original image path is currently stored in filepath, we can make use of the .replace() -function for strings.

map_image_to_mask_path = filepath.replace("clip_img","matting").replace("clip","matting").replace(".jpg",".png")

This turns the file path of the original image into the file path of the corresponding mask. If it is the other way round and we need to map the mask file path to the image file path, it goes like this.

map_mask_to_image_path = filepath.replace("matting_","clip_").replace("matting","clip_img").replace(".png",".jpg")
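To make the mapping reusable in both directions, the two replace-chains can be wrapped in small helper functions (the function names are my own):

```python
def image_to_mask_path(image_path):
    """clip_img\...\clip_xxxxxxxx\*.jpg  ->  matting\...\matting_xxxxxxxx\*.png"""
    return (image_path.replace("clip_img", "matting")
                      .replace("clip", "matting")
                      .replace(".jpg", ".png"))

def mask_to_image_path(mask_path):
    """matting\...\matting_xxxxxxxx\*.png  ->  clip_img\...\clip_xxxxxxxx\*.jpg"""
    return (mask_path.replace("matting_", "clip_")
                     .replace("matting", "clip_img")
                     .replace(".png", ".jpg"))
```

A quick round trip, image_to_mask_path() followed by mask_to_image_path(), returns the original path, which is an easy sanity check for the naming convention.
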

Note that this only works if your folder structure itself doesn’t contain the strings “matting” or “clip”. A full code snippet that accomplishes every step discussed might look like this:

We can simulate the cleaning process of corrupted input-target-pairs.

  • Create the files A.jpg and A.png, where the jpg-file is corrupted and the png-file is valid
  • Create the files B.jpg and B.png, where the png-file is corrupted and the jpg-file is valid
  • Copy the jpg-files to “\clip_img\1803151818\clip_00000000\”
  • Copy the png-files to “\matting\1803151818\matting_00000000\”
  • Create another “dummy.txt”

Running the code for the first time (this will take a few minutes, remember the speed versus safety trade-off) now results in the following output:

Not an image file or corrupted file:
Deleted: C:\your_folder_structure\segmentation_dataset\clip_img\1803151818\clip_00000000\A.jpg
Corrupted file belonged to an input-target-pair.
Additionally deleted corresponding target: C:\your_folder_structure\segmentation_dataset\matting\1803151818\matting_00000000\A.png

Not an image file or corrupted file:
Deleted: C:\your_folder_structure\segmentation_dataset\clip_img\1803151818\clip_00000000\dummy.txt
Corrupted file didn't belong to an input-target-pair.

Not an image file or corrupted file:
Deleted: C:\your_folder_structure\segmentation_dataset\matting\1803151818\matting_00000000\B.png
Corrupted file belonged to an input-target-pair.
Additionally deleted corresponding input: C:\your_folder_structure\segmentation_dataset\clip_img\1803151818\clip_00000000\B.jpg

Running the code again results in:

No problems found!

Our data set is finally clean and prepared for use in the deep learning pipeline. In conclusion, preparing image data for deep learning is a crucial step in the model training process. Always thoroughly investigate the data you are working with! Hope you enjoyed this hands-on example!


Written by sokacoding

M.Sc. Media Informatics. Scientific Associate at Hochschule Düsseldorf - University of Applied Sciences.
