If we see something flying in the sky, we can effortlessly determine whether it is an airplane or a bird. Computers cannot do this so easily. For a computer to make such distinctions, images need to be represented systematically and numerically. This is where feature vectors come into play.
Feature vectors are structured numerical representations of images that capture their visual characteristics. They are an important component of machine learning because they let machine learning algorithms work with the features of an image.
A feature vector has multiple dimensions, and each dimension may represent a different feature of the image. These vectors are extracted using deep learning techniques or other feature extraction methods. Using a pre-trained deep learning model is very effective for image feature extraction. We are going to use a pre-trained deep learning model to find the 10 images most similar to a given image. I have chosen the pre-trained ResNet50 model for this project.
What is ResNet50?
ResNet50 is a deep convolutional neural network (CNN) architecture introduced in 2015 by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. A CNN processes an image by passing it through a sequence of layers. The “50” in ResNet50 refers to the number of weight layers in the network.
One of the key ideas in ResNet50 is the residual block, which uses a skip connection to carry the block's input directly to its output. These blocks were introduced to make very deep networks easier to train and are commonly credited with easing the vanishing gradient problem. When a neural network has a large number of layers, the gradients can become very small during backpropagation, so the early layers learn slowly. Because the skip connection gives gradients a direct path back through the network, it is easier for the network to learn deeper representations.
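To make the skip connection concrete, here is a minimal NumPy sketch of the idea y = relu(F(x) + x). It is a simplification under stated assumptions: F is built from two small dense layers, whereas the real ResNet50 blocks use convolutions with batch normalization. The names residual_block, w1 and w2 are hypothetical and chosen only for illustration.

```python
import numpy as np


def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """Toy residual block: y = relu(F(x) + x), with F(x) = relu(x @ w1) @ w2.

    Assumption: dense layers stand in for the convolutions used in ResNet50.
    """
    fx = relu(x @ w1) @ w2   # the residual branch F(x)
    return relu(fx + x)      # the skip connection adds the input back


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))   # a batch of 4 vectors with 8 features each
zero = np.zeros((8, 8))

# With all-zero weights F(x) is zero, yet the input still reaches the
# output through the skip connection, so the block reduces to relu(x).
out = residual_block(x, zero, zero)
```

Even when the residual branch contributes nothing, the signal passes through the shortcut, which illustrates why stacking many such blocks does not cut off the flow of information.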
Here is a detailed explanation of ResNet50:
the-annotated-resnet-50-a6c536034758
Now, in this article, we are going to find the 10 most similar images to a given image by using a pre-trained ResNet50 model.
1. Setting up the environment
For image processing we used cv2, and for image visualization we used matplotlib.pyplot (‘plt’). For array processing we imported numpy.
TensorFlow is an open-source machine learning library. We imported ResNet50 from it, along with the preprocess_input function, which converts an image into the input format the model expects. We also imported the other necessary functions from the TensorFlow library. To compare two feature vectors, we imported cosine_similarity.
Finally, we set up a ResNet50 model with pre-trained weights and prepared it for feature extraction.
- weights = “imagenet” means the model’s weights are initialized using pre-trained weights from the ImageNet dataset.
- include_top=False means the final fully connected layers (top layers) responsible for classification are not included in the model. This is because we want to extract feature vectors, not class predictions.
- input_shape=(224,224,3) means the model expects input with dimensions of 224x224 pixels in RGB format.
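The original code for this step is not shown here, so the following is only a sketch of how a model with these three settings could be created through the Keras applications API. The function name build_feature_extractor and the two constants are hypothetical. TensorFlow is imported inside the function as a convenience of this sketch, so that the snippet can be loaded even where TensorFlow is not installed. The first call downloads the ImageNet weights.

```python
# Shape of the feature maps ResNet50 returns for one 224x224 RGB image
# when the classification head is removed (include_top=False).
FEATURE_MAP_SHAPE = (7, 7, 2048)
FLATTENED_SIZE = 7 * 7 * 2048  # length of the vector after flattening


def build_feature_extractor():
    """Return ResNet50 without its classification head (hypothetical helper)."""
    # Imported lazily: an assumption of this sketch, not of the article.
    from tensorflow.keras.applications.resnet50 import ResNet50

    return ResNet50(
        weights="imagenet",         # pre-trained ImageNet weights
        include_top=False,          # drop the final fully connected layers
        input_shape=(224, 224, 3),  # 224x224 pixels, 3 color channels
    )
```

Calling build_feature_extractor() once and reusing the returned model avoids reloading the weights for every image.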
2. Defining a function for extracting the feature vector
After loading the image with the load_img function, we preprocessed it to match the input the model expects. Then, with model.predict(), we extracted the feature vector of the image. We flattened the 4D output to 1D because feature vectors are easier to compare in 1D form.
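The steps above can be sketched as follows. This is a reconstruction under assumptions rather than the original code: the function takes the model as an argument, the helper flatten_features is hypothetical, and the Keras utilities are imported inside the function so the snippet loads without TensorFlow.

```python
import numpy as np


def flatten_features(feature_maps):
    """Flatten the model output, e.g. shape (1, 7, 7, 2048), into a 1D vector."""
    return np.asarray(feature_maps).reshape(-1)


def extract_feature_vector(img_path, model, target_size=(224, 224)):
    """Load an image, preprocess it, and return its flattened feature vector."""
    # Imported lazily: an assumption of this sketch, not of the article.
    from tensorflow.keras.applications.resnet50 import preprocess_input
    from tensorflow.keras.preprocessing.image import img_to_array, load_img

    img = load_img(img_path, target_size=target_size)  # resize to 224x224
    batch = np.expand_dims(img_to_array(img), axis=0)  # add the batch axis
    batch = preprocess_input(batch)                    # ResNet50 input scaling
    return flatten_features(model.predict(batch))
```

The batch axis is needed because model.predict() expects a batch of images, even when that batch contains only one.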
3. Collecting the similarities of the images
We need a dataset in which to search for similar images. folder_path is the path to the folder containing the images we want to compare with a given image, and given_image_path is the path to the image for which we want to find similar images.
We extracted the feature vector of the given image with the extract_feature_vector function. After that we defined an empty dictionary to store the similarity scores between the given image and the other images in the folder.
In the for loop, we compared each image's feature vector with that of the given image using cosine_similarity and stored the score in the dictionary with the filename as the key. Cosine similarity measures the cosine of the angle between two vectors and ranges from -1 to 1; the closer the value is to 1, the more similar the two vectors are.
After computing all the similarity scores, we sorted the dictionary entries by similarity score in descending order and stored the 10 most similar images in a list.
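As an illustration of this loop, the sketch below computes cosine similarity directly with NumPy and fills a dictionary keyed by filename. Several things here are assumptions: cosine_sim is a plain stand-in for the imported cosine_similarity function, the feature extractor is passed in as a callable named extract, and the extension filter in IMAGE_EXTENSIONS is added only to skip non-image files.

```python
import os

import numpy as np

# Assumption: only files with these extensions are treated as images.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def cosine_sim(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return float(np.dot(a, b) / denom)


def collect_similarities(folder_path, query_vector, extract):
    """Map each image filename in folder_path to its similarity with the query.

    `extract` is any callable that turns an image path into a feature vector.
    """
    scores = {}
    for filename in sorted(os.listdir(folder_path)):
        if not filename.lower().endswith(IMAGE_EXTENSIONS):
            continue  # skip files that are not images
        vector = extract(os.path.join(folder_path, filename))
        scores[filename] = cosine_sim(query_vector, vector)
    return scores
```

Passing the extractor as an argument keeps the loop independent of the model, so it can be tried out with any function that returns a vector.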
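A short sketch of this sorting step, assuming the scores live in a dictionary that maps filenames to similarity values. The function name top_k_similar and the sample scores are made up for illustration.

```python
def top_k_similar(scores, k=10):
    """Return the k (filename, score) pairs with the highest scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Made-up scores, only to show the ordering.
example_scores = {"a.jpg": 0.91, "b.jpg": 1.0, "c.jpg": 0.42}
top_two = top_k_similar(example_scores, k=2)
# top_two == [("b.jpg", 1.0), ("a.jpg", 0.91)]
```

If the folder holds fewer than k images, the slice simply returns all of them.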
4. Displaying the results
Finally, we displayed the 10 most similar images with their similarity scores. For the example output I chose an image from the dataset, so the first image in the example output is the given image itself.
Here are example outputs:
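One possible way to draw such a grid with cv2 and matplotlib is sketched below. The layout of five images per row, the helper names grid_shape and show_results, and the (filename, score) input format are all assumptions. The plotting libraries are imported inside the function so the snippet loads without them.

```python
import math
import os


def grid_shape(n_images, n_cols=5):
    """Rows and columns needed to show n_images with n_cols per row."""
    return max(1, math.ceil(n_images / n_cols)), n_cols


def show_results(folder_path, top_matches, n_cols=5):
    """Plot each (filename, score) pair with its score as the title."""
    # Imported lazily: an assumption of this sketch, not of the article.
    import cv2
    import matplotlib.pyplot as plt

    n_rows, n_cols = grid_shape(len(top_matches), n_cols)
    fig, axes = plt.subplots(
        n_rows, n_cols, figsize=(3 * n_cols, 3 * n_rows), squeeze=False
    )
    flat_axes = axes.ravel()
    for ax in flat_axes:
        ax.axis("off")  # hide ticks, including on unused cells
    for ax, (filename, score) in zip(flat_axes, top_matches):
        image = cv2.imread(os.path.join(folder_path, filename))
        if image is None:
            continue  # unreadable file: leave the cell empty
        # cv2 loads images as BGR, while matplotlib expects RGB.
        ax.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
        ax.set_title(f"{score:.3f}")
    plt.tight_layout()
    plt.show()
```

The color conversion matters: without it, reds and blues appear swapped in the displayed images.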
Bahar Güneş