Joint Amplitude-Deformation Features for Image Recognition

Friday, August 16, 2019

Kshitij Bakliwal, ESSG Visiting Student 2019-20

Introduction

Pattern Recognition for Animal Biometrics is a valuable approach in movement ecology and conservation, where it is used to identify and count populations of endangered species. In this approach, photographs of individual animals are matched against a database to retrieve previous sightings of each animal, creating longitudinal records of movement. Image recognition methods are the basis for this matching, and the MIT Sloop program (http://sloop.mit.edu) has pioneered the development of this field.

Here, we demonstrate a Deep Learning approach that takes into account both amplitude errors and feature deformations. In animal datasets in particular, with images taken in forests and wildlife areas, it is extremely difficult to procure images that conform to one another in the regions of interest. As a result, it becomes essential to quantify spatial differences and dissimilarities along with amplitude errors.

Our CNN model, which uses amplitude errors along with weighted transport distances as inputs for each pair of images, performs quite well, with an approximately 90% recognition rate. After obtaining 68% accuracy in matching images using the Field Alignment algorithm alone, we designed the network to use both the weighted transport distances and the amplitude errors to improve the learning.

Applying a CNN learning model to the images themselves (raw image matrices) resulted in a poor accuracy of just 56%. The model used was a Siamese twin network with a Triplet loss function. Attempts to apply local feature-based methods failed completely, with few or no features detected using the standard MATLAB functions.

Dataset

Figure 1: The Gecko Dataset

The dataset contains a complicated mix of images, with varying brightness and phase differences. Of the different methods we tried for extracting important features, local feature-based extraction failed completely to obtain useful information from the images.

Methods

1. Siamese Neural Network with Triplet Loss Function

The first model was a Siamese (similarity-based) Neural Network with a Triplet Loss function. The objective of this model is to distinguish between images of the same individual and images of different individuals by separating them accordingly in an embedding space. The Triplet Loss trains the network to place a positive pair (two images of the same individual) closer together than a negative pair (images of different individuals) by at least a margin. We use transfer learning, employing AlexNet as the base network for our Siamese NN, and we re-train the final 3 layers. Using this model, we only get a Test accuracy of 56%. A minimal sketch of this setup appears below.
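The sketch below illustrates this setup in PyTorch. It is illustrative only: the embedding dimension, margin, and choice of re-trained layers are assumptions standing in for the exact configuration used here.

```python
# Minimal sketch of a Siamese/triplet setup with an AlexNet backbone.
# Embedding size, margin, and head layers are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

class EmbeddingNet(nn.Module):
    def __init__(self, embedding_dim=128):
        super().__init__()
        base = models.alexnet(pretrained=True)
        self.features = base.features              # frozen conv backbone
        for p in self.features.parameters():
            p.requires_grad = False
        # Re-trained head, standing in for the "final 3 layers" above
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 6 * 6, 1024), nn.ReLU(),
            nn.Linear(1024, embedding_dim),
        )

    def forward(self, x):
        return self.head(self.features(x))

net = EmbeddingNet()
# Triplet loss: anchor-positive distance must be smaller than
# anchor-negative distance by at least `margin`.
triplet = nn.TripletMarginLoss(margin=1.0)
anchor, positive, negative = (torch.randn(4, 3, 224, 224) for _ in range(3))
loss = triplet(net(anchor), net(positive), net(negative))
```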

Figure 2: Siamese Neural Network

Figure 3: Learning using Triplet Loss

2. Field Alignment Algorithm

To get a better measure of the pattern similarity between two images, we use the Field Alignment algorithm, which is part of the Field Alignment System and Testbed. The algorithm performs deformation-invariant image matching for a pair of images, returning the displacement vectors in both dimensions, and these vectors can be used as a measure of similarity between the two images. Here, we take the transport distances obtained from the Field Alignment algorithm and use the weighted divergence of these displacement vectors as a metric of the dissimilarity between two images. The weighted divergence is non-symmetric, i.e., the value for the transformation of image A to image B differs from the value for the transformation of image B to image A. We therefore take the minimum of these two values as the dissimilarity measure for any pair of images (A, B).
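The following sketch shows how such a dissimilarity might be computed once the displacement fields are available. The Field Alignment solver itself is not reimplemented here; `align` is a hypothetical placeholder for it, and the uniform weighting is an assumption for illustration.

```python
# Sketch of the dissimilarity metric built from Field Alignment output.
# `align(a, b)` is a hypothetical wrapper returning the per-pixel
# displacement fields (qx, qy) that warp image `a` onto image `b`.
import numpy as np

def weighted_divergence(qx, qy, weights=None):
    """Weighted divergence of a 2-D displacement field."""
    div = np.gradient(qx, axis=1) + np.gradient(qy, axis=0)
    if weights is None:
        weights = np.ones_like(div)   # uniform weights as a placeholder
    return float(np.sum(weights * np.abs(div)))

def dissimilarity(align, img_a, img_b):
    """Symmetric dissimilarity: minimum over the two alignment directions."""
    d_ab = weighted_divergence(*align(img_a, img_b))  # A warped onto B
    d_ba = weighted_divergence(*align(img_b, img_a))  # B warped onto A
    return min(d_ab, d_ba)
```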

Figure 4: Image Matching using Field Alignment

In the plot above, the yellow points represent a match. Since the images are ordered sequentially, different images of the same individual are grouped together consecutively. Ideally, then, we should obtain a diagonal of yellow points (excluding an image being its own best match). Using this metric and determining the best match for each image with a nearest-neighbor-style method, we find a 68% accuracy in matching an image to another image of the same individual (out of our dataset of 251 images).
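A minimal sketch of this evaluation, assuming a precomputed 251 x 251 pairwise dissimilarity matrix and an array of per-image individual labels (both names are illustrative):

```python
# Nearest-neighbor matching accuracy from a pairwise dissimilarity matrix.
# D[i, j] holds the dissimilarity between images i and j; labels[i] is the
# individual pictured in image i. Both inputs are assumed precomputed.
import numpy as np

def matching_accuracy(D, labels):
    D = D.astype(float).copy()
    np.fill_diagonal(D, np.inf)      # an image may not match itself
    best = np.argmin(D, axis=1)      # nearest neighbor for each image
    return np.mean(labels[best] == labels)
```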

3. Convolutional Neural Network with Field Alignment Algorithm

Having achieved a very good accuracy (68%) using the Field Alignment algorithm without any learning, the next step was to use these displacement vectors as inputs to a Deep Learning model. To use both amplitude-based and position-based deformations, we take the weighted divergence of the displacement vectors as one input feature and the absolute difference of the images as the other, amplitude-based feature. The Neural Network contains 3 Convolutional layers with ReLU non-linearity. We generate paired data for the Gecko dataset and use 60% for training, 20% for validation, and the remaining 20% for testing. For each image, we take a matching pair and a non-matching pair, giving a total of 502 data points, about 300 of which are used for training. The matching accuracy using this NN is approximately 90%; a sketch of such an architecture follows.
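The sketch below illustrates one way to realize this two-feature CNN in PyTorch. The channel layout (divergence map and absolute difference stacked as a 2-channel input), filter counts, and kernel sizes are assumptions for illustration, not the exact architecture used.

```python
# Illustrative two-feature CNN: 3 convolutional layers with ReLU, taking the
# weighted-divergence map and the absolute image difference as input channels.
import torch
import torch.nn as nn

class PairCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, 1)   # match / no-match logit

    def forward(self, div_map, abs_diff):
        # Stack the two per-pixel features as channels: (N, 2, H, W)
        x = torch.stack([div_map, abs_diff], dim=1)
        return self.fc(self.conv(x).flatten(1))

# Usage with a binary match / no-match objective on image pairs:
model = PairCNN()
div_map, abs_diff = torch.randn(4, 64, 64), torch.randn(4, 64, 64)
logits = model(div_map, abs_diff)
```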

Figure 5: CNN with Amplitude-Position Deformations (Training)

Results and Inference

CNN only - 56% Test Accuracy

Deformations only (Field Alignment, nearest-neighbor matching) - 68% Matching Accuracy

CNN with Deformations and Amplitude Errors - 90% Test Accuracy

Deformation Features such as weighted transport distances, when used in conjunction with amplitude errors, can significantly improve upon amplitude-only methods. A Deep Learning approach is still needed on top of these features, and it should be primed properly to obtain high accuracy.

Supervised by Dr. Sai Ravela