Detecting Diabetic Retinopathy to Stop Blindness using Pretrained Deep Learning Models
Posted on Thu 26 September 2019 in posts • 20 min read
Introduction
Millions of people suffer from diabetic retinopathy, the leading cause of blindness among working-aged adults. Aravind Eye Hospital in India hopes to detect and prevent this disease among people living in rural areas where medical screening is difficult to conduct.
Currently, Aravind technicians travel to these rural areas to capture images and then rely on highly trained doctors to review the images and provide a diagnosis. Their goal is to scale their efforts through technology: to gain the ability to automatically screen images for disease and provide information on how severe the condition may be.
In this project, we will build a deep learning model to classify thousands of eye images from Aravind Eye Hospital by the severity of diabetic retinopathy.
This project is inspired by APTOS 2019 Blindness Detection competition on Kaggle.
For reference, you can find the Jupyter notebook in my GitHub repo; any feedback is appreciated!
Setting up the Training Data
We will download the dataset from Kaggle; for more information about using the Kaggle API, check the Kaggle API repository.
After authenticating our request, we can download the data:
!kaggle competitions download -c aptos2019-blindness-detection
!ls
!unzip aptos2019-blindness-detection.zip
!unzip train_images.zip
Reading the Train Set
Let's start by reading the train set:
import tensorflow as tf
tf.__version__
import pandas as pd
import numpy as np
SEED = 42
np.random.seed(SEED)
train=pd.read_csv("/content/train.csv")
train.head()
len(train)
Based on the data description provided on Kaggle, we are provided with a large set of retina images taken using fundus photography under a variety of imaging conditions.
A clinician has rated each image for the severity of diabetic retinopathy on a scale of 0 to 4:
0 - No DR
1 - Mild
2 - Moderate
3 - Severe
4 - Proliferative DR
Like any real-world data set, we will encounter noise in both the images and labels. Images may contain artifacts, be out of focus, underexposed, or overexposed. The images were gathered from multiple clinics using a variety of cameras over an extended period of time, which will introduce further variation. In the upcoming steps we will try to preprocess the images to better highlight some important features.
Let's read the test data into a Pandas dataframe:
test=pd.read_csv("/content/test.csv")
Next, let's create a dictionary that maps each diagnosis code to its name, then split the image IDs into five groups based on the diagnosis:
dict={0:"No DR",
      1:"Mild",
      2:"Moderate",
      3:"Severe",
      4:"Proliferative DR"}
dict
diags={}
for k in dict.keys():
    diags[k]=train[train["diagnosis"]==k]
Let's now check the distribution of each class in the train set:
import matplotlib.pyplot as plt
%matplotlib inline
cases=train["diagnosis"].value_counts(normalize=True,ascending=False)*100
ax=cases.plot(kind="bar")
ax.set_xticklabels(cases.index,rotation=0)
ax.set_title("Diagnosis Classes Distribution")
for i, v in enumerate(cases.values):
    ax.text(i-0.2, v+0.5, str(round(v,3)))
for sp in ax.spines.keys():
    ax.spines[sp].set_visible(False)
ax.set_yticks([])
plt.show()
We notice that the classes are not uniformly distributed: around 50% of the training data belongs to class 0, and around 27.28% to class 2. These two classes alone account for more than 75% of the training data, which increases the risk that the trained model will be biased toward them.
One solution is to compute a weight for each class that balances out the skew in the data, which we will later feed to the model as a separate parameter.
weights=1./(train["diagnosis"].value_counts(normalize=True))
weights=weights/weights[0]
weights
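As a side note, Keras expects class_weight as a plain dict mapping class indices to weights, so it can be safer to convert the pandas Series before passing it to fit_generator. A self-contained toy example of the same inverse-frequency computation:

```python
import pandas as pd

# Toy diagnosis column: six 0s, two 1s, two 2s
diagnosis = pd.Series([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
freq = diagnosis.value_counts(normalize=True)
# Inverse frequency, scaled so the majority class gets weight 1
weights_demo = (1. / freq) / (1. / freq)[0]
# Keras expects class_weight as {class_index: weight}
class_weight_demo = {int(k): float(v) for k, v in weights_demo.items()}
```

Here the two minority classes each receive three times the weight of the majority class, mirroring what the real weights Series does for the five diagnosis grades.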
Exploratory Data Analysis
Let's examine some random images of each class:
import cv2
from google.colab.patches import cv2_imshow
fig=plt.figure(figsize=(15,25))
for i,k in enumerate(diags.keys()):
    for j,v in enumerate(np.random.choice(diags[k]["id_code"],size=5)):
        ax=fig.add_subplot(5,5,5*i+(j+1),xticks=[], yticks=[])
        # imread's second argument is a read flag, not a conversion code;
        # convert BGR to RGB explicitly before displaying with matplotlib
        image=cv2.imread(v+".png")
        image=cv2.cvtColor(image,cv2.COLOR_BGR2RGB)
        image=cv2.resize(image,(150,150))
        plt.imshow(image)
        ax.set_title("label:"+v+"\n diagnosis:"+str(k))
plt.show()
Preparing the Training Folders Architecture
First, let's start by creating the train, validation and test folders in the home directory:
!mkdir -p data/test
!mkdir -p data/validation
!mkdir -p data/train
!pwd
#set directories variables
import os, shutil
home_dir = '/content'
base_dir = os.path.join(home_dir, 'data')
base_dir
train_dir=os.path.join(base_dir,"train")
print("train directory: "+train_dir)
test_dir=os.path.join(base_dir,"test")
print("test directory: "+test_dir)
validation_dir=os.path.join(base_dir,"validation")
print("validation directory: "+validation_dir)
train_0_dir = os.path.join(base_dir, 'train', '0')
train_1_dir = os.path.join(base_dir, 'train', '1')
train_2_dir = os.path.join(base_dir, 'train', '2')
train_3_dir = os.path.join(base_dir, 'train', '3')
train_4_dir = os.path.join(base_dir, 'train', '4')
test_0_dir = os.path.join(base_dir, 'test', '0')
test_1_dir = os.path.join(base_dir, 'test', '1')
test_2_dir = os.path.join(base_dir, 'test', '2')
test_3_dir = os.path.join(base_dir, 'test', '3')
test_4_dir = os.path.join(base_dir, 'test', '4')
validation_0_dir = os.path.join(base_dir, 'validation', '0')
validation_1_dir = os.path.join(base_dir, 'validation', '1')
validation_2_dir = os.path.join(base_dir, 'validation', '2')
validation_3_dir = os.path.join(base_dir, 'validation', '3')
validation_4_dir = os.path.join(base_dir, 'validation', '4')
Next, we will append the extension ".png" to all file names:
for df in diags.values():
    df["id_code_png"]=df["id_code"]+".png"
Next, we will rename each image so that its file name starts with its class label, preparing the data to be grouped into per-class folders for the ImageDataGenerator class:
from glob import glob
from tqdm import tqdm
for i in dict.keys():
    n_digits = int(np.log10(len(diags[i]))) + 1
    base_name = str(i)+"{}.png"
    for k, fname in tqdm(enumerate(diags[i]["id_code_png"])):
        new_name = base_name.format(str(k).zfill(n_digits))
        os.rename(fname, new_name)
We will split the training data into a train set (85%) and a validation set (15%):
import numpy as np
from glob import glob
import os
from tqdm import tqdm
from sklearn.model_selection import train_test_split
filenames=[]
zeros=glob('0???.png')+glob('0????.png')
for k in dict.keys():
    filenames = glob(str(k)+'???.png')+filenames
filenames=zeros+filenames
train_names , val_test_names = train_test_split(filenames, test_size=0.15)
for valname in val_test_names:
    os.rename(valname, 'data/validation/' + valname)
#for testname in test_names:
#    os.rename(testname, 'data/test/' + testname)
for trainname in train_names:
    os.rename(trainname, 'data/train/' + trainname)
Now, we move the pictures into their corresponding folders:
!mkdir -p data/train/0/
!mv data/train/0*.png data/train/0/
!mkdir -p data/train/1/
!mv data/train/1*.png data/train/1/
!mkdir -p data/train/2/
!mv data/train/2*.png data/train/2/
!mkdir -p data/train/3/
!mv data/train/3*.png data/train/3/
!mkdir -p data/train/4/
!mv data/train/4*.png data/train/4/
!mkdir -p data/validation/0/
!mv data/validation/0*.png data/validation/0/
!mkdir -p data/validation/1/
!mv data/validation/1*.png data/validation/1/
!mkdir -p data/validation/2/
!mv data/validation/2*.png data/validation/2/
!mkdir -p data/validation/3/
!mv data/validation/3*.png data/validation/3/
!mkdir -p data/validation/4/
!mv data/validation/4*.png data/validation/4/
Preparing the Deep Learning Model
We will start by using a pretrained model whose layers are all frozen except the top layer. We will choose the VGG16 model as a starting point and eventually try other models to find the one with the highest accuracy.
Let's start by importing the model without its top layer:
from tensorflow.keras.applications import VGG16
conv_base=VGG16(weights="imagenet",include_top=False,input_shape=(150,150,3))
conv_base.summary()
Next, let's add a custom output layer:
from tensorflow.keras import layers
model = tf.keras.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(5, activation='softmax',name="output"))
model.summary()
To avoid retraining the convolutional base, we will freeze all of its layers, leaving only the custom top layers trainable:
print('This is the number of trainable weights '
'before freezing the conv base:', len(model.trainable_weights))
for layer in conv_base.layers:
    layer.trainable=False
conv_base.trainable = False
print('This is the number of trainable weights '
'after freezing the conv base:', len(model.trainable_weights))
Defining Callbacks
To prevent the model from overfitting the training data, we will define an EarlyStopping callback, whose role is to stop training when the validation accuracy stops improving, which is usually a sign of overfitting. We will also define a ReduceLROnPlateau callback, which decreases the learning rate when the validation accuracy reaches a plateau.
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
early_stopper = EarlyStopping(monitor='val_acc', min_delta=0.0001, patience=5, verbose=1, mode='auto')
reduce_lr = ReduceLROnPlateau(monitor='val_acc', min_delta=0.0004, patience=2, factor=0.1, min_lr=1e-6, mode='auto', verbose=1)
Training the Model
One way to reduce overfitting is to use ImageDataGenerator: in each epoch the model is exposed to the training data, but each image is randomly augmented based on the parameters we set. This strategy makes the model more robust to unseen images.
Once we have defined the ImageDataGenerator object, we will train the model using the fit_generator method:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import optimizers
train_datagen = ImageDataGenerator(
horizontal_flip=True,
vertical_flip=True,
rotation_range=40,
zoom_range=0.2,
shear_range=0.1,
fill_mode='nearest')
# Note that the validation data should not be augmented!
test_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(
    # This is the target directory
    train_dir,
    # All images will be resized to 150x150
    target_size=(150, 150),
    batch_size=20,
    # Since we use categorical_crossentropy loss, we need categorical labels
    class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(
validation_dir,
target_size=(150, 150),
batch_size=20,
class_mode='categorical')
model.compile(loss='categorical_crossentropy',
optimizer=optimizers.RMSprop(lr=2e-4),
metrics=['acc'])
history = model.fit_generator(
train_generator,
steps_per_epoch=100,
epochs=50,
validation_data=validation_generator,
validation_steps=50,class_weight=weights,callbacks=[early_stopper,reduce_lr])
Next, we will save the model and its weights for easy retrieval later on:
model.save("model_1.h5")
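The HDF5 file stores the architecture, weights, and optimizer state together, so the model can be rebuilt in a fresh session with load_model. A tiny stand-in model (not the VGG16 model above) demonstrates the round trip:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import load_model

# Small stand-in model; the same pattern applies to model_1.h5
demo = tf.keras.Sequential([layers.Dense(3, input_shape=(4,))])
demo.compile(optimizer="rmsprop", loss="mse")
demo.save("demo_model.h5")

# Rebuilds architecture and weights from the file
restored = load_model("demo_model.h5")
x = np.ones((1, 4), dtype="float32")
# The restored model reproduces the original predictions
assert np.allclose(demo.predict(x), restored.predict(x))
```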
Plotting Accuracy and Error
def plot_acc_loss(history, model_name):
    import matplotlib.pyplot as plt
    acc = history.history['acc']
    val_acc = history.history['val_acc']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    epochs = range(len(acc))
    plt.plot(epochs, acc, 'r', label='Training acc')
    plt.plot(epochs, val_acc, 'b', label='Validation acc')
    plt.title('Training and validation accuracy')
    plt.legend()
    plt.savefig(model_name + "acc.png")
    plt.figure()
    plt.plot(epochs, loss, 'r', label='Training loss')
    plt.plot(epochs, val_loss, 'b', label='Validation loss')
    plt.title('Training and validation loss')
    plt.legend()
    plt.savefig(model_name + "loss.png")
    plt.show()
plot_acc_loss(history,"model_1")
As we notice from the previous plots, the maximum accuracy obtained on the validation set was around 74%, but we can do better. Next, we will try to fine-tune the model to try to increase its accuracy.
Improving the Model Accuracy by Adding Dropout and Batch Normalization
One approach to improve the accuracy and reduce overfitting is by adding a Dropout layer and a Batch Normalization layer to the dense layers:
from tensorflow.keras import layers
model = tf.keras.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.BatchNormalization())
model.add(layers.Dense(5, activation='softmax'))
model.summary()
Next, we will freeze all the layers except the top layer:
print('This is the number of trainable weights '
'before freezing the conv base:', len(model.trainable_weights))
conv_base.trainable = False
for layer in conv_base.layers:
    layer.trainable=False
print('This is the number of trainable weights '
'after freezing the conv base:', len(model.trainable_weights))
Now we will train the model:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import optimizers
train_datagen = ImageDataGenerator(
horizontal_flip=True,
vertical_flip=True,
rotation_range=40,
zoom_range=0.2,
shear_range=0.1,
fill_mode='nearest')
# Note that the validation data should not be augmented!
test_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(
# This is the target directory
train_dir,
# All images will be resized to 150x150
target_size=(150, 150),
batch_size=20,
# Since we use categorical_crossentropy loss, we need categorical labels
class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(
validation_dir,
target_size=(150, 150),
batch_size=20,
class_mode='categorical')
model.compile(loss='categorical_crossentropy',
optimizer=optimizers.RMSprop(lr=2e-4),
metrics=['acc'])
history = model.fit_generator(
train_generator,
steps_per_epoch=100,
epochs=50,
validation_data=validation_generator,
validation_steps=50,class_weight=weights,callbacks=[early_stopper,reduce_lr])
model.save("model_2.h5")
plot_acc_loss(history,"model_2")
As we notice from the previous plots, the maximum accuracy obtained on the validation set increased to reach around 75.56%, but we can do better. Next, we will try another approach to further improve the model.
NB_VALID_STEPS = validation_generator.n // validation_generator.batch_size
NB_VALID_STEPS
Unfreezing the Last Layers
Another approach is to unfreeze the last convolutional block and train the model again. Let's repeat the process:
conv_base.summary()
We will unfreeze every layer from block5_conv1 onward:
conv_base.trainable = True
set_trainable = False
for layer in conv_base.layers:
    if layer.name == 'block5_conv1':
        set_trainable = True
    if set_trainable:
        layer.trainable = True
    else:
        layer.trainable = False
model.compile(loss='categorical_crossentropy',
optimizer=optimizers.RMSprop(lr=1e-5),
metrics=['acc'])
history = model.fit_generator(
train_generator,
steps_per_epoch=100,
epochs=100,
validation_data=validation_generator,
validation_steps=50,callbacks=[early_stopper,reduce_lr],class_weight=weights)
model.save("model_3.h5")
Let's visualize the variation of the accuracy and loss in the training and validation set:
plot_acc_loss(history,"model_3")
Let's evaluate the accuracy of the model:
(eval_loss, eval_accuracy) = model.evaluate_generator(
    generator=validation_generator, steps=NB_VALID_STEPS)
print("\nAccuracy: {:.2f}%".format(eval_accuracy * 100))
print("Loss: {}".format(eval_loss))
Evaluating the Model Accuracy
Let's find the confusion matrix, which will help us pinpoint the weaknesses in our model:
train["id_code_png"]=train["id_code"]+".png"
train.head()
Let's get the predicted values for the validation set's images:
pred_generator = test_datagen.flow_from_directory(
validation_dir,
target_size=(150, 150),
batch_size=1,
class_mode='categorical',
shuffle=False)
valid_preds=model.predict_generator(generator=pred_generator,steps=len(val_test_names))
valid_preds
Now, we will get the predicted class with the highest probability for each instance into a separate array:
preds=np.argmax(valid_preds,axis=1)
Next, let's recover the true labels. Since flow_from_directory with shuffle=False reads the files in sorted order, we sort the validation file names and read the class label from the first character of each name:
sorted_val_test_names=sorted(val_test_names)
labels=np.zeros(len(val_test_names))
for i in range(len(val_test_names)):
    labels[i]=int(sorted_val_test_names[i][0])
labels
vals,counts=np.unique(labels,return_counts=True)
print("Unique Values : " ,vals)
print("Occurrence Count : ", counts)
Now, we are ready to find the confusion matrix:
from sklearn.metrics import confusion_matrix
conf_mat=confusion_matrix(labels,preds)
conf_mat
Using the obtained matrix, let's compute the percentage of error for each class:
perc_mat=(conf_mat/counts[:,None])
perc_mat
Let's plot the percentages matrix:
np.fill_diagonal(perc_mat, 0)
plt.matshow(perc_mat, cmap=plt.cm.gray)
plt.savefig("confusion_matrix_errors_plot.png")
plt.show()
Let's calculate the f1 score for this model:
from sklearn.metrics import f1_score
f1=f1_score(labels,preds,average="weighted")
f1
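As an aside, the APTOS 2019 competition is scored with the quadratic weighted kappa rather than f1; it penalizes each prediction in proportion to the squared distance between the predicted and true grade. It is available in scikit-learn (toy grade arrays shown here for illustration):

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Toy grades: being off by one grade costs far less than being off by three
labels_demo = np.array([0, 1, 2, 3, 4, 2])
near_miss = np.array([0, 1, 2, 4, 4, 2])  # one prediction off by 1
far_miss = np.array([0, 1, 2, 0, 4, 2])   # one prediction off by 3

qwk_near = cohen_kappa_score(labels_demo, near_miss, weights="quadratic")
qwk_far = cohen_kappa_score(labels_demo, far_miss, weights="quadratic")
```

In practice we would call it on the real labels and preds arrays computed above; the toy example just shows the distance-sensitive penalty.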
Trying ResNet model
To try to improve the accuracy, let's try another architecture, ResNet50, and repeat the process we used for the previous model:
from tensorflow.keras.applications import ResNet50
res_base=ResNet50(include_top = False, pooling = "avg", weights = "imagenet",input_shape=(224,224,3))
res_model = tf.keras.Sequential()
res_model.add(res_base)
res_model.add(layers.Dropout(0.3))
res_model.add(layers.Dense(1024, activation='relu'))
res_model.add(layers.Dropout(0.2))
res_model.add(layers.Dense(512, activation='relu'))
res_model.add(layers.Dropout(0.2))
res_model.add(layers.BatchNormalization())
res_model.add(layers.Dense(5, activation='softmax'))
res_model.summary()
print('This is the number of trainable weights '
'before freezing the res base:', len(res_model.trainable_weights))
for layer in res_base.layers:
    layer.trainable=False
res_base.trainable = False
print('This is the number of trainable weights '
'after freezing the res base:', len(res_model.trainable_weights))
train_datagen = ImageDataGenerator(
horizontal_flip=True,
vertical_flip=True,
rotation_range=360,
zoom_range=0.2,
shear_range=0.1,
fill_mode='nearest')
# Note that the validation data should not be augmented!
test_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(
    # This is the target directory
    train_dir,
    # All images will be resized to 224x224
    target_size=(224, 224),
    batch_size=12,
    # Since we use categorical_crossentropy loss, we need categorical labels
    class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(
validation_dir,
target_size=(224, 224),
batch_size=12,
class_mode='categorical')
res_model.compile(loss='categorical_crossentropy',
optimizer=optimizers.SGD(lr=1e-3),
metrics=['acc'])
res_history = res_model.fit_generator(
train_generator,
steps_per_epoch=100,
epochs=50,
validation_data=validation_generator,
validation_steps=50,class_weight=weights,callbacks=[early_stopper,reduce_lr])
plot_acc_loss(res_history,"res_model")
We notice that the accuracy did not improve with this model; in fact, it decreased. It seems that ResNet50, at least with this setup, is not well suited to these images.
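One possible culprit, offered here as an assumption rather than a verified finding, is that ResNet50's ImageNet weights were trained on inputs normalized by its own preprocess_input function, which the generators above never apply. Wiring it into the generator would look like this:

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Every batch the generator yields gets the normalization the pretrained
# ResNet50 weights expect (BGR channel order, ImageNet mean subtraction)
train_datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    horizontal_flip=True,
    vertical_flip=True,
    rotation_range=360,
    zoom_range=0.2)

# Demonstration on a dummy batch: zeros become negative after centering
batch = np.zeros((1, 224, 224, 3), dtype="float32")
normalized = preprocess_input(batch.copy())
```

Whether this fully explains the accuracy drop would need to be tested by retraining.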
Another Approach: Model Training Based on Extracted Features
The last approach we will tackle is to use a pretrained model, with its top layer removed, to extract features from the images; we will then feed these features into a model that predicts the severity of the disease.
We will use the MobileNetV2 model, which offers a strong accuracy-to-size trade-off:
from tensorflow.keras.applications import MobileNetV2
inc_base=MobileNetV2(include_top=False, weights='imagenet', input_shape=(224,224,3))
Let's extract the features from the training and validation sets:
import os
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(rescale=1./255)
batch_size = 20
def extract_features(directory, sample_count):
    features = np.zeros(shape=(sample_count, 7, 7, 1280))
    labels = np.zeros(shape=(sample_count))
    generator = datagen.flow_from_directory(
        directory,
        target_size=(224, 224),
        batch_size=batch_size,
        class_mode='categorical')
    i = 0
    for inputs_batch, labels_batch in generator:
        features_batch = inc_base.predict(inputs_batch)
        features[i * batch_size : (i + 1) * batch_size] = features_batch
        labels[i * batch_size : (i + 1) * batch_size] = np.argmax(labels_batch, axis=1)
        i += 1
        if i * batch_size >= sample_count:
            # Note that since generators yield data indefinitely in a loop,
            # we must `break` after every image has been seen once.
            break
    return features, labels
train_samples = 3112
validation_samples = 550
train_features, train_labels = extract_features(train_dir, train_samples)
validation_features, validation_labels = extract_features(validation_dir, validation_samples)
train_features = np.reshape(train_features, (train_samples, 7 * 7 * 1280))
validation_features = np.reshape(validation_features, (validation_samples, 7 * 7 * 1280))
from tensorflow.keras.utils import to_categorical
train_labels_cat = to_categorical(train_labels)
validation_labels_cat=to_categorical(validation_labels)
Training a Neural Network Using Extracted Features
Now that the extracted features from the train and validation sets are ready, we will first train a neural network on the training-set features:
inc_model = tf.keras.Sequential()
inc_model.add(layers.Dense(256, activation='relu', input_dim=7 * 7 * 1280))
inc_model.add(layers.Dropout(0.5))
inc_model.add(layers.BatchNormalization())
inc_model.add(layers.Dense(5, activation='softmax'))
inc_model.compile(optimizer=optimizers.RMSprop(lr=2e-5),
loss='categorical_crossentropy',
metrics=['acc'])
history = inc_model.fit(train_features, train_labels_cat,
epochs=30,
batch_size=20,
validation_data=(validation_features, validation_labels_cat))
Let's plot the accuracy and error for the training and validation sets:
plot_acc_loss(history,"inc_model")
(eval_loss, eval_accuracy) = inc_model.evaluate(
    validation_features, validation_labels_cat)
print("\nAccuracy: {:.2f}%".format(eval_accuracy * 100))
print("Loss: {}".format(eval_loss))
Now we notice that the accuracy of the model has increased to around 81%!
Finally, let's generate a classification report to evaluate the performance of the model. We first import it and compute the predicted labels on the validation features:
from sklearn.metrics import classification_report
pred_valid_labels = np.argmax(inc_model.predict(validation_features), axis=1)
print(classification_report(validation_labels, pred_valid_labels))
We will stop at this stage, but there is a lot of room for improvement. Further steps to improve the model might be:
- trying other algorithms and tuning their hyperparameters
- preprocessing the images to better expose the most important features
- trying other pretrained models
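As a concrete sketch of the first point, the extracted features can also feed a classical model such as scikit-learn's logistic regression. Toy arrays stand in for train_features and train_labels here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-ins for the flattened MobileNetV2 features and their labels;
# in practice use train_features / train_labels computed earlier
rng = np.random.RandomState(42)
X = rng.rand(100, 20)            # in practice: train_features
y = rng.randint(0, 5, size=100)  # in practice: train_labels

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)
probs = clf.predict_proba(X)     # per-class probabilities
```

Its hyperparameters (regularization strength C, solver) could then be tuned with cross-validation, all without retraining the convolutional base.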
You reached the end of the project! Thank you for following till the end, and stay tuned for more projects!