Author: Leo Simmons
Compilation: ronghuaiyang
Introduction
This is an application closely related to face attribute prediction.
This article describes a neural network that predicts a person's BMI (Body Mass Index) from face images. The project borrows its method from another project, https://github.com/yu4u/age-gender-estimation, which classifies a person's age and gender from face images. The project includes the weights of a trained model and a script that dynamically detects the user's face through a webcam. Besides being an interesting machine learning problem, predicting BMI this way could be a useful medical diagnostic tool.
Training Data
The training data consists of 4000 images, each of a different individual, all taken from the front of the subject. The BMI label for each training sample was calculated from the subject's height and weight (BMI is weight in kilograms divided by the square of height in meters). The training images cannot be shared here because they are used in another private project, but this kind of data can be collected from various places online.
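As a minimal illustration of that label computation (the example values here are hypothetical, not from the dataset):

def bmi(weight_kg, height_m):
    # BMI = weight (kg) / height (m)^2
    return weight_kg / height_m ** 2

print(bmi(70, 1.75))  # a 70 kg, 1.75 m subject -> ~22.9 kg/m^2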
Image Preprocessing
To normalize the images before training, each image was cropped to the subject's face, excluding most of the area around it. The Python library dlib was used to detect the face in each image, and an additional margin was added around the bounding box dlib detected to produce the images actually used for training. Several margins were tried to see which made the network perform best. A 20% margin, i.e. expanding the image height and width by 40% (20% on each side), was chosen because it produced the best validation performance.
Below, an image of Bill Murray is shown cropped with different margins, along with a table of the lowest mean absolute error (MAE) the model achieved on the validation set for each margin.
[Image: the original photo, and crops of Bill Murray's face with different margins]
[Table: crop margin vs. lowest validation MAE achieved]
Although the MAE values for margins in the 20%-50% range may be too close to say that any one of them is better than the others, it is clear that adding a margin of at least 20% produces a better MAE than adding none. This may be because the added margin captures features such as the upper forehead, ears, and neck that are useful for predicting BMI but are mostly cut out by the raw dlib bounding box.
Image preprocessing code:
import os
import cv2
import dlib
import numpy as np
import config

detector = dlib.get_frontal_face_detector()

def crop_faces():
    bad_crop_count = 0
    if not os.path.exists(config.CROPPED_IMGS_DIR):
        os.makedirs(config.CROPPED_IMGS_DIR)
    print('Cropping faces and saving to %s' % config.CROPPED_IMGS_DIR)
    good_cropped_images = []
    good_cropped_img_file_names = []
    detected_cropped_images = []
    original_images_detected = []
    for file_name in sorted(os.listdir(config.ORIGINAL_IMGS_DIR)):
        np_img = cv2.imread(os.path.join(config.ORIGINAL_IMGS_DIR, file_name))
        detected = detector(np_img, 1)
        img_h, img_w, _ = np.shape(np_img)
        original_images_detected.append(np_img)

        # skip images where no face, or more than one face, was detected
        if len(detected) != 1:
            bad_crop_count += 1
            continue

        d = detected[0]
        x1, y1, x2, y2, w, h = d.left(), d.top(), d.right() + 1, d.bottom() + 1, d.width(), d.height()
        # expand the dlib bounding box by the configured margin on each side
        xw1 = int(x1 - config.MARGIN * w)
        yw1 = int(y1 - config.MARGIN * h)
        xw2 = int(x2 + config.MARGIN * w)
        yw2 = int(y2 + config.MARGIN * h)
        cropped_img = crop_image_to_dimensions(np_img, xw1, yw1, xw2, yw2)
        norm_file_path = '%s/%s' % (config.CROPPED_IMGS_DIR, file_name)
        cv2.imwrite(norm_file_path, cropped_img)
        good_cropped_img_file_names.append(file_name)

    # save info of good cropped images
    with open(config.ORIGINAL_IMGS_INFO_FILE, 'r') as f:
        lines = f.read().splitlines()
    column_headers = lines[0]
    all_imgs_info = lines[1:]
    cropped_imgs_info = [l for l in all_imgs_info if l.split(',')[-1] in good_cropped_img_file_names]

    with open(config.CROPPED_IMGS_INFO_FILE, 'w') as f:
        f.write('%s\n' % column_headers)
        for l in cropped_imgs_info:
            f.write('%s\n' % l)

    print('Cropped %d images and saved in %s - info in %s' % (len(original_images_detected), config.CROPPED_IMGS_DIR, config.CROPPED_IMGS_INFO_FILE))
    print('Error detecting face in %d images - info in Data/unnormalized.txt' % bad_crop_count)
    return good_cropped_images

# image cropping function adapted from:
# https://stackoverflow.com/questions/15589517/how-to-crop-an-image-in-opencv-using-python
def crop_image_to_dimensions(img, x1, y1, x2, y2):
    if x1 < 0 or y1 < 0 or x2 > img.shape[1] or y2 > img.shape[0]:
        img, x1, x2, y1, y2 = pad_img_to_fit_bbox(img, x1, x2, y1, y2)
    return img[y1:y2, x1:x2, :]

def pad_img_to_fit_bbox(img, x1, x2, y1, y2):
    # replicate edge pixels so the crop box can extend past the image borders
    img = cv2.copyMakeBorder(img,
                             -min(0, y1), max(y2 - img.shape[0], 0),
                             -min(0, x1), max(x2 - img.shape[1], 0),
                             cv2.BORDER_REPLICATE)
    y2 += -min(0, y1)
    y1 += -min(0, y1)
    x2 += -min(0, x1)
    x1 += -min(0, x1)
    return img, x1, x2, y1, y2

if __name__ == '__main__':
    crop_faces()
Image Augmentation
To increase the number of times each original training image could be used, the images were augmented on the fly in every training epoch. The image augmentation library Augmentor was used to dynamically rotate and flip the images, distort the resolution of different parts of each image, and vary the images' contrast and brightness.
[Image: a sample without augmentation]
[Image: the same sample with random augmentations]
Image augmentation code:
from keras.preprocessing.image import ImageDataGenerator
import pandas as pd
import Augmentor
from PIL import Image
import random
import numpy as np
import matplotlib.pyplot as plt
import math
import config

def plot_imgs_from_generator(generator, number_imgs_to_show=9):
    print('Plotting images...')
    n_rows_cols = int(math.ceil(math.sqrt(number_imgs_to_show)))
    plot_index = 1
    x_batch, _ = next(generator)
    while plot_index <= number_imgs_to_show:
        plt.subplot(n_rows_cols, n_rows_cols, plot_index)
        plt.imshow(x_batch[plot_index - 1])
        plot_index += 1
    plt.show()

def augment_image(np_img):
    p = Augmentor.Pipeline()
    p.rotate(probability=1, max_left_rotation=5, max_right_rotation=5)
    p.flip_left_right(probability=0.5)
    p.random_distortion(probability=0.25, grid_width=2, grid_height=2, magnitude=8)
    p.random_color(probability=1, min_factor=0.8, max_factor=1.2)
    p.random_contrast(probability=.5, min_factor=0.8, max_factor=1.2)
    p.random_brightness(probability=1, min_factor=0.5, max_factor=1.5)

    # apply each operation with its configured probability
    image = [Image.fromarray(np_img.astype('uint8'))]
    for operation in p.operations:
        r = round(random.uniform(0, 1), 1)
        if r <= operation.probability:
            image = operation.perform_operation(image)
    image = [np.array(i).astype('float64') for i in image]
    return image[0]

image_processor = ImageDataGenerator(
    rescale=1./255,
    preprocessing_function=augment_image)

# subtract validation size from training data
with open(config.CROPPED_IMGS_INFO_FILE) as f:
    for i, _ in enumerate(f):
        pass
training_n = i - config.VALIDATION_SIZE

train_df = pd.read_csv(config.CROPPED_IMGS_INFO_FILE, nrows=training_n)

train_generator = image_processor.flow_from_dataframe(
    dataframe=train_df,
    directory=config.CROPPED_IMGS_DIR,
    x_col='name',
    y_col='bmi',
    class_mode='other',
    color_mode='rgb',
    target_size=(config.RESNET50_DEFAULT_IMG_WIDTH, config.RESNET50_DEFAULT_IMG_WIDTH),
    batch_size=config.TRAIN_BATCH_SIZE)
Model Structure
The model was created with the Keras ResNet50 class. The ResNet50 architecture was chosen both because the weights of an age classifier trained by the age and gender project were available for transfer learning, and because the ResNet (residual network) architecture has proven to be a good model for face image recognition.
Other network architectures have also achieved impressive results on face-based image classification tasks, and future work could explore some of them for BMI prediction.
Model architecture code:
from tensorflow.python.keras.models import Model
from tensorflow.python.keras.applications import ResNet50
from tensorflow.python.keras.layers import Dense
import config

def get_age_model():
    # adapted from https://github.com/yu4u/age-gender-estimation/blob/master/age_estimation/model.py
    age_model = ResNet50(
        include_top=False,
        weights='imagenet',
        input_shape=(config.RESNET50_DEFAULT_IMG_WIDTH, config.RESNET50_DEFAULT_IMG_WIDTH, 3),
        pooling='avg')
    prediction = Dense(units=101,
                       kernel_initializer='he_normal',
                       use_bias=False,
                       activation='softmax',
                       name='pred_age')(age_model.output)
    age_model = Model(inputs=age_model.input, outputs=prediction)
    age_model.load_weights(config.AGE_TRAINED_WEIGHTS_FILE)
    print('Loaded weights from age classifier')
    return age_model

def get_model():
    base_model = get_age_model()
    # drop the 101-way age softmax and regress a single BMI value instead
    last_hidden_layer = base_model.get_layer(index=-2)
    base_model = Model(inputs=base_model.input, outputs=last_hidden_layer.output)
    prediction = Dense(1, kernel_initializer='normal')(base_model.output)
    model = Model(inputs=base_model.input, outputs=prediction)
    return model
Transfer Learning

The model was first trained with every layer of the original age classifier frozen, so that only the randomly initialized weights of the new output layer could be updated. This first stage ran for 10 epochs, since the validation MAE showed no significant drop after that point (determined with early stopping).
After this initial phase, the model was trained for 30 more epochs with every layer unfrozen, fine-tuning all the weights in the network. Early stopping also determined the number of epochs here: training stopped only after 10 consecutive epochs with no decrease in validation MAE (a patience of 10). Since the model achieved its lowest validation MAE at epoch 20, training stopped at epoch 30, and the weights from epoch 20 are the ones used in the demo below.
Mean absolute error (MAE) was chosen as the loss function. Unlike mean squared error (MSE) or root mean squared error (RMSE), it penalizes BMI prediction errors on a linear scale (the penalty for an error of 10 should be exactly twice the penalty for an error of 5).
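A quick numerical illustration of that difference (not part of the project's code):

import numpy as np

errors = np.array([5.0, 10.0])  # two BMI prediction errors
print(np.abs(errors))           # MAE-style penalties: [ 5. 10.] - linear in the error
print(errors ** 2)              # MSE-style penalties: [ 25. 100.] - quadratic in the error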
Model training code:
import cv2
import numpy as np
from tensorflow.python.keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard
from train_generator import train_generator, plot_imgs_from_generator
from mae_callback import MAECallback
import config

batches_per_epoch = train_generator.n // train_generator.batch_size

def train_top_layer(model):
    print('Training top layer...')
    # freeze everything except the new output layer
    for l in model.layers[:-1]:
        l.trainable = False
    model.compile(loss='mean_absolute_error', optimizer='adam')
    mae_callback = MAECallback()
    early_stopping_callback = EarlyStopping(
        monitor='val_mae',
        mode='min',
        verbose=1,
        patience=1)
    model_checkpoint_callback = ModelCheckpoint(
        'saved_models/top_layer_trained_weights.{epoch:02d}-{val_mae:.2f}.h5',
        monitor='val_mae',
        mode='min',
        verbose=1,
        save_best_only=True)
    tensorboard_callback = TensorBoard(
        log_dir=config.TOP_LAYER_LOG_DIR,
        batch_size=train_generator.batch_size)
    model.fit_generator(
        generator=train_generator,
        steps_per_epoch=batches_per_epoch,
        epochs=20,
        callbacks=[mae_callback,
                   early_stopping_callback,
                   model_checkpoint_callback,
                   tensorboard_callback])

def train_all_layers(model):
    print('Training all layers...')
    # unfreeze every layer to fine-tune the whole network
    for l in model.layers:
        l.trainable = True
    mae_callback = MAECallback()
    early_stopping_callback = EarlyStopping(
        monitor='val_mae',
        mode='min',
        verbose=1,
        patience=10)
    model_checkpoint_callback = ModelCheckpoint(
        'saved_models/all_layers_trained_weights.{epoch:02d}-{val_mae:.2f}.h5',
        monitor='val_mae',
        mode='min',
        verbose=1,
        save_best_only=True)
    tensorboard_callback = TensorBoard(
        log_dir=config.ALL_LAYERS_LOG_DIR,
        batch_size=train_generator.batch_size)
    model.compile(loss='mean_absolute_error', optimizer='adam')
    model.fit_generator(
        generator=train_generator,
        steps_per_epoch=batches_per_epoch,
        epochs=100,
        callbacks=[mae_callback,
                   early_stopping_callback,
                   model_checkpoint_callback,
                   tensorboard_callback])
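Putting the two stages together, a training entry point might look like this (a sketch only; it assumes the architecture code above is saved in a module named model, a name of my own choosing):

from model import get_model  # hypothetical module containing get_model() from above

if __name__ == '__main__':
    model = get_model()
    train_top_layer(model)   # stage 1: only the new output layer is trainable
    train_all_layers(model)  # stage 2: unfreeze and fine-tune every layer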
Demo
Below are the model's BMI predictions on photos of Christian Bale. Bale was chosen as the subject because he is known for dramatically changing his weight for different roles. Since his height is known to be 6'0", his weight can be inferred from the model's BMI prediction.
The picture on the left is from The Machinist, for which Bale said he weighed "about 135 pounds". At 135 pounds his BMI would be 18.3 kg/m² (the unit of BMI), so the model's prediction is off by about 4 kg/m². The picture in the middle shows what I believe is his normal weight, at a time when he had not drastically changed it for a role. The picture on the right was taken while he was shooting Vice. I couldn't find a figure for his weight during Vice, but several sources say he gained 45 pounds for the role. If we assume his normal weight is about 200 pounds, his weight while shooting Vice would have been around 245 pounds, a BMI of 33.2 kg/m², which puts the model's prediction for this photo off by about 1 kg/m².
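Those figures are easy to check (illustrative arithmetic only; the heights and weights are the ones quoted above):

LB_TO_KG = 0.45359
IN_TO_M = 0.0254

def bmi_imperial(weight_lb, height_in):
    # convert imperial units, then apply BMI = kg / m^2
    return (weight_lb * LB_TO_KG) / (height_in * IN_TO_M) ** 2

print(round(bmi_imperial(135, 72), 1))  # The Machinist at 6'0": ~18.3 kg/m^2
print(round(bmi_imperial(245, 72), 1))  # Vice at 6'0": ~33.2 kg/m^2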
Below is a recording of the model predicting my BMI. My actual BMI is 23 kg/m²; when I look straight at the camera the model is off by 2-4 kg/m², but when my head is tilted to one side or facing down the error grows to as much as 8 kg/m².
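The webcam demo works roughly along these lines (a minimal sketch, not the project's actual script; the module name, checkpoint file name, and 224-pixel input size are my assumptions, and the margin logic is reused from the preprocessing step):

import cv2
import dlib
import numpy as np
from model import get_model  # hypothetical module with get_model() from above

model = get_model()
model.load_weights('saved_models/all_layers_trained_weights.20-4.48.h5')  # assumed checkpoint
detector = dlib.get_frontal_face_detector()
MARGIN, IMG_SIZE = 0.2, 224

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    for d in detector(frame, 1):
        # expand the detected box by the same 20% margin used in training
        w, h = d.width(), d.height()
        x1, y1 = max(0, int(d.left() - MARGIN * w)), max(0, int(d.top() - MARGIN * h))
        x2, y2 = int(d.right() + MARGIN * w), int(d.bottom() + MARGIN * h)
        face = cv2.resize(frame[y1:y2, x1:x2], (IMG_SIZE, IMG_SIZE)) / 255.
        bmi = model.predict(np.expand_dims(face, axis=0))[0][0]
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, 'BMI: %.1f' % bmi, (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow('BMI demo', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()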
Discussion
The validation MAE of the model is 4.48. For a person 5'9" tall and weighing 195 pounds, the average height and weight of a US male, with a BMI of 27.35 kg/m², an error of 4.48 gives a prediction range of 22.87 kg/m² to 31.83 kg/m², corresponding to weights of 163 and 227 pounds. There is obviously room for improvement, and future work will aim to reduce this error. One clear weakness of the model is that it performs poorly on images taken from angles other than directly in front of the subject: when I turn my head to the side or face downward, the model's predictions become noticeably less accurate. Another possible weakness, which may help explain the inaccurate prediction on the first Christian Bale photo, is that the model performs poorly when the subject is lit by a concentrated light source in a dark environment. The shadows cast by strong directional light change the apparent curvature of the sides of the face and the subtle appearance of the skin, which affects the BMI prediction. It is also possible that the model simply overestimates the BMI of subjects with lower overall BMI, as suggested by its assessments of the first photos of both myself and Christian Bale.
These shortcomings may be explained by the scarcity of unusual angles, concentrated lighting, and lower BMIs in the training data. Most training images were taken from the front of the subject in good lighting, and most subjects had BMIs above 25 kg/m². The model may therefore not have fully learned the correlation between facial features and BMI in these underrepresented scenarios.
Original English article: https://medium.com/@leosimmons/estimating-body-mass-index-from-face-images-using-keras-and-transfer-learning-de25e1bc0212