Overview of Image Segmentation Methods and Applications ("AI Core Algorithm" series)

2020/12/29 20:50:05


Author: missinglink.ai | Translated by: ronghuaiyang | Source: AI park

Please contact the author before reprinting.

Abstract: This article surveys image segmentation methods, covering both traditional techniques and deep learning approaches, along with their application scenarios.

Modern computer vision technology based on artificial intelligence and deep learning methods has made significant progress in the past 10 years. Today, it is used in applications such as image classification, face recognition, object recognition in images, video analysis and classification, and image processing for robots and autonomous vehicles.

Many computer vision tasks require intelligent segmentation of images to understand the content of the image and make the analysis of each part easier. Today's image segmentation technology uses computer vision deep learning models to understand the real objects represented by each pixel of the image, which was unimaginable ten years ago.

Deep learning can learn patterns in visual input in order to predict the object classes that make up an image. The main deep learning architecture used for image processing is the Convolutional Neural Network (CNN), including specific CNN architectures such as AlexNet, VGG, Inception, and ResNet. Computer vision deep learning models are usually trained and run on dedicated graphics processing units (GPUs) to reduce computation time.

What is image segmentation?

Image segmentation is a key process in computer vision. It involves dividing the visual input into segments to simplify image analysis. A segment represents an object or a part of an object, and is composed of a set of pixels or "superpixels". Image segmentation organizes pixels into larger parts, eliminating the need to treat individual pixels as the unit of observation. There are three levels of image analysis:

Classification – assigns the entire image to categories such as "people", "animals", or "outdoors".

Object detection – detects objects in the image and draws a rectangle around each one, for example a person or a sheep.

Segmentation – identifies the parts of the image and understands which objects they belong to. Segmentation is the basis of object detection and classification.


Semantic segmentation vs. instance segmentation

In the segmentation process itself, there are two levels of granularity:

Semantic segmentation – assigns all pixels in the image to meaningful object classes. These classes are "semantically interpretable" and correspond to real-world categories. For example, you could isolate all the pixels belonging to a cat and color them green. This is also called dense prediction because it predicts the meaning of each pixel.


Instance segmentation – identifies each instance of each object in the image. It differs from semantic segmentation in that it distinguishes individual objects rather than only per-pixel classes: if there are three cars in an image, semantic segmentation labels them all as one "car" region, while instance segmentation identifies each car separately.

Traditional image segmentation methods

There are also a number of image segmentation techniques that were commonly used in the past, but they are less effective than deep learning techniques because they rely on rigid algorithms and require manual intervention and expert knowledge. These include:

Thresholding – segments the image into foreground and background. A specified threshold value splits pixels into one of two levels to isolate objects. Thresholding converts a grayscale image into a binary image, or separates the lighter and darker pixels of a color image.
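As a minimal sketch of the idea, the threshold value (128 here, an arbitrary choice for illustration) simply splits every pixel into one of two levels:

```python
import numpy as np

def threshold_segment(gray, thresh=128):
    """Split a grayscale image into foreground (1) and background (0)."""
    return (gray > thresh).astype(np.uint8)

img = np.array([[10, 200],
                [250, 30]], dtype=np.uint8)
mask = threshold_segment(img)
# mask is [[0, 1], [1, 0]]: bright pixels become foreground
```

In practice the threshold is either chosen manually or computed from the image, for example with a histogram-based method such as Otsu's.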

K-means clustering – the algorithm identifies groups in the data, with the variable K denoting the number of groups. The algorithm assigns each data point (or pixel) to one of the groups based on feature similarity. Rather than analyzing predefined groups, clustering works iteratively to form the groups organically.
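A minimal numpy-only sketch of the iterative assign-and-update loop (the function name and the tiny four-pixel input are made up for illustration; real pipelines typically use a library implementation such as scikit-learn's KMeans on the flattened image):

```python
import numpy as np

def kmeans_pixels(pixels, k, iters=10, seed=0):
    """Cluster pixel feature vectors into k groups; returns labels, centroids."""
    rng = np.random.default_rng(seed)
    # initialize centroids from k randomly chosen pixels
    centroids = pixels[rng.choice(len(pixels), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each pixel to its nearest centroid
        dists = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned pixels
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = pixels[labels == j].mean(axis=0)
    return labels, centroids

# two dark and two bright RGB pixels fall into two clusters
pixels = np.array([[0, 0, 0], [10, 10, 10],
                   [250, 250, 250], [240, 240, 240]], dtype=float)
labels, centroids = kmeans_pixels(pixels, k=2)
```

To segment an image, the (H, W, 3) array is reshaped to (H*W, 3) before clustering and the labels are reshaped back afterwards.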

Histogram-based image segmentation – uses a histogram to group pixels by gray level. A simple image consists of an object and a background. The background usually occupies one gray level and is the larger entity, so the larger peak in the histogram represents the background gray level, while a smaller peak represents the object at another gray level.
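One classic way to pick the threshold between the two histogram peaks is Otsu's method, sketched here in plain numpy (a hedged illustration; library versions exist, e.g. in OpenCV and scikit-image):

```python
import numpy as np

def otsu_threshold(gray):
    """Choose the threshold separating the two histogram peaks
    (background vs. object) by maximizing between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = hist[:t].sum(), hist[t:].sum()   # class weights
        if w0 == 0 or w1 == 0:
            continue
        m0 = (np.arange(t) * hist[:t]).sum() / w0        # background mean
        m1 = (np.arange(t, 256) * hist[t:]).sum() / w1   # object mean
        var = w0 * w1 * (m0 - m1) ** 2                   # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# a synthetic image with a dark background and a bright object
gray = np.array([20] * 50 + [200] * 50, dtype=np.uint8)
t = otsu_threshold(gray)   # lands between the two peaks
```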

Edge detection – identifies sharp changes or discontinuities in brightness, for example the border between a patch of red and a patch of blue. Edge detection usually involves linking the discontinuous points into curved line segments, or edges.
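A common way to find such brightness discontinuities is the Sobel operator; this naive numpy sketch (a deliberately slow loop for clarity, not a production implementation) computes the gradient magnitude, which is large at edges:

```python
import numpy as np

def sobel_edges(gray):
    """Approximate brightness gradients with Sobel kernels and return
    the gradient magnitude; large values mark edges."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # horizontal gradient
    ky = kx.T                                                   # vertical gradient
    g = gray.astype(float)
    h, w = g.shape
    mag = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = g[i - 1:i + 2, j - 1:j + 2]
            gx = (patch * kx).sum()
            gy = (patch * ky).sum()
            mag[i, j] = np.hypot(gx, gy)
    return mag

# a vertical step edge: left half dark, right half bright
img = np.zeros((5, 6), dtype=np.uint8)
img[:, 3:] = 255
mag = sobel_edges(img)   # high values along the boundary column, zero in flat areas
```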

How deep learning improves image segmentation

Modern image segmentation technology is powered by deep learning technology. The following are several deep learning architectures for segmentation:

CNNs for image segmentation – a patch of the image is used as input to a convolutional neural network, which labels the pixels. The CNN cannot process the whole image at once; it scans the image, looking through a small "filter" of a few pixels at a time, until it has mapped the entire image.

FCNs – a traditional CNN ends in fully connected layers and cannot handle different input sizes. Fully convolutional networks (FCNs) use only convolutional layers, so they can process inputs of any size and run faster. The final output layer has a large receptive field; its height and width correspond to those of the image, and its number of channels corresponds to the number of classes. The convolutional layers classify every pixel to determine the context of the image, including the location of objects.
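The key output shape can be sketched without any deep learning framework. Here a 1x1 convolution (just a per-pixel matrix multiply) maps backbone features to per-pixel class scores; all names, shapes, and the random weights are invented for illustration, since a real FCN learns its weights:

```python
import numpy as np

def fcn_head(features, weights):
    """1x1 convolution: map per-pixel feature vectors (H, W, C) to
    per-pixel class scores (H, W, K), then label each pixel by argmax."""
    scores = features @ weights        # (H, W, C) @ (C, K) -> (H, W, K)
    return scores.argmax(axis=-1)      # per-pixel class label

H, W, C, K = 4, 4, 8, 3                # K = number of classes (hypothetical)
features = np.random.default_rng(0).normal(size=(H, W, C))
weights = np.random.default_rng(1).normal(size=(C, K))
labels = fcn_head(features, weights)
# labels has the same height and width as the feature map,
# with one class index per pixel
```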

Ensemble learning – combines the results of two or more related analytical models into a single output. Ensemble learning can improve prediction accuracy and reduce generalization error, enabling accurate image classification and segmentation. Instead of trying to create a single optimal learner, ensemble learning generates a set of weak base learners that classify parts of the image and combines their outputs.

DeepLab – one of the main motivations for DeepLab is to perform image segmentation while controlling signal downsampling, reducing the number of samples and the amount of data the network must process. Another motivation is multi-scale contextual feature learning: aggregating features extracted from the image at different scales. DeepLab uses an ImageNet-pretrained ResNet for feature extraction, and uses atrous (dilated) convolutions instead of regular convolutions. Varying the dilation rate of each convolution enables the ResNet block to capture multi-scale context information. DeepLab consists of three parts:

Atrous convolutions – a dilation factor expands or contracts the field of view of the convolution filter.
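The effect of the dilation factor is easiest to see in one dimension. In this sketch (a toy numpy implementation, not DeepLab's actual code), a 3-tap kernel at rate 2 spans 5 input samples instead of 3, enlarging the receptive field without adding weights:

```python
import numpy as np

def atrous_conv1d(x, kernel, rate):
    """1-D atrous (dilated) convolution: insert (rate - 1) gaps between
    kernel taps, enlarging the receptive field with the same weights."""
    k = len(kernel)
    span = (k - 1) * rate + 1          # effective receptive field
    out = []
    for i in range(len(x) - span + 1):
        out.append(sum(kernel[j] * x[i + j * rate] for j in range(k)))
    return np.array(out)

x = np.arange(8, dtype=float)
y1 = atrous_conv1d(x, [1.0, 1.0, 1.0], rate=1)  # sees 3 consecutive samples
y2 = atrous_conv1d(x, [1.0, 1.0, 1.0], rate=2)  # same 3 weights span 5 samples
# y1 -> [3, 6, 9, 12, 15, 18], y2 -> [6, 9, 12, 15]
```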

ResNet — Microsoft's deep convolutional neural network (DCNN). It provides a framework for training networks up to thousands of layers deep while maintaining performance. ResNet's powerful representational capacity has advanced computer vision applications such as object detection and face recognition.

Atrous spatial pyramid pooling (ASPP) — provides multi-scale information. It applies a set of atrous convolutions with different dilation rates to capture a wide range of context. ASPP also uses global average pooling (GAP) to incorporate image-level features and add global context information.
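The structure described above can be sketched as parallel branches whose outputs are concatenated per pixel. This is only a structural illustration: the two stand-in branches here are trivial functions, whereas real ASPP branches are atrous convolutions with rates such as 6, 12, and 18:

```python
import numpy as np

def aspp(features, branch_fns):
    """ASPP structure: run parallel multi-scale branches plus global
    average pooling, then concatenate along the channel axis."""
    h, w, _ = features.shape
    outs = [fn(features) for fn in branch_fns]          # multi-scale branches
    gap = features.mean(axis=(0, 1))                    # image-level context
    outs.append(np.broadcast_to(gap, (h, w, gap.size))) # tile GAP per pixel
    return np.concatenate(outs, axis=-1)

feats = np.ones((4, 4, 2))
# stand-in "branches" for illustration only
out = aspp(feats, [lambda f: f, lambda f: f * 2])
# output channels: 2 (branch 1) + 2 (branch 2) + 2 (GAP) = 6
```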

SegNet – a deep encoder-decoder architecture, also described as semantic pixel-wise segmentation. The encoder produces a low-dimensional encoding of the input image, which the decoder then uses to restore the image; the segmented image is produced on the decoder side.


Applications of image segmentation

Image segmentation helps determine the relationships between objects, as well as the context of objects in the image. Applications include face recognition, license plate recognition, and satellite image analysis. Industries such as retail and fashion use image segmentation in image-based search, and self-driving cars use it to understand their surroundings.

Object detection and face detection

These applications involve identifying instances of specific types of objects in digital images. Semantic objects can be classified into categories such as human faces, cars, buildings, or cats.

Face detection – a type of object detection used in many applications, including biometrics and the autofocus function of digital cameras. Algorithms detect and verify the presence of facial features; for example, the eyes appear as valleys in grayscale images.

Medical imaging – extracts clinically relevant information from medical images. For example, radiologists can use machine learning to augment their analysis by segmenting images into different organs, tissue types, or disease symptoms, reducing the time needed to run diagnostic tests.

Machine vision – applications that capture and process images to provide operational guidance to equipment, in both industrial and non-industrial settings. Machine vision systems use digital sensors in dedicated cameras, allowing computer hardware and software to measure, process, and analyze images. For example, an inspection system photographs a soda bottle and then analyzes the image against pass-fail criteria to determine whether the bottle is filled correctly.

Video surveillance – video tracking and moving object tracking

This involves locating moving objects in the video. Its uses include security and surveillance, traffic control, human-computer interaction, and video editing.

Autonomous driving – autonomous vehicles must be able to perceive and understand their environment in order to drive safely. Relevant object categories include other vehicles, buildings, and pedestrians. Semantic segmentation enables autonomous vehicles to recognize which areas of the image are safe to drive on.

Iris recognition – a biometric identification technology that recognizes the complex patterns of the iris. It uses automated pattern recognition to analyze video images of the human eye.

Face recognition – identifies individuals in video. This technology compares facial features selected from the input image against the faces in a database.

Retail image recognition

This application lets retailers understand the layout of goods on the shelf. Algorithms process product data in real time and detect whether products are present on the shelf. If a product is out of stock, they can determine the cause, notify the merchandiser, and recommend solutions for the corresponding part of the supply chain.
