In the previous article, we introduced the basics of MediaPipe Holistic and learned that it combines the pose, face, and hand landmark models from MediaPipe Pose, MediaPipe Face Mesh, and MediaPipe Hands to produce a total of 543 landmarks (33 pose landmarks, 468 face landmarks, and 21 landmarks per hand).
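The total of 543 only works out if the 21 hand landmarks are counted once for each hand. A quick check in plain arithmetic (the constant names are just labels for the counts given in the text):

```python
# Landmark counts per sub-model in MediaPipe Holistic, as stated in the text.
POSE_LANDMARKS = 33   # MediaPipe Pose
FACE_LANDMARKS = 468  # MediaPipe Face Mesh
HAND_LANDMARKS = 21   # MediaPipe Hands, per hand
NUM_HANDS = 2         # left hand and right hand

total = POSE_LANDMARKS + FACE_LANDMARKS + NUM_HANDS * HAND_LANDMARKS
print(total)  # 543
```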

A hand re-crop step is useful for cases where the pose model's accuracy is so low that the resulting hand ROI is still not accurate enough. In those cases, MediaPipe Holistic runs an additional lightweight hand re-crop model, which corrects the crop and takes only about 10% of the hand model's inference time.
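To make the idea of a hand ROI concrete, here is a simplified sketch that derives a square crop region from a few normalized (x, y) landmark coordinates, such as the wrist and finger points the pose model provides. The function name `square_roi` and the 1.5 enlargement factor are hypothetical choices for illustration; this is not MediaPipe's actual ROI computation, and the real re-crop step uses a learned model rather than a fixed rule.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def square_roi(points: Iterable[Point], scale: float = 1.5) -> Tuple[float, float, float]:
    """Return (x_min, y_min, side) of a square box around normalized points.

    The box is centered on the points' bounding box, and its side is the
    larger bounding-box dimension multiplied by `scale` (hypothetical value).
    """
    pts = list(points)
    if not pts:
        raise ValueError("need at least one point")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    cx = (min(xs) + max(xs)) / 2
    cy = (min(ys) + max(ys)) / 2
    side = max(max(xs) - min(xs), max(ys) - min(ys)) * scale
    return cx - side / 2, cy - side / 2, side


if __name__ == "__main__":
    # e.g. a wrist point and two finger points, normalized to [0, 1]
    print(square_roi([(0.40, 0.40), (0.60, 0.50), (0.55, 0.45)]))
```

A square, slightly enlarged box leaves room for fingers that the coarse pose landmarks do not cover; when the pose estimate is poor, even this margin is not enough, which is the situation the re-crop model addresses.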

MediaPipe

MediaPipe is an open-source framework, developed by Google Research, for building multimedia machine learning applications. Within Google, several important products, including Google Lens, ARCore, and Google Home, have deeply integrated MediaPipe.

MediaPipe image detection

As a cross-platform framework, MediaPipe can be deployed not only on servers but also as an on-device machine learning inference framework on mobile platforms (Android and iOS) and embedded platforms (Google Coral and Raspberry Pi).

Whether a multimedia machine learning application succeeds depends not only on the quality of the model itself but also on allocating device resources effectively, synchronizing multiple input streams efficiently, deploying conveniently across platforms, and building the application quickly.

Based on these needs, Google developed and open-sourced the MediaPipe project. In addition to the features above, MediaPipe supports the TensorFlow and TF Lite inference engines, so TensorFlow and TF Lite models can be used in MediaPipe. On mobile and embedded platforms, MediaPipe also supports on-device GPU acceleration.

The main concepts of MediaPipe

The core framework of MediaPipe is implemented in C++ and provides support for languages such as Java and Objective-C. The main concepts of MediaPipe are the packet (Packet), the stream (Stream), the calculator (Calculator), the graph (Graph), and the subgraph (Subgraph). A packet is the most basic unit of data and represents data at a specific point in time, such as one image frame or a short audio clip. A stream consists of multiple packets arranged in ascending timestamp order, and each timestamp (Timestamp) in a stream can hold at most one packet. Streams flow through a graph composed of multiple calculators. A MediaPipe graph is directed: packets enter the graph from a source (a Source Calculator or a Graph Input Stream) and leave it at a sink (a Sink Calculator or a Graph Output Stream).
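The packet, stream, and graph rules above can be illustrated with a toy pure-Python sketch. `Packet`, `Stream`, and `run_graph` are hypothetical names invented here, not MediaPipe's C++ API. The sketch mimics only two rules from the text: within a stream, timestamps increase with at most one packet per timestamp, and packets flow from a source through calculators to a sink.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass(frozen=True)
class Packet:
    """Data at one point in time, e.g. one video frame."""
    timestamp: int
    payload: Any


@dataclass
class Stream:
    """A sequence of packets with strictly increasing timestamps."""
    name: str
    packets: List[Packet] = field(default_factory=list)

    def add(self, packet: Packet) -> None:
        # "Ascending order" plus "at most one packet per timestamp"
        # together mean each new timestamp must be strictly larger.
        if self.packets and packet.timestamp <= self.packets[-1].timestamp:
            raise ValueError(
                f"{self.name}: timestamp {packet.timestamp} is not greater "
                f"than {self.packets[-1].timestamp}"
            )
        self.packets.append(packet)


def run_graph(source: List[Any], calculators: List[Callable[[Any], Any]]) -> Stream:
    """Push packets from a source through a linear chain of calculators into a sink stream."""
    sink = Stream("output")
    for ts, item in enumerate(source):
        packet = Packet(ts, item)
        for calc in calculators:
            # Each calculator consumes a packet and emits one with the same timestamp.
            packet = Packet(packet.timestamp, calc(packet.payload))
        sink.add(packet)
    return sink


if __name__ == "__main__":
    out = run_graph([1, 2, 3], [lambda x: x * 10, lambda x: x + 1])
    print([(p.timestamp, p.payload) for p in out.packets])  # [(0, 11), (1, 21), (2, 31)]
```

A real MediaPipe graph is a general directed graph configured declaratively, with calculators that can have several inputs and outputs; the linear chain here is only the simplest case.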

The core framework of MediaPipe

To use MediaPipe, first run python -m pip install mediapipe in a command prompt (cmd) to install the third-party package; after that, we can write code to run detection on images or videos. A major convenience is that we do not need to download pre-trained models separately: installing the mediapipe package is enough.
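Before running the detection code, a short stdlib-only sketch can confirm that the packages are visible to the Python interpreter you will actually use. The helper names `is_installed` and `installed_version` are hypothetical; they rely only on `importlib.util.find_spec` and `importlib.metadata.version`. Note that OpenCV is imported as `cv2` even though its pip distribution has a different name (for example `opencv-python`).

```python
import importlib.util
from importlib import metadata


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be imported in the current environment."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(distribution: str):
    """Return the installed version of a pip distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("mediapipe", "cv2"):
        print(name, "importable:", is_installed(name))
    print("mediapipe version:", installed_version("mediapipe"))
```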

MediaPipe image detection

MediaPipe model image detection code

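import cv2
import mediapipe as mp
# Drawing helpers and the Holistic solution from MediaPipe's Python API.
mp_drawing = mp.solutions.drawing_utils
mp_holistic = mp.solutions.holistic
file = '4.jpg'
# static_image_mode=True: treat every input as an independent still image.
holistic = mp_holistic.Holistic(static_image_mode=True)
image = cv2.imread(file)
if image is None:  # cv2.imread returns None instead of raising on a bad path
    raise FileNotFoundError(file)
image_hight, image_width, _ = image.shape
# OpenCV loads BGR; MediaPipe expects RGB.
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
results = holistic.process(image)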

First, we import the required third-party libraries. mp.solutions.drawing_utils provides the functions for drawing landmarks; the size and color of the points and lines it draws can be customized, but here we simply use its default (official) settings.

Next, we create a Holistic detection model:

mp_holistic = mp.solutions.holistic
file = '4.jpg'
holistic = mp_holistic.Holistic(static_image_mode=True)

Then, using the OpenCV basics introduced earlier, we read the image to be detected from disk and get its size:

image = cv2.imread(file)
image_hight, image_width, _ = image.shape

OpenCV loads images in BGR channel order, whereas MediaPipe expects RGB input, so we first convert the color space and then run the Holistic model we created earlier on the image:

image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
results = holistic.process(image)
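Conceptually, cv2.COLOR_BGR2RGB only reorders the three channels of each pixel. The pure-Python sketch below shows that reordering; `bgr_to_rgb` is a hypothetical helper working on nested lists of pixel tuples, which is not how OpenCV stores images (cv2.cvtColor does the same job on a whole NumPy array, far more efficiently).

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def bgr_to_rgb(image: List[List[Pixel]]) -> List[List[Pixel]]:
    """Swap the first and third channel of every pixel (BGR -> RGB)."""
    return [[(r, g, b) for (b, g, r) in row] for row in image]


if __name__ == "__main__":
    # One row holding a pure-blue pixel and a pure-red pixel, in BGR order.
    bgr = [[(255, 0, 0), (0, 0, 255)]]
    print(bgr_to_rgb(bgr))  # [[(0, 0, 255), (255, 0, 0)]]
```

Because the swap is its own inverse, the same reordering turns RGB back into BGR. That is why an image converted for MediaPipe has to be converted back before OpenCV displays or saves it; otherwise red and blue appear swapped.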

The detection results are stored in results. We read these results and draw the detected face, hand, and pose landmarks on the original image so that we can view them:

if results.pose_landmarks:
    nose = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.NOSE]
    # Landmark x/y are normalized to [0, 1]; scale them to pixel coordinates.
    print(f'Nose coordinates: ({nose.x * image_width}, {nose.y * image_hight})')
annotated_image = image.copy()
# FACEMESH_CONTOURS replaces FACE_CONNECTIONS, which older mediapipe releases used.
mp_drawing.draw_landmarks(annotated_image, results.face_landmarks, mp_holistic.FACEMESH_CONTOURS)
mp_drawing.draw_landmarks(annotated_image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS)
mp_drawing.draw_landmarks(annotated_image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS)
mp_drawing.draw_landmarks(annotated_image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS)

Here we print the nose coordinates from the pose results, then draw the face landmarks, the left- and right-hand landmarks, and the body pose landmarks on a copy of the image.
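The landmark x and y values are normalized to [0, 1] by the image width and height, which is why the code multiplies them by image_width and image_hight. A small hypothetical helper, `to_pixel`, makes the conversion explicit and also handles points that fall outside the frame (predicted landmarks can lie slightly off-image). It is a sketch for illustration, not a MediaPipe function.

```python
from typing import Optional, Tuple


def to_pixel(x: float, y: float, width: int, height: int) -> Optional[Tuple[int, int]]:
    """Map normalized landmark coordinates in [0, 1] to integer pixel coordinates.

    Returns None when the point lies outside the image.
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        return None
    # Clamp so that x == 1.0 maps to the last column rather than one past it.
    px = min(int(x * width), width - 1)
    py = min(int(y * height), height - 1)
    return px, py


if __name__ == "__main__":
    print(to_pixel(0.5, 0.25, 640, 480))  # (320, 120)
    print(to_pixel(1.2, 0.5, 640, 480))   # None
```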

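# Convert back to BGR, because OpenCV expects BGR when displaying or saving.
annotated_image = cv2.cvtColor(annotated_image, cv2.COLOR_RGB2BGR)
# cv2.imshow('annotated_image', annotated_image)
# cv2.waitKey(0)  # only needed together with imshow
cv2.imwrite('4.png', annotated_image)
holistic.close()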

After drawing, we can display the image for easy viewing, or save the result directly with OpenCV's imwrite function. Finally, we close the Holistic model to release its resources. Note one limitation: when an image contains several people, only a single person is detected. We will look into this later.

MediaPipe image detection

MediaPipe model video detection code

We can also run MediaPipe detection directly on video:

import cv2
import time
import mediapipe as mp
mp_drawing = mp.solutions.drawing_utils
mp_holistic = mp.solutions.holistic
holistic = mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5)

As with image detection, we first build a Holistic model (min_detection_confidence and min_tracking_confidence set the confidence thresholds for detection and tracking); then we open the camera and run the model on each frame:

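cap = cv2.VideoCapture(0)  # 0 = default camera
time.sleep(2)  # short pause before reading, e.g. to let the camera start up
while cap.isOpened():
    success, image = cap.read()
    if not success:
        print("Ignoring empty camera frame.")
        continue  # when reading a video file, use break instead
    # Mirror the frame, then convert BGR (OpenCV) to RGB (MediaPipe).
    image = cv2.cvtColor(cv2.flip(image, 1), cv2.COLOR_BGR2RGB)
    image.flags.writeable = False
    results = holistic.process(image)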

First, we open the default camera and read real-time frames from it:

cap = cv2.VideoCapture(0)
while cap.isOpened():
    success, image = cap.read()

Each captured frame then goes through the same steps as in image detection:

image = cv2.cvtColor(cv2.flip(image, 1), cv2.COLOR_BGR2RGB)
image.flags.writeable = False
results = holistic.process(image)

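Here we use cv2.flip(image, 1) to flip each frame horizontally. The raw camera frame is the mirror image of what we would see in a mirror, so flipping it gives a natural selfie-style view. We also set image.flags.writeable = False before processing; according to the official MediaPipe examples, marking the image as not writeable can improve performance by letting it be passed by reference. Finally, we pass the image to the Holistic model for detection.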
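With flipCode=1, cv2.flip mirrors the image around its vertical axis, which means it reverses the order of the pixels in every row. A pure-Python sketch on nested lists shows the effect; `flip_horizontal` is a hypothetical name, and OpenCV itself operates on NumPy arrays.

```python
from typing import List, TypeVar

T = TypeVar("T")


def flip_horizontal(image: List[List[T]]) -> List[List[T]]:
    """Mirror an image left-to-right by reversing each row (like cv2.flip(img, 1))."""
    return [list(reversed(row)) for row in image]


if __name__ == "__main__":
    img = [[1, 2, 3],
           [4, 5, 6]]
    print(flip_horizontal(img))  # [[3, 2, 1], [6, 5, 4]]
```

For reference, cv2.flip uses flipCode=0 for a vertical flip and a negative flipCode to flip around both axes. Applying the horizontal flip twice restores the original image.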

    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)  # back to BGR for OpenCV display
    # FACEMESH_CONTOURS replaces FACE_CONNECTIONS, which older mediapipe releases used.
    mp_drawing.draw_landmarks(image, results.face_landmarks, mp_holistic.FACEMESH_CONTOURS)
    mp_drawing.draw_landmarks(image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS)
    mp_drawing.draw_landmarks(image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS)
    mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS)
    cv2.imshow('MediaPipe Holistic', image)
    if cv2.waitKey(5) & 0xFF == ord('q'):  # press q to quit
        break
holistic.close()
cap.release()
cv2.destroyAllWindows()

After detection, we draw the landmarks on each frame and display it, so the results can be viewed in real time in the video; pressing q ends the loop.
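The loop ends when cv2.waitKey(5) & 0xFF == ord('q') is true. cv2.waitKey waits up to 5 ms for a key press and returns -1 if none arrives; masking with 0xFF keeps only the lowest byte, which guards against extra high bits that some platforms set in the key code. The hypothetical helper `should_quit` isolates that test so it can be checked without a camera:

```python
def should_quit(key_code: int, quit_key: str = "q") -> bool:
    """Mirror the loop's exit test: cv2.waitKey(5) & 0xFF == ord('q')."""
    # -1 ("no key pressed") masks to 255, which never equals ord('q') (113).
    return (key_code & 0xFF) == ord(quit_key)


if __name__ == "__main__":
    print(should_quit(-1))        # False: no key pressed
    print(should_quit(ord("q")))  # True
```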

Video detection

With the default settings, the line and point sizes are not ideal; we will optimize them later.
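One way to address this is to pass custom drawing specs to draw_landmarks: mp_drawing.DrawingSpec accepts color, thickness, and circle_radius, and draw_landmarks takes such specs through its landmark_drawing_spec and connection_drawing_spec parameters. The sketch below only computes size values; the helper `scaled_spec_params` and its linear scaling rule (relative to a 480-pixel shorter side) are hypothetical choices, not MediaPipe defaults.

```python
from typing import Tuple


def scaled_spec_params(width: int, height: int,
                       base_size: int = 480,
                       base_thickness: int = 2,
                       base_radius: int = 2) -> Tuple[int, int]:
    """Scale line thickness and point radius with the shorter image side.

    Hypothetical heuristic: use the base values at base_size pixels,
    grow or shrink them linearly, and never go below 1 pixel.
    """
    scale = min(width, height) / base_size
    thickness = max(1, round(base_thickness * scale))
    radius = max(1, round(base_radius * scale))
    return thickness, radius


if __name__ == "__main__":
    print(scaled_spec_params(1280, 720))  # (3, 3)
```

The resulting values could then be used as, for example, mp_drawing.DrawingSpec(thickness=t, circle_radius=r), so that landmarks stay readable on both small and high-resolution frames.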