
Chedongxi (public account: chedongxi)
Author | Chedongxi team
Editor | Xiaohan
Just now, Tesla once again shocked the entire automobile industry!
At Tesla's Artificial Intelligence Day (AI Day) event held this morning, Tesla showed off the brain behind its "Full Self-Driving" (FSD) feature: the Dojo supercomputer.
Dojo is responsible for training AI algorithms and is built around Tesla's self-developed D1 AI chip. A single D1 packs 50 billion transistors, delivers peak computing power (BF16/CFP8) of 362 TFLOPS, and consumes no more than 400 W. The Tesla ExaPOD, composed of 3,000 D1 chips, reaches a computing power of 1.1 EFLOPS!
Before that, Tesla had already assembled a supercomputer from 5,760 Nvidia A100 Tensor Core GPUs, delivering 1.8 EFLOPS of computing power and, by Tesla's account, ranking fifth in the world.
Yes, Tesla has long since become an AI computing company.
All of this computing power ultimately serves Tesla's FSD feature. Tesla's L2 autonomous driving on urban roads has been in small-scale testing for nearly a year and has not yet been opened to the public. Meanwhile, this year's US-market Model 3/Y dropped the millimeter-wave radar in favor of pure visual perception. Is FSD really reliable?
At today's event, Tesla's senior director of AI Andrej Karpathy, autonomous driving software director Ashok Elluswamy, and several other Tesla executives made it clear that autonomous driving can be achieved with pure visual perception. Tesla laid out its technical advantages in autonomous driving publicly, with plenty of substance.
Before the press conference ended, Musk dropped an easter egg: Tesla is going to "make humans"! The Tesla robot will enter mass production next year and take over dangerous, repetitive, and boring tasks from humans.
Unlike other robots, the machine, named Tesla Bot, looks remarkably human and has human-level hands and working ability. Musk said it will go into mass production in 2022.
In short, Tesla's AI Day was packed with substance, and the Tesla Bot made for a genuine "one more thing". The key takeaways from the event follow.
Don't have time to watch Tesla AI Day? No problem: we have already downloaded the slide deck for you. Follow the official account Chedongxi (ID: chedongxi) and reply [Tesla PPT] in the dialog box to download it.
1. Self-developed AI training chip debuts: cabinet computing power reaches 1.1 EFLOPS
Starting more than half an hour behind schedule is already an "old tradition" for Tesla press conferences. The latest Cybertruck model was parked outside the AI Day venue and drew dozens of onlookers, although media on the live broadcast noted that attendance inside was not large.
When the event finally began, nearly 40 minutes later than planned, Tesla first showed off the latest version of FSD: unprotected left turns, rural roads without lane lines, avoiding pedestrians, stopping at intersections, recognizing traffic lights... Tesla handled all of these maneuvers with practiced ease.

FSD demonstration before the Tesla AI Day event
Compared with Tesla's 2019 FSD demonstration video, the driver in this year's video kept one hand on the steering wheel, a sign that Tesla's autonomous driving capability is still at L2 and not the "full self-driving" one might imagine.

Change across Tesla's FSD videos: from hands off the wheel to a hand on the wheel
From a technical perspective, realizing autonomous driving comes down to four core issues: how the car perceives the world, how training data is generated, how the software runs in the car, and how the algorithms are iterated.
The most important news at today's AI Day addressed the last of these: Tesla unveiled the supercomputer it uses to train its autonomous driving models, i.e. how it continuously iterates its algorithms.
According to Ganesh Venkataramanan, head of the Dojo project, Musk asked Tesla's engineers a few years ago to design an ultra-fast training computer, which is how the Dojo project began. The Dojo supercomputer will go into operation next year and will train AI algorithms on huge volumes of video.

Tesla D1 chip display
Dojo is a distributed computing architecture connected by a network fabric. It features a large compute plane, ultra-high bandwidth with low latency, and large-scale network partitioning and mapping, along with a new compiler that reduces both local and global communication, making the system highly scalable.
The supercomputer is built around Tesla's self-developed D1 AI training chip. The D1 is fabricated on a 7 nm process with a die area of 645 mm², contains 50 billion transistors, reaches peak BF16/CFP8 computing power of 362 TFLOPS and peak FP32 computing power of 22.6 TFLOPS, and has a thermal design power (TDP) of no more than 400 W.

Performance specifications of the Tesla D1 chip
The chip offers GPU-level training capability with CPU-level flexibility, and can seamlessly interconnect 500,000 training nodes. On top of it, Tesla built a training unit (tile) composed of 25 D1 chips.
Each training tile has an interface bandwidth of 36 TB/s and computing power of 9 PFLOPS, and uses a centralized power-delivery and cooling design capable of dissipating 15 kW.

Tesla chip array
The Tesla ExaPOD cabinet combines 120 training tiles, 3,000 D1 chips, and more than 1 million training nodes; its BF16/CFP8 computing power reaches 1.1 EFLOPS.

Tesla chip array display
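The headline numbers hang together; here is a quick back-of-envelope check of Tesla's stated figures (our arithmetic, not an official calculation):

```python
# Back-of-envelope check of Tesla's stated Dojo figures.
D1_BF16_TFLOPS = 362          # peak BF16/CFP8 per D1 chip
CHIPS_PER_TILE = 25           # D1 chips per training tile
TILES_PER_EXAPOD = 120        # training tiles per ExaPOD cabinet

tile_pflops = D1_BF16_TFLOPS * CHIPS_PER_TILE / 1_000
exapod_eflops = tile_pflops * TILES_PER_EXAPOD / 1_000
total_chips = CHIPS_PER_TILE * TILES_PER_EXAPOD

print(f"per tile: {tile_pflops:.2f} PFLOPS")    # ~9.05, matches the quoted 9 PFLOPS
print(f"ExaPOD:   {exapod_eflops:.2f} EFLOPS")  # ~1.09, matches the quoted 1.1 EFLOPS
print(f"D1 chips: {total_chips}")               # 3000
```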
The distributed system is partitioned: the Dojo Processing Unit (DPU) is a virtual device that can be resized to fit the needs of an application, spanning multiple D1 chips and interface processors. Tesla's compiler engine automatically maps execution onto the DPU without manual intervention, and Tesla has built a complete software stack around it.

Dojo software stack architecture
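Tesla did not publish the DPU programming interface, but the idea of a resizable virtual device over a pool of physical chips can be sketched in a few lines. This is purely illustrative; every name here is hypothetical:

```python
# Hypothetical illustration only: a resizable "virtual device" spanning
# multiple physical chips, in the spirit of Tesla's described DPU.
from dataclasses import dataclass, field

@dataclass
class VirtualDevice:
    total_chips: int                       # physical D1 chips available
    assigned: list = field(default_factory=list)

    def resize(self, n_chips: int) -> None:
        """Grow or shrink the partition to fit the job being trained."""
        if not 0 < n_chips <= self.total_chips:
            raise ValueError("partition must fit within the physical mesh")
        self.assigned = list(range(n_chips))

dpu = VirtualDevice(total_chips=3000)
dpu.resize(50)      # small job: two tiles' worth of chips
dpu.resize(1500)    # large job: half the ExaPOD
print(len(dpu.assigned))  # 1500
```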
Ganesh said Tesla Dojo will be the fastest AI training computer in history: 4x the performance at the same cost, 1.3x better performance per watt, and only 1/5 the footprint. Tesla also predicts the next-generation Dojo will improve performance by another 10x, though it gave no timeline.
At the end of the Dojo session, Ganesh mentioned that Tesla is recruiting aggressively to advance its AI research and development.
Before building its own supercomputer, Tesla had already used Nvidia GPUs to assemble what it describes as the world's fifth-ranked supercomputer.

Tesla supercomputer
That machine uses 720 nodes, each with eight Nvidia A100 Tensor Core GPUs (5,760 GPUs in total), achieving 1.8 EFLOPS of performance.

NVIDIA A100 Tensor Core GPU
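Those figures are consistent with Nvidia's published A100 peak numbers (312 TFLOPS dense BF16 Tensor Core throughput per GPU); a quick check of the cluster math:

```python
# Sanity check of the cluster math using Nvidia's published A100 peaks.
NODES = 720
GPUS_PER_NODE = 8
A100_BF16_TFLOPS = 312        # dense BF16 Tensor Core peak per A100

gpus = NODES * GPUS_PER_NODE
cluster_eflops = gpus * A100_BF16_TFLOPS / 1_000_000

print(gpus)                           # 5760
print(f"{cluster_eflops:.2f} EFLOPS")  # ~1.80, matching the quoted figure
```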
It can be seen that both Tesla's Dojo and the supercomputer previously assembled from Nvidia GPUs rank among the top systems in the field of AI computing.
During the Q&A, Musk said that developing all of this technology is very expensive, so he is unsure how open-sourcing would work, but Tesla is open to licensing its AI technology to other automakers.
2. Dispelling doubts about pure visual perception: a thousand-person team labels the data
Tesla's senior AI director Andrej Karpathy then took the stage to explain how Tesla achieves autonomous driving through visual perception followed by planning and control.
He said Tesla fuses the images from the eight cameras around the car body into a three-dimensional vector space in which it perceives the surrounding environment.

Comparison of human eye visual information transmission and Tesla AI visual information transmission
Karpathy said that in designing AI vision for autonomous driving, the way the adult brain recognizes images can be reverse-engineered. For example, when designing the car's "visual cortex", Tesla modeled it on how biological vision works, starting from how the eye perceives.
All eight of Tesla's cameras capture 1280×960, 12-bit HDR images at 36 frames per second, which is enough for good perception.

The eight cameras on the body of the car are integrated into a three-dimensional "vector space"
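The camera specs above imply a hefty raw data stream. A rough uncompressed estimate, assuming one 12-bit value per pixel (an assumption on our part, since Tesla did not give the exact sensor readout format):

```python
# Rough uncompressed data-rate estimate for Tesla's camera suite.
# Assumes one 12-bit value per pixel; the real sensor format may differ.
WIDTH, HEIGHT = 1280, 960
BITS_PER_PIXEL = 12
FPS = 36
CAMERAS = 8

bytes_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL / 8
mb_per_sec = bytes_per_frame * FPS * CAMERAS / 1e6
print(f"{mb_per_sec:.0f} MB/s raw")   # ~531 MB/s across all eight cameras
```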
As the neural network processes these images, the autonomous driving computer progressively reduces spatial resolution while increasing the number of channels.
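Tesla did not publish its backbone, but the pattern Karpathy described — shrinking resolution while widening channels — is the standard convolutional design. A minimal PyTorch sketch, not Tesla's actual network:

```python
# Minimal sketch of the standard pattern: each stage halves spatial
# resolution (stride 2) while increasing the channel count.
import torch
import torch.nn as nn

backbone = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
)

x = torch.randn(1, 3, 960, 1280)   # one camera frame (C, H, W)
print(backbone(x).shape)           # torch.Size([1, 128, 120, 160])
```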
Besides vehicles, Tesla must recognize many other objects, such as people and traffic lights, so it developed HydraNets, a multi-task network. HydraNets has three key characteristics: first, features are shared across tasks, making inference efficient; second, each task can be fine-tuned separately; third, intermediate features can be cached to accelerate fine-tuning and break through that bottleneck.
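The multi-head idea is straightforward to sketch: one shared backbone feeds several task heads, and the shared features are computed once (and could be cached) while individual heads are trained. A hedged illustration, not Tesla's code:

```python
# Illustrative multi-task "hydra" design: shared backbone, per-task heads.
# Backbone features are computed once and shared across all heads.
import torch
import torch.nn as nn

class TinyHydra(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.heads = nn.ModuleDict({
            "vehicles": nn.Linear(feat_dim, 10),
            "pedestrians": nn.Linear(feat_dim, 4),
            "traffic_lights": nn.Linear(feat_dim, 3),
        })

    def forward(self, x):
        feats = self.backbone(x)               # shared, computed once
        return {k: h(feats) for k, h in self.heads.items()}

model = TinyHydra()
out = model(torch.randn(2, 3, 96, 128))
print({k: v.shape for k, v in out.items()})
```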
Many car companies currently combine high-definition maps and multiple sensors for perception fusion, but this approach alone does not make vehicles drive correctly, so Tesla developed its Occupancy Tracker. Here Tesla ran into two problems: stitching perception across sensors is imprecise (a single camera, for instance, cannot see the whole scene), and image space is not the real physical space in which the car drives.
Tesla therefore uses a Transformer to project multi-camera features into the vector space and predict distances. Real-world tests on an urban road with vehicles parked on both sides showed that multi-camera perception is far more accurate and stable.
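The core mechanism behind this kind of Transformer fusion is cross-attention from bird's-eye-view (BEV) queries to image features. A bare-bones sketch under that assumption, with illustrative shapes rather than Tesla's:

```python
# Bare-bones cross-attention from BEV grid queries to multi-camera image
# features, the mechanism behind Transformer-based camera fusion.
import torch
import torch.nn as nn

D = 64
bev_queries = torch.randn(1, 200, D)       # 200 BEV grid cells
image_feats = torch.randn(1, 8 * 300, D)   # features from 8 cameras

attn = nn.MultiheadAttention(embed_dim=D, num_heads=4, batch_first=True)
bev_out, _ = attn(query=bev_queries, key=image_feats, value=image_feats)
print(bev_out.shape)   # torch.Size([1, 200, 64]) -> fused per-cell features
```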
Tesla sees two difficulties in autonomous driving decision-making: the action space is non-convex, and it is high-dimensional.
Tesla's autonomous driving software director Ashok Elluswamy said Tesla uses a hybrid planning system: it first runs a coarse search over the vector space, then continuously optimizes the result into a smooth motion trajectory.
In one example, the vehicle judges that it could change lanes to the left, but traffic is flowing normally in that lane. Changing lanes abruptly would both hurt ride comfort and threaten safety, so Tesla searches roughly 2,500 candidate lane-change plans in 1.5 milliseconds and executes the safest, most comfortable one.
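The planner described amounts to sampling many candidate maneuvers and scoring each on safety and comfort. A toy version of that sample-and-score loop; the cost terms and numbers here are invented for illustration:

```python
# Toy sample-and-score planner: generate many candidate lane-change plans,
# score each on safety and comfort, keep the best. Costs are made up.
import random

def cost(t_start, duration):
    safety = abs(t_start - 2.0)        # pretend starting in 2 s is safest
    comfort = abs(duration - 3.0)      # pretend a 3 s maneuver is smoothest
    return safety + 0.5 * comfort

candidates = [(random.uniform(0, 5), random.uniform(1, 6))
              for _ in range(2500)]                 # ~2,500 candidates
best = min(candidates, key=lambda c: cost(*c))
print(f"start in {best[0]:.2f}s, take {best[1]:.2f}s")
```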
In another example, the Tesla drives into an ultra-narrow section where two cars cannot pass at once. The first oncoming car, an SUV, yielded, so the Tesla judged it could keep moving forward; but after advancing a short way, it found another car approaching from the opposite direction.

Tesla FSD negotiating complex road conditions
At that point the Tesla chose to stop and yield, but the oncoming car stopped to yield at the same time, so the Tesla decisively revised its decision and proceeded through the section.
As autonomous driving development continues, Tesla needs to label ever more object types. It now has a 1,000-person data labeling team and has built its own labeling and analysis infrastructure.
Labeling has also evolved from 2D image labels to labels in 4D, i.e. 3D space plus time. Once an object is labeled in one camera, the label can be propagated to the other cameras.

Tesla labels in 4D: 3D space plus time
Tesla can also reconstruct the road during perception by labeling lane lines and other objects. Data from the same road section, collected by multiple cars, is merged together to achieve an even more accurate reconstruction of the map.
In the end, the vehicle can reliably label roadside objects, and only with accurate recognition can it drive smoothly and autonomously on urban roads.
In May this year, the Model 3/Y sold in the US market dropped its millimeter-wave radar. But in low-visibility weather such as rain, fog, and snow, can cameras really see clearly? Tesla's answer is yes.
The approach again relies on short video clips of driving scenes: Tesla collects 10,000 such harsh-weather clips every week and, through automatic labeling, ultimately achieves accurate distance perception.

Tesla FSD can handle a variety of bad weather
Tesla also runs Autopilot in simulation, which it likens to a video game with Autopilot as the player. In simulation, the computer can precisely label and place virtual vehicles, so it is used for situations rarely encountered in real life: what if someone is walking on a highway? How do you label a dense crowd? How do you avoid other vehicles in a parking lot?

Tesla's simulation testing
In simulation, engineers can test these extremely rare situations. Simulation requires substantial groundwork: first, the simulated sensors must behave much like the real ones; second, rendering must be realistic; third, there must be a base of real scenes, including vehicles and pedestrians (Tesla has even built more than 2,000 miles of virtual roads); fourth, scenarios must be scalable, covering variations such as day and night; fifth, real scenes are reconstructed so that algorithms can also be tested against them in simulation.
So far, Tesla's in-car network has been trained on 371 million images carrying 480 million labels.
Next, beyond dynamic objects such as people and cars, Tesla will also detect static objects and road topology, train on more vehicles and pedestrians, and apply reinforcement learning to make pure visual perception even more accurate.
3. New robot unveiled: it can work in place of humans
After a brief dance by a performer in a robot suit, Musk announced that the Tesla Bot will launch in 2022 and will take over dangerous, repetitive, and boring tasks from humans.

The Tesla Bot on display
Musk introduced the robot's specs: 5 feet 8 inches tall (about 1.73 m), 125 pounds (about 56.7 kg), able to carry 45 pounds of cargo (about 20.4 kg) and to deadlift 150 pounds (about 68.0 kg).

Tesla robot parameters
Its limbs are driven by 40 electromechanical actuators, and a force-feedback sensing system gives it smooth, agile bipedal walking, with a top walking speed of 5 mph (about 8 km/h).
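The metric equivalents given in parentheses follow from standard conversions; a quick check of the arithmetic:

```python
# Quick check of the imperial-to-metric conversions in the spec sheet.
LB_TO_KG = 0.4536
MPH_TO_KMH = 1.609

print(5 * 12 + 8, "in =", round((5 * 12 + 8) * 2.54 / 100, 2), "m")  # 1.73 m
print(125 * LB_TO_KG)   # ~56.7 kg body weight
print(45 * LB_TO_KG)    # ~20.4 kg carry capacity
print(150 * LB_TO_KG)   # ~68.0 kg deadlift
print(5 * MPH_TO_KMH)   # ~8.0 km/h top walking speed
```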
Musk quipped: "If this robot 'rebelled', you could still outrun it."
In addition, the robot's hands closely resemble a human's, with five fingers that bend flexibly. Musk calls them "human-level" hands; in other words, the robot has the potential to take over some precision tasks from humans.

Structure of the Tesla Bot
Tesla also plans to put hardware, including the FSD Computer, into the robot's body as its "organs", and to train the robot's AI through the same pipeline used for the autonomous driving AI, turning the robot into a generalist.
Musk said Tesla's original intention in launching the robot is for it to take over boring, dangerous, and repetitive work; he hopes that in the future everything humans don't want to do can be left to Tesla robots. Of course, human creativity is boundless, and Musk speculates that people may find uses he has not anticipated.
That said, the robot may not actually ship next year: Musk said that to guarantee its capabilities, Tesla still needs to put it through rigorous training on Dojo.
Conclusion: Tesla is one step closer to autonomous driving
AI is the key to autonomous driving today. Once the sensors have perceived the world, every subsequent stage of computation and decision-making depends on AI, and only by mastering it can autonomous driving become more reliable.
Now that Tesla has built its own AI supercomputer, model training will keep accelerating, and with ever more scenarios and cases, Tesla should ultimately deliver safer autonomous driving.