In September 2017, the Loongson Club organized a group purchase of Loongson 3A motherboards. As a fan who has followed Loongson for many years, I took part and bought a Loongson 3A3000 motherboard. The Loongson 3A4000 processor is about to tape out, yet there is still no detailed performance evaluation of the soon-to-be-superseded Loongson 3A3000. I therefore used phoronix-test-suite to evaluate the Loongson 3A3000 as rationally, neutrally, objectively and comprehensively as I could, without hype, without exaggerating the results and without dodging the problems.
Introduction to Loongson 3A3000 motherboard
First, here are some photos showing what the Loongson 3A3000 motherboard looks like.
IT Home Netizen submission: A comprehensive and thorough understanding of the performance details of the domestic Loongson 3A3000 processor
Figure 1 Photo of the Loongson 3A3000 motherboard. The Loongson CPU sits under the fan; the south bridge and north bridge sit under the other two heatsinks.
Figure 4 Domestically made UniIC (Xi'an UniIC Semiconductors) memory module
The Loongson 3A3000 processor I received is not the top-performing 1.5GHz version but the slightly slower 1.4GHz version. To estimate the performance of the 1.5GHz Loongson 3A3000, my results should therefore be multiplied by about 1.07 (1.5/1.4), assuming performance scales linearly with main frequency. Note also that the Loongson 3A3000 used in Loongson 3A3000 notebooks is limited to a main frequency of 1.2GHz.
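The 1.07 factor is just the ratio of the two clock speeds. The following is a minimal Python sketch of that scaling, under the assumption that a higher-is-better score grows linearly with main frequency (memory-bound tests will scale less). The function name is hypothetical and the baseline score of 1.0 is a placeholder, not a measured result.

```python
def scale_to_frequency(score, measured_ghz, target_ghz):
    """Estimate a higher-is-better score at another clock speed.

    Assumes performance scales linearly with main frequency.
    """
    return score * (target_ghz / measured_ghz)


# Factor for converting 1.4GHz results to the 1.5GHz part.
factor_desktop = scale_to_frequency(1.0, 1.4, 1.5)
print(f"1.4GHz -> 1.5GHz factor: {factor_desktop:.2f}")  # prints 1.07

# Factor for the 1.2GHz notebook part mentioned above.
factor_notebook = scale_to_frequency(1.0, 1.4, 1.2)
print(f"1.4GHz -> 1.2GHz factor: {factor_notebook:.2f}")
```

For results reported as elapsed time (lower is better), the same assumption means dividing by the factor instead of multiplying.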
After getting the motherboard, I consulted the motherboard manual and guides on Baidu Tieba. With help from members of the Loongson Club group and the Loongson computer users and developers group, I assembled a Loongson 3A3000 machine and installed two operating systems, Debian buster and Loongnix.
As a long-time Loongson fan, I have used the Fuloong 6003 box and the 8089D notebook, both of which use the Loongson 2F processor. In my experience, the graphical interface on the Loongson 2F is at best "usable": it works, but it is too slow for comfortable use. The Loongson 3A3000 is a great step forward from the Loongson 2F. Under both Loongnix and Debian it is genuinely smooth to use. Web browsing in Firefox is smooth, and so is playback of local high-definition video. In terms of user experience, the Loongson 3A3000 fully meets the basic needs of office work, web browsing, listening to music and watching videos.
Loongson 3A3000 processor performance evaluation plan
How large is the gap between the Loongson 3A3000 and mainstream x86 processors? Which CPU is the Loongson 3A3000 equivalent to in performance? What level can the processor that is about to tape out reach? Compared with other domestic processors, is the performance of the Loongson 3A3000 high or low?
To answer these questions, I used the CPU benchmark programs provided by the Phoronix Test Suite to compare in detail the Loongson 3A3000 with the i5-7200U processor in an X270 laptop. In addition, the openbenchmarking website hosts test results for many different processors, which allows a more objective assessment of a system's performance.
I found some results for the Feiteng (Phytium) 1500A and Feiteng 2000+ processors on the openbenchmarking website and include them in the comparison. I also looked up some performance data for the Intel J1900 processor as a reference.
The performance of Zhaoxin processors has improved rapidly of late. Zhaoxin has released several series, such as the ZX-C, KX-5000 and KX-6000, and some reviews of them have appeared online. However, I could not find any results for the latest Zhaoxin processors on the openbenchmarking website, so this article cannot compare Zhaoxin with Loongson.
To allow the two CPUs to be compared at the same main frequency, turbo mode on the Intel i5-7200U was turned off during testing, automatic down-clocking by power management was disabled, and the frequency was locked at 2.5GHz. The i5-7200U can turbo up to 3.1GHz, so its actual peak performance is higher than the figures in this test.
The tests also distinguish between single-core and multi-core performance. A recent review of the Zhaoxin KX-6000 series concluded, from multi-threaded tests such as 7zip, that the KX-6000 matches the i5-7400. Intentionally or not, it ignored the fact that the KX-6000 is an 8-core, 8-thread processor while the i5-7400 is a 4-core, 4-thread processor. On single-core performance, the KX-6000 is roughly half of the i5-7400.
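The "roughly half" figure can be reproduced with simple arithmetic. This is only a sketch: it assumes the two processors post equal multi-threaded scores and that those scores scale perfectly with core count, which real workloads rarely do. The score of 100.0 is a placeholder, not a measured value, and the function name is hypothetical.

```python
def per_core_score(multi_thread_score, cores):
    """Split a multi-threaded score evenly across cores (idealized scaling)."""
    return multi_thread_score / cores


# Placeholder: both chips get the same multi-threaded score.
kx6000_per_core = per_core_score(100.0, cores=8)
i5_7400_per_core = per_core_score(100.0, cores=4)

print(kx6000_per_core / i5_7400_per_core)  # prints 0.5
```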
The basic specifications of the processors in this evaluation are compared below.
Intel i5-7200U
https://ark.intel.com/products/95443/Intel-Core-i5-7200U-Processor-3M-Cache-up-to-3_10-GHz
Intel J1900
https://ark.intel.com/zh-cn/products/78867/Intel-Celeron-Processor-J1900-2M-Cache-up-to-2_42-GHz
Loongson 3A 3000
http://www.loongson.cn/product/cpu/3/3A3000.html
The tests use phoronix-test-suite version 7.8.0.
http://www.phoronix-test-suite.com/?k=downloads
Tests were run on an X270 laptop with an Intel i5-7200U processor and on the self-assembled Loongson 3A3000 computer; the Loongson 3A3000 tests were run on Debian testing. The test environment is shown in Table 2:
During testing, I came across some results that appear to have been produced by a Loongson lab using the 1.5GHz Loongson 3A3000 processor. I therefore did not run a separate set of tests on this processor under the Loongnix system.
For the compiler options used on the Loongson 3A3000, you can refer to this link. For some applications, the n32 ABI was used during compilation, that is, the parameter -mabi=n32 was added.
Since CPU performance is what I most want to know, the testing concentrates on benchmarks that reflect CPU performance and leaves out tests of other hardware such as the disk, graphics card and memory.
Introduction to test program and analysis of test results
Scientific calculation
1. scimark2
This test runs the ANSI C version of SciMark 2.0, a benchmark for scientific and numerical computing developed at the National Institute of Standards and Technology. It consists of fast Fourier transform, Jacobi successive over-relaxation, Monte Carlo, sparse matrix multiplication and dense LU matrix factorization benchmarks. This is a single-core performance test.
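To give a feel for what the Monte Carlo kernel does, here is a minimal Python sketch of the same idea: estimating pi by sampling random points in the unit square and counting those that fall under the quarter circle. It is an illustration of the technique, not SciMark's code; the function name, the seed and the sample count are my own choices, and SciMark uses its own random number generator.

```python
import random


def monte_carlo_pi(num_samples, seed=113):
    """Estimate pi from the fraction of random points inside the quarter circle."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    under_curve = 0
    for _ in range(num_samples):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:
            under_curve += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * under_curve / num_samples


print(monte_carlo_pi(200_000))
```

The inner loop is dominated by random number generation plus a couple of multiplications and a comparison, so its speed depends heavily on how well the generator and its arithmetic are compiled for the target CPU.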
Test results:
Figure 5 Scimark2 performance comparison
The test results are shown in Figure 5. In the figure, the performance of the Loongson 3A3000 (red) is the baseline (1.0), and the performance of the i5-7200U (blue) and J1900 (green) is expressed relative to it. Higher values mean better performance. The Loongson 3A4000 is about to tape out. According to information released by Loongson, the 3A4000 is 30% faster than the 3A3000 at the same main frequency, and its main frequency will rise from 1.5GHz to 2.0GHz. The 3A4000 will also add 256-bit SIMD instructions and enlarge the L3 cache from 8MB to 12MB. Its SPEC CPU 2006 score reaches 20 points, twice that of the Loongson 3A3000.
http://www.ict.cas.cn/kycg/cgnb/201709/P020170926639136974767.pdf
Therefore, we set the performance of 3A4000 to 2.0 as a reference for performance prediction.
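The 2.0 reference can be cross-checked against the other announced figures. The sketch below combines the 30% same-frequency gain with the clock increase from 1.5GHz to 2.0GHz, assuming the two gains simply multiply and that performance scales linearly with frequency (both are my assumptions). The function name is hypothetical.

```python
def projected_speedup(per_clock_gain, old_ghz, new_ghz):
    """Combine a same-frequency gain with a clock increase (idealized)."""
    return (1.0 + per_clock_gain) * (new_ghz / old_ghz)


estimate = projected_speedup(0.30, old_ghz=1.5, new_ghz=2.0)
print(f"{estimate:.2f}")  # prints 1.73
```

This gives about 1.73, somewhat below the factor of 2 implied by the SPEC CPU 2006 score, so 2.0 is best read as an optimistic reference line rather than a guaranteed result for every test.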
Figure 5 shows that in SciMark's Monte Carlo test the 3A3000 performs very poorly: less than 10% of the i5-7200U and less than 30% of the J1900. This is highly abnormal. Possibly some floating-point operation is not executed in hardware but emulated in software.
In the remaining tests, the 3A3000 is comparable to the J1900 and in some tests faster. The J1900, despite its 1.99GHz main frequency, reaches only about 30% of the single-core performance of the i5-7200U.
2. FFTE
FFTE is a package written by Daisuke Takahashi for computing discrete Fourier transforms of 1-, 2- and 3-dimensional sequences of length (2^p)*(3^q)*(5^r). Single-core performance test.
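The length restriction is easy to check in code. The following Python sketch tests whether a sequence length has the form (2^p)*(3^q)*(5^r) by dividing out the factors 2, 3 and 5. The function name is hypothetical and this is not part of FFTE itself.

```python
def has_235_form(n):
    """Return True if n == (2**p) * (3**q) * (5**r) for non-negative p, q, r."""
    if n < 1:
        return False
    for prime in (2, 3, 5):
        while n % prime == 0:
            n //= prime
    # Anything left over is a prime factor other than 2, 3 or 5.
    return n == 1


print(has_235_form(256))  # 2**8
print(has_235_form(360))  # 2**3 * 3**2 * 5
print(has_235_form(14))   # contains the factor 7
```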
Test results: Click here to access
3. fhourstones
Solves the Connect-4 game to test the integer performance of the processor. Single-core performance test.
Test results: Click here to access
4. gmpbench
Performance tests performed using GMP 6.1.2 math library. Note that gmpbench only considers the integer performance of the program, not floating point performance. Single-core performance test. https://gmplib.org/gmpbench.html
Test results: Click here to access
5. minion
Minion is an open source constraint solver designed to be scalable. Single-threaded performance test.
https://constraintmodelling.org/minion/
Test results: Click here to access
6. mpcbench
GNU MPC is a C library for complex arithmetic. Single-threaded performance test.
https://openbenchmarking.org/result/1806164-FO-LOONGSON301
7. multichase
This is a benchmark of Google's multichase pointer chaser program. Single-threaded and multi-threaded performance test.
https://openbenchmarking.org/result/1806120-FO-LS3AMULTI28
Figure 6 FFTE, Fhourstone, Minion, Multichase performance test
Only two J1900 results were available for the tests above: its FFTE performance is only 2/3 that of the Loongson 3A3000, while its fhourstones performance is 1.1 times that of Loongson. Overall, the i5-7200U is 2 to 4.5 times as fast as the Loongson 3A3000 in these tests, with most results clustered around 2.3 times. We predict that the single-core performance of the Loongson 3A4000 can reach about 85% of the i5-7200U in these tests. In gmpbench and mpcbench the i5-7200U is about 4.5 times as fast as the Loongson 3A3000, a clear advantage that may be related to optimizations in the math libraries or the compiler.
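The 85% prediction follows from the assumed 2.0 reference for the 3A4000: if the i5-7200U is 2.3 times as fast as the 3A3000, a chip twice as fast as the 3A3000 reaches 2.0/2.3, or about 87%, of the i5-7200U. A small Python sketch of that arithmetic for the gaps quoted above (the function name is hypothetical, and the 2.0 speedup is a projection, not a measurement):

```python
def projected_share(gap_vs_3a3000, speedup_3a4000=2.0):
    """Fraction of the i5-7200U's score a 3A4000 would reach.

    gap_vs_3a3000: how many times faster the i5-7200U is than the 3A3000.
    speedup_3a4000: assumed 3A4000 speedup over the 3A3000.
    """
    return speedup_3a4000 / gap_vs_3a3000


for gap in (2.0, 2.3, 4.5):
    share = projected_share(gap)
    print(f"i5-7200U at {gap}x the 3A3000 -> 3A4000 at {share:.0%} of the i5-7200U")
```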
8. bullet
Bullet is an open source physics simulation engine, one of the three major physics engines in the world. Single-threaded performance test.
https://openbenchmarking.org/result/1806126-FO-LS3ABULLE82
9. himeno
The Himeno benchmark is a linear solver for the pressure Poisson equation using a point-Jacobi method. Single-threaded performance test.
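For readers unfamiliar with the point-Jacobi method, here is a minimal Python sketch on a small 2D grid. It is a simplified illustration, not Himeno's code: the real benchmark works on a 3D grid with a larger stencil, and the grid size, boundary values and iteration count below are arbitrary choices of mine.

```python
def jacobi_sweep(p, rhs, h):
    """One point-Jacobi sweep for the 2D Poisson equation laplace(p) = rhs.

    Every interior point is recomputed from the previous iterate only;
    boundary values are left unchanged.
    """
    rows, cols = len(p), len(p[0])
    new_p = [row[:] for row in p]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neighbours = p[i - 1][j] + p[i + 1][j] + p[i][j - 1] + p[i][j + 1]
            new_p[i][j] = 0.25 * (neighbours - h * h * rhs[i][j])
    return new_p


n = 5
p = [[0.0] * n for _ in range(n)]
p[0] = [1.0] * n                      # fixed boundary value on one edge
rhs = [[0.0] * n for _ in range(n)]   # zero source term
for _ in range(200):
    p = jacobi_sweep(p, rhs, h=1.0)
print(p[2][2])  # value at the centre of the grid
```

Each sweep is a long run of floating-point adds and multiplies over neighbouring memory locations, which is why this kind of test stresses both the floating-point units and memory access.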
https://openbenchmarking.org/result/1806127-FO-LS3A3000H21
10. tscp
This is a performance test of TSCP, Tom Kerrigan's Simple Chess Program, which has a built-in performance benchmark. Single-threaded performance test.
https://openbenchmarking.org/result/1806104-FO-LS3ATSCPD75
Figure 7 Bullet, Himeno and TSCP test
In this set of tests the advantage of the i5-7200U is very clear. It is 3.6 times as fast as the 3A3000 in the chess test and generally more than 4 times as fast in the other tests; in Bullet's ragdoll test it is even 30 times as fast. We analyzed Bullet's code and found a large amount of SIMD-related code and assembly optimization for x86 processors, which is an important reason why Bullet runs fast on Intel processors. For the ragdoll test, we found that the code performs a large number of trigonometric function calls, and trigonometric function evaluation on Loongson is currently problematic: it appears not to use hardware floating point but software emulation, so it is slow.
11. hpcg
High Performance Conjugate Gradient, a scientific benchmark for supercomputing developed by Sandia National Laboratories. Multi-threaded test.
https://openbenchmarking.org/result/1806094-FO-LS3AHPCGD08
https://openbenchmarking.org/result/1806202-FO-LS3AHPCGO04
12. npb
NPB, the NAS Parallel Benchmarks, is a benchmark suite developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB. Multi-threaded test.
https://openbenchmarking.org/result/1806097-FO-LS3ANPBDE97
13. n-queens
OpenMP version of N-Queen problem solver. The problem size is 18. Multi-core performance testing.
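As an illustration of the workload, here is a compact serial N-Queens counter in Python using bitmasks. It is a sketch of the general technique only: the benchmark itself is an OpenMP program that spreads the search across threads, and its actual implementation may differ. The function names are hypothetical.

```python
def count_n_queens(n):
    """Count all placements of n non-attacking queens on an n x n board."""
    full = (1 << n) - 1  # bitmask with one bit per column

    def place(cols, diag_left, diag_right):
        if cols == full:  # every column occupied: one complete solution
            return 1
        count = 0
        free = full & ~(cols | diag_left | diag_right)
        while free:
            bit = free & -free  # lowest free square in this row
            free -= bit
            count += place(
                cols | bit,
                ((diag_left | bit) << 1) & full,
                (diag_right | bit) >> 1,
            )
        return count

    return place(0, 0, 0)


print(count_n_queens(8))  # prints 92
```

The search is almost entirely integer bit operations and branches, so it scales well with core count and says little about floating-point performance. At the problem size of 18 used in the test, a pure Python version like this would be impractically slow.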
https://openbenchmarking.org/result/1806109-FO-LS3ANQUEE29
14. mafft
Alignment of 100 pyruvate decarboxylase sequences. Multithreaded performance test.
https://openbenchmarking.org/result/1806108-FO-LS3AMAFFT56
15. primesieve
Primesieve generates prime numbers using a highly optimized sieve of Eratosthenes. It benchmarks the CPU's L1/L2 cache performance. Multithreaded performance test.
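For reference, a plain sieve of Eratosthenes looks like the Python sketch below. This is the textbook algorithm, not primesieve's implementation, which as far as I know works in cache-sized segments and is far more elaborate. The function name is hypothetical.

```python
def primes_up_to(limit):
    """Return all primes <= limit using a basic sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Cross out every multiple of p, starting at p*p.
            multiples = len(range(p * p, limit + 1, p))
            is_prime[p * p :: p] = bytearray(multiples)
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]


print(primes_up_to(30))
```

Because the sieve repeatedly strides through one large array, its speed depends strongly on how much of that array fits in the L1/L2 caches, which is why the test is described as a cache benchmark.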
https://openbenchmarking.org/result/1806103-FO-LS3APRIME23
Figure 8 HPCG, NPB, N-Queens, MAFFT and Primesieve test
The above tests are all multi-threaded performance tests. In the HPCG test, the i5-7200U is 3.76 times as fast as the Loongson 3A3000. In the NPB tests, Loongson beats the J1900 in three and trails it in the other two. In the N-Queens, MAFFT and primesieve tests, the i5-7200U is about twice as fast as the Loongson 3A3000; if the Loongson 3A4000 does reach twice the performance of the 3A3000, it will match the i5-7200U in these tests. Again, these are multi-threaded tests!
Encryption Algorithm
16. Botan
Botan is a cross-platform open source C++ cryptography library that supports most publicly known encryption algorithms. (Single-threaded test)
https://openbenchmarking.org/result/1806093-FO-LS3ABOTAN50
17. gnupg
Encrypts a file with GnuPG and measures the time taken. Single-threaded performance test.
https://openbenchmarking.org/result/1806105-FO-LS3AGNUPG86
Figure 9 Botan and Gnupg test
In the Botan test, the Loongson 3A3000 trails the i5-7200U by a factor of about 3 in some items. In the two AES encryption and decryption tests, however, the gap to the i5-7200U is nearly 80 times! The reason is that the i5-7200U implements AES encryption and decryption in hardware, which is very efficient, while the Loongson 3A3000 either lacks such a feature or cannot use it yet. Botan also contains assembly optimizations for x86, and x86 processors have a large advantage in tests that can use them.
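On x86 Linux, the presence of hardware AES instructions can be checked from the "flags" line of /proc/cpuinfo, where they appear as the flag "aes". The Python sketch below parses that line; the function name is hypothetical, and the check applies to x86 only, since other architectures report their features under different keys.

```python
def has_cpu_flag(cpuinfo_text, flag):
    """Return True if `flag` appears in a 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and flag in value.split():
            return True
    return False


try:
    with open("/proc/cpuinfo") as f:
        print("AES instructions reported:", has_cpu_flag(f.read(), "aes"))
except OSError:
    print("/proc/cpuinfo is not available on this system")
```

A library such as Botan can only use the fast path when the CPU reports the feature, which is consistent with the very large gap seen in the AES items.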
Multimedia encoding
18~22. encode-flac, encode-mp3, encode-ogg, encode-opus, encode-wavpack
Transcodes an audio file to FLAC, MP3, Ogg, Opus and WavPack and measures the time required.
Test results: Click here to access
23. espeak
This test measures how long the espeak speech synthesis engine takes to read The Outline of Science from Project Gutenberg and write the output to a WAV file. Single-threaded performance test.
https://openbenchmarking.org/result/1806148-FO-LS3AESPEA06
24. dcraw
Use DCRAW to convert multiple high-resolution RAW NEF image files to PPM image format and count the time required. Single-threaded performance test.
https://openbenchmarking.org/result/1806140-FO-LS3ADCRAW96
25. mencoder
This test uses mplayer's mencoder program and the libavcodec family to test the system's audio/video encoding performance. Single-threaded performance test.
https://openbenchmarking.org/result/1806145-FO-LS3AMENCO12
26. vpxenc
This is a standard video encoding performance test that uses Google's libvpx library and the vpxenc command to encode to the VP8/WebM format. Single-core performance test.
https://openbenchmarking.org/result/1806148-FO-LS3AVPXEN59
Figure 10 Multimedia related performance test
In the audio and video encoding tests, Loongson once again loses to the Intel processors, both the J1900 and the i5-7200U. The reason is again optimization: these multimedia applications have been heavily optimized for x86 processors but not for Loongson.
Compression algorithm
27. compress-7zip
Uses the benchmark function built into the 7zip program to test the multi-threaded performance of the system.
https://openbenchmarking.org/result/1806036-FO-LOONGSON337
https://openbenchmarking.org/result/1806230-FO-LS3A7ZIPL48
28. compress-gzip
Uses the tar program to compress the Linux source code package, testing the single-threaded performance of the system's own gzip program.
https://openbenchmarking.org/result/1806039-FO-LS3A3000G15
Optimized gzip program
https://openbenchmarking.org/result/1806056-FO-LS3A3000G52
29. compress-pbzip2
Uses the parallel bzip2 algorithm to compress the Linux kernel source package and measures the time required. Multi-threaded test.
https://openbenchmarking.org/result/1806109-FO-LS3APBZIP29
Network application
30. Apache
Apache benchmark program. The test issues 1 million requests with 100 concurrent connections, and the metric is how many requests per second the system can handle. Multithreaded performance test.
https://openbenchmarking.org/result/1806159-FO-LS3AAPACH45
31. ebizzy
Ebizzy test. Ebizzy can generate web server-like workloads.
https://openbenchmarking.org/result/1806152-FO-LS3AEBIZZ72
32. postmark
This is NetApp's PostMark benchmark, designed to simulate small-file workloads similar to those handled by web and mail servers. This test profile sets PostMark to execute 25,000 transactions on 500 files simultaneously, with file sizes ranging from 5 to 512 kilobytes.
https://openbenchmarking.org/result/1806151-FO-LS3APOSTM75
Figure 11 Compression algorithm and network application test
The results above show that in compression and network applications the Loongson 3A3000 and the J1900 are close. The i5-7200U is about twice as fast. Note that, apart from gzip, these are all multi-threaded tests.
Memory test
33. Cachebench
This is a performance test of CacheBench, which is part of LLCbench. CacheBench is used to test memory and cache bandwidth performance.
https://openbenchmarking.org/result/1806034-FO-LS3A3000C27
34. stream
System memory (RAM) performance benchmark.
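The best-known kernel in stream is the "triad", a[i] = b[i] + scalar * c[i], with bandwidth computed from the bytes moved across three arrays. The Python sketch below shows the kernel and the bandwidth arithmetic only; in pure Python the interpreter overhead dominates, so the figure it prints is not a real memory bandwidth. The function name, array size and scalar value are my own choices.

```python
import time
from array import array


def triad_bandwidth(n=200_000, scalar=3.0):
    """Run a STREAM-style triad and return (result array, MB/s)."""
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    start = time.perf_counter()
    a = array("d", (b[i] + scalar * c[i] for i in range(n)))
    elapsed = max(time.perf_counter() - start, 1e-9)
    # Two arrays read and one written, 8 bytes per double.
    bytes_moved = 3 * n * a.itemsize
    return a, bytes_moved / elapsed / 1e6


result, mb_per_s = triad_bandwidth()
print(f"triad: {mb_per_s:.1f} MB/s (interpreter-bound, for illustration only)")
```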
Test results: Click here to visit
Figure 12 Memory access performance test
In the stream and CacheBench tests, the Loongson 3A3000 finally achieves a clear overall advantage over the J1900. Moreover, apart from two CacheBench results in which it is clearly behind the i5-7200U, the Loongson 3A3000 is on a par with the i5-7200U in the remaining items. Loongson performs so well here because its processors historically suffered badly from poor memory access performance, after which a great deal of effort went into optimizing memory access. We can expect the Loongson 3A4000 to have even better memory access performance.
Finally, the Loongson 3A3000 is compared with the FT1500A and FT-2000+ data found on the openbenchmarking website.
In general, the Loongson 3A3000 is stronger than the FT1500A but significantly weaker than the FT-2000+. Compared with the FT1500A, the FT-2000+ has improved greatly, partly thanks to the process shrink from 28 nanometers to 16 nanometers and partly thanks to the updated architecture. We predict that the Loongson 3A4000, while still on the 28-nanometer process, will match or even surpass the Feiteng 2000+ in single-core performance. Since the Feiteng 2000+ has as many as 64 cores, Loongson still has a long way to go to catch up with Feiteng in multi-core performance.
3A3000's unexpected performance on Monte Carlo simulation is likely to be due to the lack of optimization for some key function.
Summary and Outlook
Looking at Loongson's development over time, the Loongson 3A3000 has made great progress over the Loongson 2F. The process has advanced from 90nm for the Loongson 2F to 28nm for the Loongson 3A3000, and the main frequency has risen from 800MHz to 1.5GHz. In everyday use it is basically smooth. Compared with Intel processors, the overall performance of the Loongson 3A3000 is equivalent to that of the Intel J1900, and its single-core performance is 30% to 40% of that of the Intel i5-7200U.
Through the 34 tests conducted in this article, we found that the root causes of poor performance of Loongson 3A3000:
has weak performance at the same main frequency. Judging from the performance of the same main frequency, Loongson 3A3000 has exceeded J1900, but only 60% to 70% of the Intel i5-7200U. It is expected that the performance of Loongson 3A4000 with the same main frequency will be improved by at least 30% in 2019.
The main frequency is too low. This is an unavoidable weakness that Loongson processors have made many enthusiasts feel resentful. Admittedly, the main frequency does not represent all performance, but it is absolutely impossible to have a low main frequency. The J1900's performance at the same main frequency is weaker than that of Loongson 3A3000, but because its main frequency can reach 1.99GHz and can also reach turbo to 2.4GHz, it also exceeds that of Loongson 3A3000 in many tests. The basic main frequency of the Intel i5-7200U reaches 2.5GHz, and the turbo frequency can reach 3.1GHz. The Feiteng 2000+ can reach 2.2GHz, while the KX-6000 of Zhaoxin can even reach 3.0GHz. Feiteng and Zhaoxin processors may be weaker than Loongson in terms of performance at the same main frequency, but they can still defeat Loongson 3A3000 with higher main frequency.
One of the reasons why Loongson's main frequency is the backward process process, which is still using the 28nm process, while Intel, Feiteng, Zhaoxin, etc. are already using the 14nm process. According to Loongson's development plan, by 2020, Loongson will use the 14nm process to cut the Loongson 3C5000, and the main frequency can reach 2.5GHz.
The system software optimization is insufficient. In the test, the problems we found were that mathematical functions such as trigonometric functions were too slow to operate. It seems that some hardware floating-point operations were not applied, and Loongson lacked an optimized mathematical function library. In terms of encryption and decryption instructions, AES hardware implementation is missing. In the test, we found that various tests using the Debian operating system, GCC 7.3 and 1.4GHz Loongnix operating system, GCC 4.9 compiler, and 1.5GHz Loongnix 3A3000 were basically better than the combination of using the Loongnix operating system, GCC 4.9 compiler, and 1.5GHz Loongnix 3A3000. We believe that the optimization of the compiler is very important to the performance of Loongson. During the test, we also found that using the 4.14 Linux kernel will have a considerable performance improvement compared to the 3.10 Linux kernel, and Loongson still lacks an optimized Linux kernel.
The application software is not optimized enough. Because the MIPS architecture lacks a software ecosystem, various application software lacks optimization for the MIPS architecture. The specific manifestation is that many software have assembly optimization for X86 system. To establish the Loongson ecosystem and give full play to the performance of Loongson processors, optimization of the same level is indispensable.
With the future optimization of Loongson's architecture and the improvement of main frequency, the bottleneck that affects Loongson's development will not be the performance of the processor, but the construction of the software ecosystem, that is, system software optimization and application software optimization. Among them, the optimization of various application software will be a shortcut to improving Loongson's user experience. In fact, Loongson has also realized these problems and proposed to learn from Apple's "app by app, feature by feature, pixel by pixel".
At present, the chip work of Loongson 3A4000 is underway, and it is expected that the chip will be visible by early 2019. Before the 3A4000 appears, we made a prediction on the performance of the 3A4000. Based on our evaluation, we believe that the performance of the same main frequency of 3A4000 will be improved from 60% to 70% of i5-7200U to 80% to 90%, the single-core performance at 2.0GHz will reach 2/3 of i5-7200U, and the multi-threading performance will exceed i5-7200U. Compared with other domestic CPUs, the performance of Loongson 3A4000's same main frequency will exceed that of Feiteng and Zixin, and the single-core performance will also exceed that of Feiteng 2000+. However, due to the 28 nm process behind the 3A4000 and the still low main frequency (2.0 GHz), the comprehensive performance of Loongson 3A4000 may still not exceed that of Ziteng KX-6000 with a main frequency of 3.0GHz. If the Zhaoxin KX-6000 cannot be mass-produced and launched in 2019, the Loongson 3A4000 may still become the domestic independent processor with the strongest single-core performance that can be purchased in China in 2019.
The gap between Loongson processor and Intel and AMD's high-performance processors is still very huge, and Loongson still has a long way to go. We look forward to Loongson adopting better processes and more optimized microarchitecture in the future, and also hope that Loongson can perform better in system software support such as compilers, mathematics libraries, and operating systems, and build a better application software ecosystem. We look forward to Loongson 3A4000, 3B 4000, and Loongson 3C 5000 to successfully stream the chip as soon as possible.
The above review is just a personal work by an ordinary enthusiast who is not a computer major. It is not authoritative, has limited levels, has a hurry, has a lot of data, and errors and omissions are inevitable. Please criticize and correct me.
End my evaluation with Chairman Mao’s words:
"We are moving forward. We are doing an extremely glorious and great cause that our predecessors have never done. Our goal must be achieved. Our goal must be achieved."
Appendix
Summary of various test results
Loongson 3A 3000 (Loongnix):
https://openbenchmarking.org/result/1806113-TR-LSLABSLS380
https://openbenchmarking.org/result/1709288-TR-LOONGSON390
https://openbenchmarking.org/result/1709288-TR-LOONGSON390
FT1500A :
https://openbenchmarking.org/result/1705187-KH-CPUSCIMAR08
I5-7200u
https://openbenchmarking.org/result/1806175-FO-I57200UDE18
https://openbenchmarking.org/result/1806174-FO-I57200UMU24
https://openbenchmarking.org/result/1806175-FO-I57200URA38
https://openbenchmarking.org/result/1806176-FO-I57200UCO93
https://openbenchmarking.org/result/1806179-FO-I57200UCR30
J1900
https://openbenchmarking.org/result/1404256-PL-1404206PL73
https://openbenchmarking.org/result/1404250-PL-1404206SO61
https://openbenchmarking.org/result/1404268-PL-J1900MULT15
https://openbenchmarking.org/result/1404272-PL-J1900SPEE11
https://openbenchmarking.org/result/1404275-PL-J1900PROC21
This test profile currently uses the MPI version of NPB. Multi-threaded test.
https://openbenchmarking.org/result/1806097-FO-LS3ANPBDE97
13. n-queens
OpenMP version of an N-Queens problem solver, with a problem size of 18. Multi-core performance test.
https://openbenchmarking.org/result/1806109-FO-LS3ANQUEE29
14. mafft
Alignment of 100 pyruvate decarboxylase sequences. Multithreaded performance test.
https://openbenchmarking.org/result/1806108-FO-LS3AMAFFT56
15. primesieve
Primesieve generates prime numbers using a highly optimized sieve of Eratosthenes. It benchmarks the CPU's L1/L2 cache performance. Multi-threaded performance test.
https://openbenchmarking.org/result/1806103-FO-LS3APRIME23
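For readers unfamiliar with the method, the sieve of Eratosthenes that this test is built on can be sketched in a few lines of Python. This is only the textbook form of the algorithm, written for illustration; primesieve itself is far more elaborate, and the function name primes_up_to is our own.

```python
def primes_up_to(n: int) -> list[int]:
    """Return all primes <= n using the textbook sieve of Eratosthenes."""
    if n < 2:
        return []
    # One flag per number; 1 means "still considered prime".
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Cross out every multiple of p, starting at p*p
            # (smaller multiples were removed by smaller primes).
            count = len(range(p * p, n + 1, p))
            is_prime[p * p :: p] = bytes(count)
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]


if __name__ == "__main__":
    print(primes_up_to(30))
```

The inner step only walks through memory with a fixed stride, which is why a sieve like this stresses the cache hierarchy more than the arithmetic units.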
Figure 8 HPCG, NPB, N-Queens, MAFFT and Primesieve test
The above tests are all multi-threaded performance tests. In the HPCG test, the i5-7200U delivers 3.76 times the performance of the Loongson 3A3000. In the NPB test, Loongson beat the J1900 in three items and lost to it in the other two. In the N-Queens, MAFFT and Primesieve tests, the i5-7200U is about twice as fast as the Loongson 3A3000; if the Loongson 3A4000 can double the performance of the 3A3000, it will match the i5-7200U in these tests. Note again that these are multi-threaded tests!
Encryption Algorithm
16. Botan
Botan is a cross-platform, open source C++ cryptography library that supports most publicly known encryption algorithms. (Single-threaded test)
https://openbenchmarking.org/result/1806093-FO-LS3ABOTAN50
17. Gnupg
Encrypts a file with GnuPG and measures the time taken. Single-threaded performance test.
https://openbenchmarking.org/result/1806105-FO-LS3AGNUPG86
Figure 9 Botan and Gnupg test
In the Botan test, the Loongson 3A3000 trails the i5-7200U by a factor of about 3 in some items. In the two AES encryption and decryption tests, however, the gap to the i5-7200U is nearly 80 times! The reason is that the i5-7200U implements AES encryption and decryption in hardware, which is very efficient, while the Loongson 3A3000 either lacks such a feature or cannot use it for the time being. In addition, Botan contains assembly optimizations for x86, so x86 processors have a clear advantage in the tests that can use them.
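On Linux, one quick way to see whether a processor advertises AES instructions is to look at /proc/cpuinfo. The following is a minimal sketch and was not part of the test procedure above. It assumes the kernel reports an "aes" token on a "flags" line (as x86 kernels do) or a "Features" line (as ARM kernels do); the function name has_aes_flag and the sample strings in the usage part are made up for illustration.

```python
import os


def has_aes_flag(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' or 'Features' line lists the 'aes' token."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key : value" line
        if key.strip().lower() in ("flags", "features"):
            if "aes" in value.split():
                return True
    return False


if __name__ == "__main__":
    # Illustrative samples only, not captured from the machines tested here.
    print(has_aes_flag("flags\t\t: fpu vme sse2 aes avx2"))   # x86-style line
    print(has_aes_flag("ASEs implemented\t: dsp dsp2"))       # MIPS-style line
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            print("this machine:", has_aes_flag(f.read()))
```

A missing flag only means the kernel does not report the feature; it does not by itself tell us whether a library such as Botan would use it.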
Multimedia encoding
18~22. encode-flac, encode-mp3, encode-ogg, encode-opus, encode-wavpack
Transcode an audio file into flac, mp3, ogg, opus and wavpack and measure the time required.
Test results: Click here to access
23. espeak
This test measures how long the espeak speech synthesis engine takes to read The Outline of Science from Project Gutenberg and write the output to a WAV file. Single-threaded performance test.
https://openbenchmarking.org/result/1806148-FO-LS3AESPEA06
24. dcraw
Use DCRAW to convert multiple high-resolution RAW NEF image files to PPM image format and count the time required. Single-threaded performance test.
https://openbenchmarking.org/result/1806140-FO-LS3ADCRAW96
25. mencoder
This test uses mplayer's mencoder program with the libavcodec family of codecs to test the system's audio/video encoding performance. Single-threaded performance test.
https://openbenchmarking.org/result/1806145-FO-LS3AMENCO12
26. vpxenc
This is a standard video encoding performance test that uses Google's libvpx library and the vpxenc command to encode to the VP8/WebM format. Single-core performance test.
https://openbenchmarking.org/result/1806148-FO-LS3AVPXEN59
Figure 10 Multimedia related performance test
In the audio and video encoding tests, Loongson once again lost to the Intel processors, both the J1900 and the i5-7200U. The reason is still optimization: these multimedia applications have been heavily optimized for x86 processors, but not for Loongson.
Compression algorithm
27. Compress-7zip
Use the benchmark function provided by the 7zip program to test the multi-threading performance of the program.
https://openbenchmarking.org/result/1806036-FO-LOONGSON337
https://openbenchmarking.org/result/1806230-FO-LS3A7ZIPL48
28. Compress-gzip
Use the tar program to compress the Linux source code package and verify the single-threaded performance of the system's own gzip program.
https://openbenchmarking.org/result/1806039-FO-LS3A3000G15
Optimized gzip program
https://openbenchmarking.org/result/1806056-FO-LS3A3000G52
29. Compress-pbzip2
Use the parallel bzip2 program to compress the Linux kernel source package and measure the time required. Multi-threaded program.
https://openbenchmarking.org/result/1806109-FO-LS3APBZIP29
Network application
30. Apache
Apache benchmark program. It issues 1 million requests at a concurrency of 100 and measures how many requests per second the system can handle. Multi-threaded performance test.
https://openbenchmarking.org/result/1806159-FO-LS3AAPACH45
31. ebizzy
Ebizzy test. Ebizzy can generate web server-like workloads.
https://openbenchmarking.org/result/1806152-FO-LS3AEBIZZ72
32. postmark
This is NetApp's PostMark benchmark, designed to simulate the small-file workloads handled by web and mail servers. This test profile sets PostMark to execute 25,000 transactions with 500 files simultaneously, with file sizes ranging from 5 to 512 kilobytes.
https://openbenchmarking.org/result/1806151-FO-LS3APOSTM75
Figure 11 Compression algorithm and network application test
The above results show that in compression and network applications, the Loongson 3A3000 and the J1900 perform similarly. The i5-7200U is ahead of both by a factor of about two. Note that except for gzip, all of these are multi-threaded tests.
Memory test
33. Cachebench
This is a performance test using CacheBench, which is part of LLCBench. CacheBench measures memory and cache bandwidth.
https://openbenchmarking.org/result/1806034-FO-LS3A3000C27
34. stream
System memory (RAM) performance benchmark.
Test results: Click here to visit
Figure 12 Memory access performance test
In the stream and CacheBench tests, the Loongson 3A3000 finally achieved an across-the-board advantage over the J1900. Moreover, apart from two CacheBench results, the Loongson 3A3000 is roughly on par with the i5-7200U in the remaining items. The reason for this good showing is that Loongson processors historically suffered badly from poor memory access performance, and a great deal of effort was then spent on optimizing memory access. We can expect the Loongson 3A4000 to have even better memory access performance.
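To give a feel for what a memory bandwidth figure measures, here is a rough copy-style sketch in Python. It is not the stream benchmark, and its numbers are not comparable with the results above: the function name, the default buffer size and the repeat count are all our own assumptions, and Python overhead will colour the result. Like stream's Copy kernel, it counts both the bytes read and the bytes written.

```python
import time


def copy_bandwidth_mb_s(size_bytes: int = 64 * 1024 * 1024, repeats: int = 5) -> float:
    """Estimate memory copy bandwidth in MB/s from the fastest of several runs."""
    src = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # reads src once and writes a new buffer once
        elapsed = time.perf_counter() - start
        del dst
        best = min(best, elapsed)
    # Guard against a zero reading from a very coarse timer.
    best = max(best, 1e-9)
    # A copy moves size_bytes in and size_bytes out.
    return 2 * size_bytes / best / 1e6


if __name__ == "__main__":
    print(f"approx. copy bandwidth: {copy_bandwidth_mb_s():.0f} MB/s")
```

The buffer should be much larger than the last-level cache; otherwise the figure reflects cache rather than main memory bandwidth.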
Finally, we compared the Loongson 3A3000 with some FT1500A and FT-2000+ data found on the openbenchmarking website.
Overall, the Loongson 3A3000 is stronger than the FT1500A but significantly weaker than the FT-2000+. Compared with the FT1500A, the FT-2000+ has improved greatly, partly thanks to the process shrink from 28 nanometers to 16 nanometers and partly thanks to an updated architecture. We predict that the Loongson 3A4000 will match or even surpass the Feiteng 2000+ in single-core performance while staying on the 28-nanometer process. Since the Feiteng 2000+ has as many as 64 cores, Loongson still has a long way to go to catch up with Feiteng in multi-core performance.
The 3A3000's unexpected result in the Monte Carlo simulation is probably due to a lack of optimization in some key function.
Summary and Outlook
Viewed over time, Loongson has made great progress from the Loongson 2F to the Loongson 3A3000. The manufacturing process has advanced from the 90nm of the Loongson 2F to 28nm, and the clock frequency has risen from 800MHz to 1.5GHz. In everyday use, the 3A3000 is basically smooth. Compared with Intel processors, the overall performance of the Loongson 3A3000 is on par with the Intel J1900, and its single-core performance is 30%~40% of that of the Intel i5-7200U.
From the 34 tests conducted in this article, we identified the following root causes of the Loongson 3A3000's poor performance:
Performance at the same clock frequency is weak. Clock for clock, the Loongson 3A3000 has overtaken the J1900 but reaches only 60% to 70% of the Intel i5-7200U. The same-frequency performance of the Loongson 3A4000 is expected to improve by at least 30% in 2019.
The clock frequency is too low. This is an unavoidable weakness of Loongson processors, and one that has frustrated many enthusiasts. Admittedly, clock frequency is not the whole of performance, but a low clock frequency is simply not an option. The J1900 is weaker than the Loongson 3A3000 at the same clock frequency, but because it runs at 1.99GHz and can turbo up to 2.4GHz, it still beats the Loongson 3A3000 in many tests. The base frequency of the Intel i5-7200U is 2.5GHz, and its turbo frequency reaches 3.1GHz. The Feiteng 2000+ reaches 2.2GHz, while the Zhaoxin KX-6000 even reaches 3.0GHz. Feiteng and Zhaoxin processors may be weaker than Loongson at the same clock frequency, but they can still beat the Loongson 3A3000 through higher clocks.
One reason for Loongson's low clock frequency is its outdated manufacturing process: it still uses a 28nm process, while Intel, Feiteng, Zhaoxin and others are already using 14nm. According to Loongson's development plan, the Loongson 3C5000 will be taped out on a 14nm process by 2020, with a clock frequency of up to 2.5GHz.
System software optimization is insufficient. One problem we found in testing is that mathematical functions such as the trigonometric functions run too slowly. It appears that some hardware floating-point operations are not being used, and Loongson lacks an optimized math library. As for encryption and decryption instructions, a hardware AES implementation is missing. We also found that the combination of the Debian operating system, GCC 7.3 and a 1.4GHz Loongson 3A3000 generally outperformed the combination of the Loongnix operating system, the GCC 4.9 compiler and a 1.5GHz Loongson 3A3000. We believe compiler optimization is very important to Loongson's performance. During testing we also found that the 4.14 Linux kernel brings a considerable performance improvement over the 3.10 kernel, and Loongson still lacks an optimized Linux kernel.
Application software is not optimized enough. Because the MIPS architecture lacks a software ecosystem, most application software is not optimized for it. Concretely, many programs contain assembly optimizations for x86 only. To build a Loongson ecosystem and get the full performance out of Loongson processors, optimization at the same level is indispensable.
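The same-frequency comparison in the first point can be made concrete with a little arithmetic. The sketch below is our own illustration: it assumes performance scales linearly with clock (main) frequency, which is only roughly true, and since the frequency at which the i5-7200U actually ran during each test is not recorded, it shows both the 2.5GHz base and the 3.1GHz turbo case. The function name per_clock_ratio is hypothetical.

```python
def per_clock_ratio(relative_perf: float, own_ghz: float, other_ghz: float) -> float:
    """Scale a measured performance ratio to equal clock frequency.

    relative_perf is (own score / other score) at the clocks actually run.
    Assumes linear scaling with clock frequency (a simplification).
    """
    return relative_perf * other_ghz / own_ghz


if __name__ == "__main__":
    LS3A3000_GHZ = 1.5
    # Measured single-core performance: 30% to 40% of the i5-7200U.
    for label, i5_ghz in (("base 2.5GHz", 2.5), ("turbo 3.1GHz", 3.1)):
        low = per_clock_ratio(0.30, LS3A3000_GHZ, i5_ghz)
        high = per_clock_ratio(0.40, LS3A3000_GHZ, i5_ghz)
        print(f"i5-7200U at {label}: {low:.0%} to {high:.0%} per clock")
```

Under these assumptions the two cases give roughly 50% to 67% and 62% to 83%, which brackets the 60% to 70% figure quoted above.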
As Loongson's architecture is optimized and its clock frequency rises, the bottleneck for Loongson's development will no longer be processor performance but the construction of the software ecosystem, that is, system software optimization and application software optimization. Of these, optimizing individual applications will be a shortcut to improving the Loongson user experience. In fact, Loongson has recognized these problems and has proposed learning from Apple's "app by app, feature by feature, pixel by pixel" approach.
At present, tape-out work on the Loongson 3A4000 is underway, and chips are expected to appear by early 2019. Before the 3A4000 arrives, we offer a prediction of its performance. Based on our evaluation, we believe that the same-frequency performance of the 3A4000 will improve from 60% to 70% of the i5-7200U to 80% to 90%, that single-core performance at 2.0GHz will reach 2/3 of the i5-7200U, and that multi-threaded performance will exceed the i5-7200U. Compared with other domestic CPUs, the same-frequency performance of the Loongson 3A4000 will exceed that of Feiteng and Zhaoxin, and its single-core performance will also exceed that of the Feiteng 2000+. However, because the 3A4000 is still on the outdated 28nm process and its clock frequency is still low (2.0GHz), its overall performance may still not exceed that of the Zhaoxin KX-6000 at 3.0GHz. If the Zhaoxin KX-6000 cannot be mass-produced and launched in 2019, the Loongson 3A4000 may still become the domestic independent processor with the strongest single-core performance available for purchase in China in 2019.
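The single-core part of this prediction follows from the same-frequency estimate by simple scaling. The sketch below is only a rough check under stated assumptions: linear scaling with clock frequency, and the i5-7200U taken at its 2.5GHz base frequency (at a sustained 3.1GHz turbo the projected figure would be lower). The function name projected_single_core is our own.

```python
def projected_single_core(per_clock: float, own_ghz: float, other_ghz: float) -> float:
    """Project relative single-core performance from a per-clock ratio.

    per_clock is (own performance / other performance) at equal frequency.
    Assumes linear scaling with clock frequency (a simplification).
    """
    return per_clock * own_ghz / other_ghz


if __name__ == "__main__":
    LS3A4000_GHZ = 2.0   # predicted clock frequency from the text
    I5_BASE_GHZ = 2.5    # assumption: i5-7200U running at base frequency
    low = projected_single_core(0.80, LS3A4000_GHZ, I5_BASE_GHZ)
    high = projected_single_core(0.90, LS3A4000_GHZ, I5_BASE_GHZ)
    print(f"projected single-core: {low:.0%} to {high:.0%} of the i5-7200U")
```

With these inputs the projection lands between 64% and 72%, in line with the "2/3" stated above.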
The gap between Loongson processors and the high-performance processors of Intel and AMD is still huge, and Loongson still has a long way to go. We look forward to Loongson adopting better processes and a more optimized microarchitecture in the future, and we also hope that Loongson can do better in system software support such as compilers, math libraries and operating systems, and build a better application software ecosystem. We look forward to the successful tape-out of the Loongson 3A4000, 3B4000 and 3C5000 as soon as possible.
The above review is just the personal work of an ordinary enthusiast who did not major in computer science. It is not authoritative. My expertise is limited, the work was done in a hurry, and there is a lot of data, so errors and omissions are inevitable. Criticism and corrections are welcome.
I end my evaluation with Chairman Mao's words:
"We are moving forward. We are engaged in an extremely glorious and great cause that our predecessors never attempted. Our goal must be attained. Our goal can certainly be attained."
Appendix
Summary of various test results
Loongson 3A3000 (Loongnix):
https://openbenchmarking.org/result/1806113-TR-LSLABSLS380
https://openbenchmarking.org/result/1709288-TR-LOONGSON390
FT1500A:
https://openbenchmarking.org/result/1705187-KH-CPUSCIMAR08
i5-7200U:
https://openbenchmarking.org/result/1806175-FO-I57200UDE18
https://openbenchmarking.org/result/1806174-FO-I57200UMU24
https://openbenchmarking.org/result/1806175-FO-I57200URA38
https://openbenchmarking.org/result/1806176-FO-I57200UCO93
https://openbenchmarking.org/result/1806179-FO-I57200UCR30
J1900:
https://openbenchmarking.org/result/1404256-PL-1404206PL73
https://openbenchmarking.org/result/1404250-PL-1404206SO61
https://openbenchmarking.org/result/1404268-PL-J1900MULT15
https://openbenchmarking.org/result/1404272-PL-J1900SPEE11
https://openbenchmarking.org/result/1404275-PL-J1900PROC21