When working on machine learning projects, especially when dealing with deep learning and neural networks, it is better to work with a GPU rather than a CPU because even a very basic GPU will outperform a CPU when it comes to neural networks.

2025/10/22 19:54:38 technology 1736

But which GPU should you buy? This article will summarize the relevant factors to consider so that you can make an informed choice based on your budget and specific modeling requirements.

Why is a GPU more suitable for machine learning than a CPU?

The CPU (central processing unit) is the main workhorse of the computer. It is very flexible: it has to process instructions from a wide variety of programs and hardware, and it has to do so quickly. To perform well in this multitasking environment, a CPU has a small number of flexible and fast processing units (also called cores).

The GPU (graphics processing unit) is not as flexible when it comes to multitasking, but it can perform large numbers of mathematical calculations in parallel. It achieves this with a much larger number of simple cores (thousands to tens of thousands) that can handle many simple calculations simultaneously.

The ability to perform many calculations in parallel is well suited for:

Graphics rendering - moving graphics objects need their trajectories calculated continuously, which requires a large number of parallel mathematical calculations that are constantly repeated.
Machine and deep learning - these involve a lot of matrix/tensor calculations, which a GPU can process in parallel.
Any other kind of mathematical calculation that can be split up to run in parallel.
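
To make the parallelism point concrete, here is a minimal sketch in plain Python (no GPU involved). Each row of a matrix product depends only on its own inputs, so the rows can be handed to independent workers. The function names and the thread pool are illustrative assumptions, not anything from a GPU library; a GPU applies the same idea across thousands of cores.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_row(row, b):
    """Compute one output row of A @ B; needs only this row of A and all of B."""
    cols = list(zip(*b))  # columns of B
    return [sum(x * y for x, y in zip(row, col)) for col in cols]


def matmul_serial(a, b):
    """Reference version: one row after another, as a single core would."""
    return [matmul_row(row, b) for row in a]


def matmul_parallel(a, b, workers=4):
    """Same result, but each row is computed by an independent worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: matmul_row(row, b), a))


a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 8], [9, 10]]
result = matmul_parallel(a, b)
print(result)
```

Python threads will not make this faster; the point is only that no row needs to wait for any other, which is exactly the property that lets a GPU spread the work over many simple cores.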

Nvidia has summarized the main differences between CPUs and GPUs on its own blog.

Tensor Processing Unit (TPU)

With the development of artificial intelligence and machine/deep learning, there are now more specialized processing cores called tensor cores. They are faster and more efficient at tensor/matrix calculations, which matters because tensors are the data type we deal with in machine/deep learning.

Although dedicated TPUs exist, some of the latest GPUs also include many tensor cores, which we will summarize later.

Nvidia vs AMD

This will be a fairly short section, as the answer to this question is definitely Nvidia.

While it is possible to use AMD's GPUs for machine/deep learning, at the time of writing Nvidia's GPUs have greater compatibility and are generally better integrated into tools like TensorFlow and PyTorch (for example, PyTorch's AMD GPU support is currently only available on Linux).

Using AMD GPUs requires an additional tool (ROCm). This means some extra work, and the supported versions may not be updated quickly. The situation may improve in the future, but for now it's better to stick with Nvidia.

Key Attributes for GPU Selection

Choosing a GPU that is adequate for your machine learning tasks and fits your budget basically comes down to a balance of four main factors:

How much memory does the GPU have?
How many CUDA and/or tensor cores does the GPU have?
What chip architecture does the card use?
What are the power requirements (if any)?

I will explore each of these aspects below, hoping to give you a better understanding of what is important to you.

GPU memory

The answer is, the more the better!

How much you need really depends on your task and how big your models are. For example, if you are processing images, video, or audio, you will be handling a fairly large amount of data, and GPU RAM will be a very important consideration.

There is always a way to work around running out of memory (such as reducing the batch size), but this costs training time, so the trade-off needs to be balanced carefully.

Based on experience, my suggestions are as follows:

4GB: I think this is the absolute minimum. As long as you are not dealing with overly complex models or large images, video, or audio, it will work in most cases, but it will not be enough for daily use. If you are just starting out and want to try things out without going all in, you can start here.

8GB: This is a good start for everyday learning and can complete most tasks without exceeding the limits of RAM, but you will run into problems when working with more complex image, video, or audio models.

12GB: I think this is the basic requirement for research work. It can handle most larger models, even those working with images, video, or audio.

12GB+: The more the better. You will be able to handle larger datasets and larger batch sizes. Above 12GB is also where prices really start to rise.

Generally speaking, if the cost is the same, you are better off choosing a "slower" card with more memory. Keep in mind that the advantage of GPUs is high throughput, which relies heavily on available RAM to transfer data through the GPU.
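
As a rough way to reason about these memory tiers, the sketch below estimates how much memory a model needs just to hold its weights, gradients, and optimizer state during training. The function name, the two extra optimizer copies (an Adam-style assumption), and the example model sizes are all hypothetical; activations, which grow with batch size and are often the larger share, are deliberately left out.

```python
def estimate_training_memory_gb(num_params, bytes_per_value=4, optimizer_states=2):
    """Rough lower bound on GPU memory for training, in GB.

    Counts one copy each of weights and gradients plus `optimizer_states`
    extra copies (2 is an assumption matching Adam-style optimizers).
    Activations, which grow with batch size, are NOT included.
    """
    copies = 2 + optimizer_states
    return num_params * bytes_per_value * copies / 1024**3


# Hypothetical model sizes, chosen only to illustrate the scaling.
for name, n in [("25M-parameter model", 25_000_000),
                ("350M-parameter model", 350_000_000)]:
    print(f"{name}: ~{estimate_training_memory_gb(n):.2f} GB before activations")
```

Treat the result as a floor, not a prediction: if the floor is already close to a card's memory, that card is too small for the model at any batch size.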

CUDA cores and Tensor cores

This is actually very simple: the more the better.

Consider RAM first, then CUDA cores. For machine/deep learning, Tensor cores are better (faster, more efficient) than CUDA cores, because they are designed specifically for the computations required in machine/deep learning.
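
The "RAM first, then CUDA cores" rule can be sketched as a simple sort. This is only an illustration of the ordering, not a buying tool: the `Card` class and `rank_cards` function are hypothetical names, the core counts are the ones quoted in this article, the 3090's 24GB is its published spec, and the prices are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass
class Card:
    name: str
    memory_gb: int
    cuda_cores: int
    price: float  # placeholder price, any currency


def rank_cards(cards, budget):
    """Keep cards within budget, then sort by memory first and CUDA cores second."""
    affordable = [c for c in cards if c.price <= budget]
    return sorted(affordable, key=lambda c: (c.memory_gb, c.cuda_cores), reverse=True)


# Prices below are invented purely so the example has something to filter on.
cards = [
    Card("RTX 4080 16GB", 16, 9728, 1200),
    Card("RTX 3090", 24, 10496, 1100),
    Card("RTX 4090", 24, 16384, 1600),
]
best = rank_cards(cards, budget=1300)[0]
print(best.name)
```

With these placeholder prices and a budget of 1300, the older card with more memory ranks above the newer card with less, which is the trade-off this section recommends.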

That said, the lack of Tensor cores is not a dealbreaker, because CUDA cores are already fast enough. If you can get a card that includes Tensor cores, that's a great plus; just don't get too hung up on it.

By now you have seen "CUDA" mentioned many times, so let's summarize what it can refer to:

CUDA cores - these are the physical processors on the graphics card. There are usually thousands of them; the 4090 has more than 16,000.

CUDA 11 - the number may change, but this refers to the software/drivers installed to allow the graphics card to function properly. Nvidia releases new versions regularly, and it can be installed and updated like any other software.

CUDA generation (or compute capability) - this describes the capabilities of the graphics card, tied to its architecture generation. It is fixed in the hardware, so it can only be changed by upgrading to a new card. It is identified by a number and a code name. Examples: 3.x [Kepler], 5.x [Maxwell], 6.x [Pascal], 7.x [Turing] and 8.x [Ampere].
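
The number-to-codename pairing can be written down as a small lookup. This is a sketch that follows the simplified list in this article; the table and function names are hypothetical, and real hardware has exceptions (for example, 7.0 is Volta while 7.5 is Turing). If PyTorch is installed, `torch.cuda.get_device_capability()` returns the (major, minor) pair for your own card.

```python
# Major compute-capability number -> architecture name, per this article's list.
# Simplified on purpose: real cards can deviate (7.0 is Volta, 7.5 is Turing).
ARCHITECTURES = {3: "Kepler", 5: "Maxwell", 6: "Pascal", 7: "Turing", 8: "Ampere"}


def architecture_for(compute_capability):
    """Return the architecture name for a string such as "3.7", or "unknown"."""
    major = int(compute_capability.split(".")[0])
    return ARCHITECTURES.get(major, "unknown")


for cc in ["3.7", "6.1", "8.6"]:
    print(cc, "->", architecture_for(cc))
```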

Chip architecture

This is actually more important than you might think. We are not discussing AMD here; I only have eyes for "Lao Huang" (Nvidia CEO Jensen Huang).

As mentioned above, the 30 series cards use the Ampere architecture, and the latest 40 series uses Ada Lovelace. Lao Huang usually names architectures after famous scientists and mathematicians. This time he chose Ada Lovelace, the daughter of the famous British poet Byron, a mathematician who established the concepts of loops and subroutines and is regarded as the first computer programmer.

To understand the computing power of a card, we need to understand two aspects:

Significant functional improvements

An important feature here is mixed precision training:

There are many benefits to using a number format with a precision lower than 32-bit floating point numbers. First they require less memory, allowing for the training and deployment of larger neural networks. Secondly they require less memory bandwidth, thus speeding up data transfer operations. Third math operations run faster with reduced precision, especially on GPUs with Tensor Cores. Mixed-precision training achieves all of these benefits while ensuring no loss of task-specific accuracy compared to full-precision training. It does this by identifying steps that require full precision and using 32-bit floating point only for those steps and 16-bit floating point everywhere else.

Here is the official Nvidia document. If you are interested, you can take a look:

https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html
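
The trade-off in the quoted passage can be seen without a GPU. The sketch below uses Python's standard `struct` module to store a number in 16-bit and 32-bit floating point: the 16-bit value takes half the memory, but a small weight update is rounded away, which is why mixed precision keeps some steps in 32-bit. The `roundtrip` helper and the example numbers are illustrative assumptions, not part of any training framework.

```python
import struct


def roundtrip(value, fmt):
    """Store `value` in the given binary format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


FP16, FP32 = "<e", "<f"  # IEEE 754 half and single precision

# Memory: an FP16 value takes half the bytes of an FP32 value.
print(struct.calcsize(FP16), "bytes vs", struct.calcsize(FP32), "bytes")

# Precision: a small weight update survives in FP32 but is lost in FP16,
# because FP16 cannot represent values between 1.0 and roughly 1.001.
weight, update = 1.0, 0.0001
print("FP32:", roundtrip(weight + update, FP32))
print("FP16:", roundtrip(weight + update, FP16))
```

In FP16 the updated weight comes back as exactly 1.0, so the update has no effect; accumulating such updates in FP32 is one of the steps a mixed precision scheme keeps at full precision.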

Accelerated mixed precision training is only possible if your GPU has a 7.x (Turing) or higher architecture. That means the RTX 20 series or higher on the desktop, or the "T" or "A" series on the server.

The main reason mixed precision training is so advantageous is that it reduces RAM usage. GPUs with Tensor cores accelerate mixed precision training. On GPUs without them, using FP16 still saves video memory and allows larger batch sizes, which indirectly improves training speed.

Will it be deprecated?

If you have particularly high RAM requirements but not enough money for a high-end card, you may be tempted by an older GPU on the second-hand market. There is a pretty big downside to this: the card may have reached the end of its life.

A typical example is the Tesla K80, which has 4992 CUDA cores and 24GB of RAM. In 2014, it retailed for about $7,000. The current price ranges from 150 to 170 US dollars! (On Xianyu, the second-hand marketplace, it goes for around 600-700 yuan.) You would be very excited to get so much memory for so little money.

But there is a very big problem. The K80's compute capability is 3.7 (Kepler), which has been deprecated since CUDA 11 (the current CUDA version is 11.7) and is being dropped by newer drivers and frameworks. In other words, the card has reached end of life, which is why it sells so cheaply.

Therefore, when choosing a second-hand card, be sure to check whether it supports the latest version of the driver and CUDA. This is the most important thing.
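
That check boils down to comparing the card's compute capability with the minimum your software stack still accepts. Here is a minimal sketch under stated assumptions: the helper names are hypothetical, and `MINIMUM_CC` is a placeholder value, not an official figure, so look up the real minimum in the release notes of the CUDA version and framework build you plan to install.

```python
def parse_cc(text):
    """Turn a compute-capability string such as "3.7" into a comparable tuple."""
    major, minor = text.split(".")
    return int(major), int(minor)


def is_still_supported(card_cc, minimum_cc):
    """True if the card's compute capability meets the stack's minimum."""
    return parse_cc(card_cc) >= parse_cc(minimum_cc)


# Placeholder minimum for illustration only; check your toolkit's release notes.
MINIMUM_CC = "5.0"

for name, cc in [("Tesla K80", "3.7"), ("RTX 3090", "8.6")]:
    status = "ok" if is_still_supported(cc, MINIMUM_CC) else "too old"
    print(f"{name} (compute capability {cc}): {status}")
```

Comparing integer tuples rather than strings matters here: as text, "10.0" sorts before "9.0", which would wrongly reject a newer card.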

High-end gaming cards vs. workstation/server cards

Lao Huang basically divides his cards into two groups: consumer graphics cards and workstation/server graphics cards (i.e. professional graphics cards).

There is a clear difference between the two. For the same specs (RAM, CUDA cores, architecture), consumer graphics cards are usually cheaper, while professional cards usually offer better build quality and lower power consumption (although their blower-style fans are quite loud, which is fine in a server room but a bit noisy at home or in a lab).

With high-end (very expensive) professional cards, you may notice that they have a lot of RAM (e.g. the RTX A6000 has 48GB and the A100 has 80GB!). This is because they typically target the 3D modeling, rendering, and machine/deep learning professional markets, which require large amounts of RAM. Again, if you have the money, just buy an A100! (The H100 is the successor to the A100 and cannot be evaluated at present.)

But I personally think we should choose high-end consumer gaming cards, because if you weren't short of money you wouldn't be reading this article, right?

Selection recommendations

To finish, here are some suggestions based on budget and needs. I've divided them into three tiers:

low budget
medium budget
high budget

The high budget tier does not consider anything beyond high-end consumer graphics cards. Again, if you have the money, buy an A100 or H100.

This article also covers cards bought on the second-hand market, mainly because I think second-hand is worth considering if you are on a low budget. The professional desktop series cards (T600, A2000, and A4000) are included as well: some of their specs are slightly worse than those of comparable consumer cards, but their power consumption is significantly better.

Low budget

Medium budget

High budget

Online/cloud service

If you decide that spending money on a graphics card isn't for you, you can take advantage of Google Colab, which lets you use a GPU for free.

However, this is time-limited: if you use the GPU for too long, you will be kicked off and moved back to a CPU. The GPU will also be taken back if the session is inactive for too long, possibly while you are still writing code. The GPU is allocated automatically, so you can't choose the exact model you want. (You can also get Colab Pro for $9.99 per month, which I personally think is a much better deal than a low-budget card, although it requires a VPN; the $49.99 Colab Pro+ is a bit pricey and not recommended.)

At the time of writing, the following GPUs are available through Colab:

As mentioned earlier, the K80 has 24GB of RAM and 4992 CUDA cores, which is basically two K40 cards linked together. This means that when you use the K80 in Colab, you actually have access to half the card, so that's only 12GB and 2496 CUDA cores.

Summary

Finally, the 4090 is still hard to get, and buyers are being toyed with: you either have to scramble to grab one or pay scalpers a premium.

But 16,384 CUDA cores plus 24GB of memory, compared with the 3090's 10,496 CUDA cores, is really tempting.

The 16GB 4080 with 9,728 CUDA cores should be a very cost-effective choice if its price drops below 7,000 yuan. Don't consider the 12GB 4080; it doesn't deserve the name.

AMD's 7900 XTX should also be a good choice, but compatibility is a big issue. If anyone tests it, please leave a comment.

Lao Huang has been playing games with the 40 series, so if you are not in a hurry, you can wait a little longer:

If you don't buy it and I don't buy it, tomorrow it will drop by another 200.