Yu Yang from Aofei Temple
QbitAI Report | Official Account QbitAI
In the height of summer, the heat is hard to bear. Why not calm down by learning some deep learning?
Here is a step-by-step tutorial for getting started with deep reinforcement learning. It has both background theory and code implementations, and you can run everything online without installing anything!
Without further ado, grab the tutorial and see what treasures of knowledge it holds~
Step by step into RL
This PyTorch reinforcement learning tutorial has eight chapters: it starts with DQN (Deep Q-Network), builds up step by step, and finally shows you what Rainbow is.
The tutorial not only comes as Jupyter Notebooks; the author has also set the code up on Colab. Without installing anything, you can see the algorithms in action, and you can even study directly on your phone!
1. DQN
The first step into deep RL is understanding DQN (Deep Q-Network), an algorithm proposed by DeepMind and published in Nature in 2015. It combined deep neural networks with reinforcement learning for the first time, achieving end-to-end learning from perception to action and reaching superhuman levels on a variety of Atari games.
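If you want a feel for the mechanics before opening the notebook, here is a minimal sketch (not the tutorial's own code; the layer sizes and the names QNetwork and dqn_target are just illustrative) of a Q-network and the TD target it is trained toward:

import torch
import torch.nn as nn

# A small MLP mapping an observation to one Q-value per action.
class QNetwork(nn.Module):
    def __init__(self, obs_dim: int, action_dim: int):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)

# TD target: r + gamma * max_a' Q_target(s', a'); `done` is a 0/1 float tensor.
def dqn_target(reward, next_obs, done, target_net, gamma=0.99):
    with torch.no_grad():
        next_q = target_net(next_obs).max(dim=1, keepdim=True)[0]
    return reward + gamma * next_q * (1 - done)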
PyTorch Jupyter Notebook:
https://nbviewer.jupyter.org/github/Curt-Park/rainbow-is-all-you-need/blob/master/01.dqn.ipynb
Colab:
https://colab.research.google.com/github/Curt-Park/rainbow-is-all-you-need/blob/master/01.dqn.ipynb#scrollTo=nEcnUNg8Sn3I
△ Online training in Colab
2. Double DQN
Double DQN (DDQN) is an improvement on DQN. Before DDQN, target Q-values were essentially all obtained greedily, by taking the max over actions with a single network, which often caused overestimation. DDQN decouples that max into two steps, action selection and action evaluation, which effectively mitigates the problem.
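The decoupling fits in a few lines. As a rough sketch (online_net and target_net stand for any Q-networks; the names and defaults are illustrative, not the tutorial's):

import torch

def double_dqn_target(reward, next_obs, done, online_net, target_net, gamma=0.99):
    with torch.no_grad():
        # Step 1: action selection with the online network.
        best_action = online_net(next_obs).argmax(dim=1, keepdim=True)
        # Step 2: action evaluation with the target network.
        next_q = target_net(next_obs).gather(1, best_action)
    return reward + gamma * next_q * (1 - done)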
PyTorch Jupyter Notebook:
https://nbviewer.jupyter.org/github/Curt-Park/rainbow-is-all-you-need/blob/master/02.double_q.ipynb
Colab:
https://colab.research.google.com/github/Curt-Park/rainbow-is-all-you-need/blob/master/02.double_q.ipynb
3. Prioritized Experience Replay
The core of this algorithm is to introduce priorities when sampling past experiences from the replay buffer: a transition's priority determines its probability of being sampled.
With this method, important experiences get replayed more often, the algorithm is more likely to converge, and learning efficiency improves accordingly.
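The sampling rule itself can be sketched in a few lines; a real PER buffer uses a sum tree for efficiency, and the alpha and beta values below are common defaults rather than anything from the tutorial:

import numpy as np

def sample_per(priorities: np.ndarray, batch_size: int, alpha=0.6, beta=0.4):
    # P(i) is proportional to p_i ** alpha.
    probs = priorities ** alpha
    probs /= probs.sum()
    idx = np.random.choice(len(priorities), batch_size, p=probs)
    # Importance-sampling weights correct the bias that prioritization introduces.
    weights = (len(priorities) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights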
PyTorch Jupyter Notebook:
https://nbviewer.jupyter.org/github/Curt-Park/rainbow-is-all-you-need/blob/master/03.per.ipynb
Colab:
https://colab.research.google.com/github/Curt-Park/rainbow-is-all-you-need/blob/master/03.per.ipynb
4. Dueling Networks
Dueling DQN improves the algorithm by changing the structure of the network itself: a dueling network uses two separate streams to estimate the state value and the advantage of each action.
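In code, that means a shared feature layer branching into two heads, recombined with a mean-subtracted advantage. A sketch with illustrative sizes, not the tutorial's exact architecture:

import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    def __init__(self, obs_dim: int, action_dim: int):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.value = nn.Linear(128, 1)               # V(s)
        self.advantage = nn.Linear(128, action_dim)  # A(s, a)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.feature(x)
        v, a = self.value(h), self.advantage(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a); the subtraction keeps
        # the value/advantage split identifiable.
        return v + a - a.mean(dim=1, keepdim=True)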
PyTorch Jupyter Notebook:
https://nbviewer.jupyter.org/github/Curt-Park/rainbow-is-all-you-need/blob/master/04.dueling.ipynb
Colab:
https://colab.research.google.com/github/Curt-Park/rainbow-is-all-you-need/blob/master/04.dueling.ipynb
5. Noisy Network
NoisyNet promotes exploration by learning perturbations of the network weights. The key point is that a single change to the weight vector can induce consistent, potentially very complex, state-dependent policy changes across many time steps.
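A sketch of one noisy linear layer with factorized Gaussian noise, where the means and sigmas are the learned parameters (the initialization constants here are illustrative):

import math
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    def __init__(self, in_f: int, out_f: int, sigma0: float = 0.5):
        super().__init__()
        bound = 1 / math.sqrt(in_f)
        self.w_mu = nn.Parameter(torch.empty(out_f, in_f).uniform_(-bound, bound))
        self.w_sigma = nn.Parameter(torch.full((out_f, in_f), sigma0 * bound))
        self.b_mu = nn.Parameter(torch.empty(out_f).uniform_(-bound, bound))
        self.b_sigma = nn.Parameter(torch.full((out_f,), sigma0 * bound))
        self.in_f, self.out_f = in_f, out_f

    @staticmethod
    def _f(x: torch.Tensor) -> torch.Tensor:
        # Noise scaling f(x) = sign(x) * sqrt(|x|).
        return x.sign() * x.abs().sqrt()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Resample factorized noise every forward pass, so exploration
        # comes from the weights themselves.
        eps_in = self._f(torch.randn(self.in_f, device=x.device))
        eps_out = self._f(torch.randn(self.out_f, device=x.device))
        weight = self.w_mu + self.w_sigma * torch.outer(eps_out, eps_in)
        bias = self.b_mu + self.b_sigma * eps_out
        return nn.functional.linear(x, weight, bias)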
PyTorch Jupyter Notebook:
https://nbviewer.jupyter.org/github/Curt-Park/rainbow-is-all-you-need/blob/master/05.noisy_net.ipynb
Colab:
https://colab.research.google.com/github/Curt-Park/rainbow-is-all-you-need/blob/master/05.noisy_net.ipynb
6. Categorical DQN (C51)
Categorical DQN (C51) approaches the problem from a distributional perspective: instead of estimating only the expected return, it models the full distribution of the state-action value Q, which makes the learned results more accurate.
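A sketch of the core idea: the network outputs a softmax over a fixed support of 51 "atoms" of possible returns per action, and Q is the expectation over that support (the support bounds below are illustrative):

import torch
import torch.nn as nn

class CategoricalQNetwork(nn.Module):
    def __init__(self, obs_dim, action_dim, num_atoms=51, v_min=-10.0, v_max=10.0):
        super().__init__()
        self.action_dim, self.num_atoms = action_dim, num_atoms
        # Fixed support z_1 .. z_N of possible return values.
        self.register_buffer("support", torch.linspace(v_min, v_max, num_atoms))
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim * num_atoms),
        )

    def forward(self, x):
        logits = self.net(x).view(-1, self.action_dim, self.num_atoms)
        dist = logits.softmax(dim=2)          # per-action return distribution
        q = (dist * self.support).sum(dim=2)  # Q(s, a) = E[Z(s, a)]
        return q, dist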
PyTorch Jupyter Notebook:
https://nbviewer.jupyter.org/github/Curt-Park/rainbow-is-all-you-need/blob/master/06.categorical_dqn.ipynb
Colab:
https://colab.research.google.com/github/Curt-Park/rainbow-is-all-you-need/blob/master/06.categorical_dqn.ipynb
7. N-step Learning
DQN uses the current immediate reward plus the value estimate for the next step as its target, so learning can be relatively slow. Using forward-view multi-step targets is actually feasible: n-step learning speeds things up by tuning the multi-step target length n.
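The n-step return itself is easy to sketch: fold the last n rewards together, cutting the window short if an episode ends inside it. The buffer layout below is an assumption, not necessarily the tutorial's:

from collections import deque

def n_step_info(n_step_buffer: deque, gamma: float = 0.99):
    # Entries are (reward, next_obs, done) tuples for the last n transitions.
    reward, next_obs, done = n_step_buffer[-1]
    # Fold backwards: R = r_t + gamma*r_{t+1} + ... + gamma^(n-1)*r_{t+n-1}.
    for r, n_o, d in reversed(list(n_step_buffer)[:-1]):
        reward = r + gamma * reward * (1 - d)
        if d:  # an episode boundary inside the window cuts the return short
            next_obs, done = n_o, d
    return reward, next_obs, done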
PyTorch Jupyter Notebook:
https://nbviewer.jupyter.org/github/Curt-Park/rainbow-is-all-you-need/blob/master/07.n_step_learning.ipynb
Colab:
https://colab.research.google.com/github/Curt-Park/rainbow-is-all-you-need/blob/master/07.n_step_learning.ipynb
8. Rainbow
With the first seven chapters as preparation, you can now appreciate what Rainbow really is.
Rainbow is an algorithm that combines multiple DQN extensions, and it has shown amazing results in both data efficiency and final performance.
However, integrating them all is not easy, and the tutorial discusses that too.
PyTorch Jupyter Notebook:
https://nbviewer.jupyter.org/github/Curt-Park/rainbow-is-all-you-need/blob/master/08.rainbow.ipynb
Colab:
https://colab.research.google.com/github/Curt-Park/rainbow-is-all-you-need/blob/master/08.rainbow.ipynb#scrollTo=ougv5VEKX1d1
Working through the chapters in order is a great choice. Of course, as the author notes, you can also jump straight to whichever of the topics above you want to learn.
Learning tips
If you want to run the code locally, here are a few tips.
First, set up the environment:
$ conda create -n rainbow_is_all_you_need python=3.6.1
$ conda activate rainbow_is_all_you_need
Then comes installation. First, clone the repository:
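Assuming the repository linked throughout this article, that means:

$ git clone https://github.com/Curt-Park/rainbow-is-all-you-need.git
$ cd rainbow-is-all-you-need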
Next, install the packages the code needs. This is very simple:
$ make dep
Then, start learning~
— End —