Photo by Paul Skorupskas on Unsplash [Image [0]]
How to train your neural network
This blog post takes you through the different types of CNN operations in PyTorch. We will use torch.nn to implement 1D and 2D convolutions.
What is a CNN?
A Convolutional Neural Network is a kind of neural network mainly used in image-processing applications. CNNs are also applied to sequential data such as audio, time series, and NLP. Convolution is one of the main building blocks of a CNN. The term convolution refers to the mathematical combination of two functions to produce a third function; it merges two sets of information.
We will not go deep into the theory here; there is plenty of great material available online for that.
CNN operations
CNNs are mainly used for applications around image, audio, video, text, and time-series modeling. There are 3 types of convolution operations.
1D convolution for 1D input
The filter slides along a single dimension to produce an output. The images below are taken from this Stackoverflow answer.
1D Convolution for 1D Input [Image [1]]
1D convolution for 2D input
1D Convolution for 2D Input [Image [2]]
2D convolution for 2D input
2D Convolution for 2D Input [Image [3]]
Check out the Stackoverflow answer linked above for more information about the different types of CNN operations.
A few key terms
I will explain these terms using 2D convolutions and 2D inputs, i.e. images, because I could not find relevant visualizations for 1D convolutions.
Convolution operation
To calculate the output size after a convolution operation, we can use the following formula.

Convolution Output Formula [Image [4]]
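For reference, the standard output-size relation is:

O = floor((W - K + 2P) / S) + 1

where O is the output size, W is the input size, K is the kernel size, P is the padding, and S is the stride.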
The kernel/filter slides over the input signal as shown in the figure below. You can see the filter (green square) sliding over the input (blue square), and the sum of the convolution going into the feature map (red square).

Convolution Operation [Image [5]]
Filter/kernel
A convolution is performed on the input image using a filter. The output of the convolution is called a feature map.

Filter [Image [6]]
In CNN terminology, the 3×3 matrix is called a "filter", "kernel", or "feature detector", and the matrix formed by sliding the filter over the image and computing the dot product is called the "convolved feature", "activation map", or "feature map". It is important to note that filters act as feature detectors on the original input image.
More filters = more feature maps = more features. A filter is nothing but a matrix of numbers. The following are different types of filters:

Different types of filters [Image [7]]
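To make this concrete, here is a minimal sketch (my own illustration, not code from the original post) that applies a hand-crafted 3×3 vertical-edge filter with torch.nn.functional.conv2d; the filter values are just an assumption for demonstration:

import torch
import torch.nn.functional as F

# A 5x5 single-channel "image" (batch_size=1, channels=1, height=5, width=5)
img = torch.ones(1, 1, 5, 5)
img[:, :, :, 2:] = 0.0  # columns 0-1 are ones, columns 2-4 are zeros -> a vertical edge

# A hand-crafted 3x3 vertical-edge filter; weight shape is [out_channels, in_channels, kH, kW]
edge_filter = torch.tensor([[[[ 1., 0., -1.],
                              [ 1., 0., -1.],
                              [ 1., 0., -1.]]]])

feature_map = F.conv2d(img, edge_filter)  # no padding, stride 1
print(feature_map.shape)  # torch.Size([1, 1, 3, 3])
print(feature_map)        # nonzero response where the edge falls inside the window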
Stride
Stride specifies how much the convolution filter moves at each step.

Stride of 1 [Image [8]]
If we want less overlap between receptive fields, we can use a larger stride. This also makes the resulting feature map smaller, since we skip over potential locations. The figure below demonstrates a stride of 2. Note that the feature map becomes smaller.

Stride of 2 [Image [9]]
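As a quick illustration (my own sketch with made-up sizes, not from the original post), compare the output shapes for stride 1 and stride 2:

import torch
import torch.nn as nn

x = torch.randn(1, 1, 7, 7)  # batch_size=1, channels=1, 7x7 input

conv_s1 = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1)
conv_s2 = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=2)

print(conv_s1(x).shape)  # torch.Size([1, 1, 5, 5]) -> (7 - 3)/1 + 1 = 5
print(conv_s2(x).shape)  # torch.Size([1, 1, 3, 3]) -> (7 - 3)/2 + 1 = 3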
Padding
Here, we retain more information from the borders and also preserve the size of the image.

Padding [Image [10]]
We see that the size of the feature map is smaller than the input because the convolution filter must be contained within the input. If we want to maintain the same dimensionality, we can use padding to surround the input with zeros.
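Here is a minimal sketch (my own addition) showing that with a 3×3 kernel, padding=1 preserves the input size while padding=0 shrinks it:

import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)

conv_nopad = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=0)
conv_pad = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=1)

print(conv_nopad(x).shape)  # torch.Size([1, 1, 3, 3]) -> feature map shrinks
print(conv_pad(x).shape)    # torch.Size([1, 1, 5, 5]) -> size preserved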
Pooling
We apply pooling to reduce the dimensionality.

Max Pooling [Image [11]]
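For example, here is a minimal sketch (my own addition, not code from the original post) showing nn.MaxPool2d halving the spatial dimensions:

import torch
import torch.nn as nn

x = torch.randn(1, 1, 4, 4)

pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x).shape)  # torch.Size([1, 1, 2, 2]) -> each 2x2 block reduced to its max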
Import libraries
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
Input data
To begin with, we define a few input tensors which we will use throughout this post.
input_1d is a 1-dimensional float tensor. input_2d is a 2-dimensional float tensor. input_2d_img is a 3-dimensional float tensor that represents an image.
input_1d = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=torch.float)

input_2d = torch.tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], dtype=torch.float)

input_2d_img = torch.tensor([[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                              [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                              [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]],
                             [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                              [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                              [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]],
                             [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                              [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                              [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]], dtype=torch.float)

OUTPUT

Input 1D:
input_1d.shape: torch.Size([10])
input_1d:
tensor([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
====================================================================
Input 2D:
input_2d.shape: torch.Size([2, 5])
input_2d:
tensor([[ 1., 2., 3., 4., 5.],
        [ 6., 7., 8., 9., 10.]])
====================================================================
input_2d_img:
input_2d_img.shape: torch.Size([3, 3, 10])
input_2d_img:
tensor([[[ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.],
         [ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.],
         [ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]],

        [[ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.],
         [ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.],
         [ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]],

        [[ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.],
         [ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.],
         [ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]]])
1D convolution
nn.Conv1d() applies 1D convolution over the input. nn.Conv1d() expects the input to be of shape [batch_size, input_channels, signal_length].
You can view the complete parameter list in the official PyTorch documentation. The required parameters are in_channels, out_channels, and kernel_size.
Conv1d - Input 1D
Conv1d-Input1d Example [Image [12]]
The input is a one-dimensional signal consisting of 10 numbers. We convert it to a tensor of size [1, 1, 10].
input_1d = input_1d.unsqueeze(0).unsqueeze(0)
input_1d.shape

OUTPUT

torch.Size([1, 1, 10])
CNN output, where out_channels = 1, kernel_size = 3, and stride = 1.
cnn1d_1 = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=3, stride=1)

print("cnn1d_1: \n")
print(cnn1d_1(input_1d).shape, "\n")
print(cnn1d_1(input_1d))

OUTPUT

cnn1d_1:
torch.Size([1, 1, 8])
tensor([[[-1.2353, -1.4051, -1.5749, -1.7447, -1.9145, -2.0843, -2.2541, -2.4239]]], grad_fn=<...>)
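This matches the output-size formula from earlier: (10 - 3 + 2*0)/1 + 1 = 8.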
CNN output, where out_channels = 1, kernel_size = 3, and stride = 2.
cnn1d_2 = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=3, stride=2)

print("cnn1d_2: \n")
print(cnn1d_2(input_1d).shape, "\n")
print(cnn1d_2(input_1d))

OUTPUT

cnn1d_2:
torch.Size([1, 1, 4])
tensor([[[0.5107, 0.3528, 0.1948, 0.0368]]], grad_fn=<...>)
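Again, per the formula: floor((10 - 3)/2) + 1 = 4.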
CNN output, where out_channels = 1, kernel_size = 2 and stride = 1.
cnn1d_3 = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=2, stride=1)

print("cnn1d_3: \n")
print(cnn1d_3(input_1d).shape, "\n")
print(cnn1d_3(input_1d))

OUTPUT

cnn1d_3:
torch.Size([1, 1, 9])
tensor([[[0.0978, 0.2221, 0.3465, 0.4708, 0.5952, 0.7196, 0.8439, 0.9683, 1.0926]]], grad_fn=<...>)
CNN output, where out_channels = 5, kernel_size = 3, and stride = 1.
cnn1d_4 = nn.Conv1d(in_channels=1, out_channels=5, kernel_size=3, stride=1)

print("cnn1d_4: \n")
print(cnn1d_4(input_1d).shape, "\n")
print(cnn1d_4(input_1d))

OUTPUT

cnn1d_4:
torch.Size([1, 5, 8])
tensor([[[-1.8410e+00, -2.8884e+00, -3.9358e+00, -4.9832e+00, -6.0307e+00, -7.0781e+00, -8.1255e+00, -9.1729e+00],
         [-4.6073e-02, -3.4436e-02, -2.2799e-02, -1.1162e-02,  4.7439e-04,  1.2111e-02,  2.3748e-02,  3.5385e-02],
         [-1.5541e+00, -1.8505e+00, -2.1469e+00, -2.4433e+00, -2.7397e+00, -3.0361e+00, -3.3325e+00, -3.6289e+00],
         [ 6.6593e-01,  1.2362e+00,  1.8066e+00,  2.3769e+00,  2.9472e+00,  3.5175e+00,  4.0878e+00,  4.6581e+00],
         [ 2.0414e-01,  6.0421e-01,  1.0043e+00,  1.4044e+00,  1.8044e+00,  2.2045e+00,  2.6046e+00,  3.0046e+00]]], grad_fn=<...>)
Conv1d - Input 2D
To apply 1D convolution to a 2D input signal, we can do the following. First, we define an input tensor of size [1, 2, 5], where batch_size = 1, input_channels = 2, and signal_length = 5.
input_2d = input_2d.unsqueeze(0)
input_2d.shape

OUTPUT

torch.Size([1, 2, 5])
CNN output, where in_channels = 2, out_channels = 1, kernel_size = 3, and stride = 1.
cnn1d_5 = nn.Conv1d(in_channels=2, out_channels=1, kernel_size=3, stride=1)

print("cnn1d_5: \n")
print(cnn1d_5(input_2d).shape, "\n")
print(cnn1d_5(input_2d))

OUTPUT

cnn1d_5:
torch.Size([1, 1, 3])
tensor([[[-6.6836, -7.6893, -8.6950]]], grad_fn=<...>)
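Both input channels are combined into a single output channel, and the output length follows the formula: (5 - 3)/1 + 1 = 3.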
CNN output, where in_channels = 2, out_channels = 1, kernel_size = 3, and stride = 2.
cnn1d_6 = nn.Conv1d(in_channels=2, out_channels=1, kernel_size=3, stride=2)

print("cnn1d_6: \n")
print(cnn1d_6(input_2d).shape, "\n")
print(cnn1d_6(input_2d))

OUTPUT

cnn1d_6:
torch.Size([1, 1, 2])
tensor([[[-3.4744, -3.7142]]], grad_fn=<...>)
CNN output, where in_channels = 2, out_channels = 1, kernel_size = 2, and stride = 1.
cnn1d_7 = nn.Conv1d(in_channels=2, out_channels=1, kernel_size=2, stride=1)

print("cnn1d_7: \n")
print(cnn1d_7(input_2d).shape, "\n")
print(cnn1d_7(input_2d))

OUTPUT

cnn1d_7:
torch.Size([1, 1, 4])
tensor([[[0.5619, 0.6910, 0.8201, 0.9492]]], grad_fn=<...>)
CNN output, where in_channels = 2, out_channels = 5, kernel_size = 3, stride = 1.
cnn1d_8 = nn.Conv1d(in_channels=2, out_channels=5, kernel_size=3, stride=1)

print("cnn1d_8: \n")
print(cnn1d_8(input_2d).shape, "\n")
print(cnn1d_8(input_2d))

OUTPUT

cnn1d_8:
torch.Size([1, 5, 3])
tensor([[[ 1.5024,  2.4199,  3.3373],
         [ 0.2980, -0.0873, -0.4727],
         [ 1.5443,  1.7086,  1.8729],
         [ 2.6177,  3.2974,  3.9772],
         [-2.5145, -2.2906, -2.0668]]], grad_fn=<...>)
2D convolution
nn.Conv2d() applies 2D convolution on the input. nn.Conv2d() expects the input shape to be [batch_size, input_channels, input_height, input_width].
You can view the complete parameter list in the official PyTorch documentation. The required parameters are in_channels, out_channels, and kernel_size.
Convolution with 3 channels [Image [13]]
To apply 2D convolution to a 2D input signal (such as an image), we can do the following. First, we define an input tensor of size [1, 3, 3, 10], where batch_size = 1, input_channels = 3, input_height = 3, and input_width = 10.
input_2d_img = input_2d_img.unsqueeze(0)
input_2d_img.shape

OUTPUT

torch.Size([1, 3, 3, 10])
CNN output, where in_channels = 3, out_channels = 1, kernel_size = 3, and stride = 1.
cnn2d_1 = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=3, stride=1)

print("cnn2d_1: \n")
print(cnn2d_1(input_2d_img).shape, "\n")
print(cnn2d_1(input_2d_img))

OUTPUT

cnn2d_1:
torch.Size([1, 1, 1, 8])
tensor([[[[-1.0716, -1.5742, -2.0768, -2.5793, -3.0819, -3.5844, -4.0870, -4.5896]]]], grad_fn=<...>)
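The output-size formula applies to each spatial dimension separately: height (3 - 3)/1 + 1 = 1 and width (10 - 3)/1 + 1 = 8, hence the shape [1, 1, 1, 8].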
CNN output, where in_channels = 3, out_channels = 1, kernel_size = 3, and stride = 2.
cnn2d_2 = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=3, stride=2)

print("cnn2d_2: \n")
print(cnn2d_2(input_2d_img).shape, "\n")
print(cnn2d_2(input_2d_img))

OUTPUT

cnn2d_2:
torch.Size([1, 1, 1, 4])
tensor([[[[-0.7407, -1.2801, -1.8195, -2.3590]]]], grad_fn=<...>)
CNN output, where in_channels = 3, out_channels = 1, kernel_size = 2, stride = 1.
cnn2d_3 = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=2, stride=1)

print("cnn2d_3: \n")
print(cnn2d_3(input_2d_img).shape, "\n")
print(cnn2d_3(input_2d_img))

OUTPUT

cnn2d_3:
torch.Size([1, 1, 2, 9])
tensor([[[[-0.8046, -1.5066, -2.2086, -2.9107, -3.6127, -4.3147, -5.0167, -5.7188, -6.4208],
          [-0.8046, -1.5066, -2.2086, -2.9107, -3.6127, -4.3147, -5.0167, -5.7188, -6.4208]]]], grad_fn=<...>)
CNN output, where in_channels = 3, out_channels = 5, kernel_size = 3, and stride = 1.
cnn2d_4 = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=3, stride=1)

print("cnn2d_4: \n")
print(cnn2d_4(input_2d_img).shape, "\n")
print(cnn2d_4(input_2d_img))

OUTPUT

cnn2d_4:
torch.Size([1, 5, 1, 8])
tensor([[[[-2.0868e+00, -2.7669e+00, -3.4470e+00, -4.1271e+00, -4.8072e+00, -5.4873e+00, -6.1673e+00, -6.8474e+00]],
        [[-4.5052e-01, -5.5917e-01, -6.6783e-01, -7.7648e-01, -8.8514e-01, -9.9380e-01, -1.1025e+00, -1.2111e+00]],
        [[ 6.6228e-01,  8.3826e-01,  1.0142e+00,  1.1902e+00,  1.3662e+00,  1.5422e+00,  1.7181e+00,  1.8941e+00]],
        [[-5.4425e-01, -1.2149e+00, -1.8855e+00, -2.5561e+00, -3.2267e+00, -3.8973e+00, -4.5679e+00, -5.2385e+00]],
        [[ 2.0564e-01,  1.6357e-01,  1.2150e-01,  7.9434e-02,  3.7365e-02, -4.7036e-03, -4.6773e-02, -8.8842e-02]]]], grad_fn=<...>)
Thank you for reading. Suggestions and constructive criticism are welcome. :) You can find me on LinkedIn. You can view the complete code here. Check out the GitHub repository here and star it if you like it.
(This article is translated from Akshaj Verma's article "[Pytorch Basics] How to train your Neural Net — Intro to CNN", reference: https://towardsdatascience.com/pytorch-basics-how-to-train-your-neural-net-intro-to-cnn-26a14c2ea29)