Pytorch Lightning Complete Walkthrough


author | Takanashi@zhihu (authorized)

source | https://zhuanlan.zhihu.com/p/353985363

editor | pole city platform

This article is published with the author's authorization. Please contact the original author before reprinting.

Foreword

I have "discovered" Pytorch-Lightning twice. The first time, it felt heavy and hard to learn, and I didn't think I would ever use it. But as my projects started to have slightly higher-level requirements, I noticed I was spending a lot of time on similar engineering code, and most of my debugging time went there too. A contradiction gradually emerged: if you want more and better features such as TensorBoard support, early stopping, LR schedulers, distributed training and quick testing, the code inevitably grows longer and messier, and the core training logic gets buried under all that engineering code. Is there a better solution, ideally one that solves all of these problems at once?

So I discovered Pytorch-Lightning for the second time.

And it really is as good as people say.

But a problem remains: the framework is not any easier to learn just because it is good. The official tutorials are rich, and you can tell the developers put in real effort. However, many related knowledge points are scattered across different sections, and some crucial points are not emphasized but only mentioned in passing. That made me want to write an all-in-one tutorial covering everything I consider important in the learning process: the useful parameters, points to watch out for, pitfalls, a large number of example code snippets, and a focused explanation of some core issues.

Finally, the third part provides a template that I have put together which is easy to use for large projects, easy to migrate, and easy to reuse. If you are interested, you can try it on GitHub: https://github.com/miracleyoo/pytorch-lightning-template.

core

  • A great feature of Pytorch-Lightning is that it separates the model from the system. The model is a pure model such as ResNet18 or an RNN, while the system defines how a group of models interact with each other, such as a GAN (generator and discriminator), Seq2Seq (encoder and decoder) or BERT. Sometimes the problem involves only one model, in which case the system can be a generic system describing how the model is used, reusable in many other projects.
  • The core design philosophy of Pytorch-Lightning is "self-sufficiency": each network also includes how it is trained, how it is tested, and its optimizer definitions.


Recommended method

This part is placed at the front because the article is long, and this essential part would be easy to miss if it came later.

Pytorch-Lightning is a good library, or rather an abstraction and wrapper over pytorch. Its advantages are strong reusability, easy maintenance and clear logic. The disadvantage is equally obvious: there is a lot to learn and understand in this package; in other words, it is heavy. If you write code directly from the official template, a small project is fine. But for a large project, with multiple models and datasets to debug and verify, it is not so easy to handle and can even become more troublesome. After a few days of exploration and debugging, I arrived at the following set of templates, which can also be seen as a further abstraction on top of Pytorch-Lightning.

I encourage you to try this code style. Once you are used to it, it is quite easy to reuse and hard to give up.

    root-
        |-data
            |-__init__.py
            |-data_interface.py
            |-xxxdataset1.py
            |-xxxdataset2.py
            |-...
        |-model
            |-__init__.py
            |-model_interface.py
            |-xxxmodel1.py
            |-xxxmodel2.py
            |-...
        |-main.py

If you convert every model directly into a pl.LightningModule, migrating existing projects or other people's code becomes quite time-consuming. On top of that, you would have to add similar code to every model, such as training_step and validation_step. Obviously that is not what we want: it is hard to maintain and may end up even messier. Likewise, if every dataset class is converted directly into a pl DataModule, you face the same problem. With this in mind, I recommend the architecture above:

  • Put only a single main.py file in the root directory.
  • Put an __init__.py in each of the data and model folders. This makes imports easy. The two init files contain, respectively: from .data_interface import DInterface and from .model_interface import MInterface.
  • In data_interface, create a class DInterface(pl.LightningDataModule) that serves as the interface for all dataset files. Import the corresponding Dataset class in __init__(), instantiate it in setup(), and dutifully add the train_dataloader, val_dataloader and test_dataloader functions. These functions tend to be nearly identical, with a few input arguments controlling the differences.
  • Similarly, create a class MInterface(pl.LightningModule) in model_interface as the intermediate interface for models. Import the corresponding model class in __init__(), then dutifully add configure_optimizers, training_step, validation_step and the other functions, so that a single interface class controls all model behaviour. Different models are selected via input arguments (see the sketch after this list).
  • main.py is only responsible for: defining the parser and adding parse items; choosing the required callbacks; and instantiating MInterface, DInterface and Trainer.
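A minimal sketch of what model_interface.py could look like under this layout. The class name MInterface comes from the description above, but the model-selection mechanism, the placeholder model classes and the loss used here are illustrative assumptions rather than the template repo's exact code:

    # model/model_interface.py -- illustrative sketch, not the template repo's exact code
    import torch.nn.functional as F
    import torch.optim as optim
    import pytorch_lightning as pl

    from .xxxmodel1 import XxxModel1   # assumed placeholder models living in model/
    from .xxxmodel2 import XxxModel2

    MODELS = {'xxxmodel1': XxxModel1, 'xxxmodel2': XxxModel2}


    class MInterface(pl.LightningModule):
        def __init__(self, model_name='xxxmodel1', lr=1e-3, **model_kwargs):
            super().__init__()
            self.save_hyperparameters()
            # one interface class; the concrete model is chosen by an input argument
            self.model = MODELS[model_name](**model_kwargs)

        def forward(self, x):
            return self.model(x)

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = F.cross_entropy(self(x), y)
            self.log('train_loss', loss)
            return loss

        def validation_step(self, batch, batch_idx):
            x, y = batch
            self.log('val_loss', F.cross_entropy(self(x), y))

        def configure_optimizers(self):
            return optim.Adam(self.parameters(), lr=self.hparams.lr)

DInterface follows the same pattern on the pl.LightningDataModule side.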

Done.

The full template can be found on GitHub: https://github.com/miracleyoo/pytorch-lightning-template.

Lightning Module

Introduction

Homepage: https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html

Three core components:

  • Model
  • Optimizer
  • Train/Val/Test step
  • Data flow pseudocode:

    outs = []
    for batch in data:
        out = training_step(batch)
        outs.append(out)
    training_epoch_end(outs)

    Equivalent Lightning code:

    def training_step(self, batch, batch_idx):
        prediction = ...
        return prediction

    def training_epoch_end(self, training_step_outputs):
        for prediction in training_step_outputs:
            # do something with these
            ...

    Just like filling in the blanks, fill in these functions.

    components and functions

    API page: https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html%23lightningmodule-api

The parts that a Pytorch-Lightning module must contain are:

    • __init__: initialization, including the definition of the model and the system.
    • training_step(self, batch, batch_idx): the processing function for each batch.

    parameters:
    batch (Tensor | (Tensor, ...) | [Tensor, ...]) – The output of your DataLoader. A tensor, tuple or list.
    batch_idx (int) – Integer displaying index of this batch.
    optimizer_idx (int) – When using multiple optimizers, this argument will also be present.
    hiddens (Tensor) – Passed in if truncated_bptt_steps > 0.

    Return value: any of

    • Tensor – The loss tensor
    • dict – A dictionary. Can include any keys, but must include the key 'loss'
    • None – Training will skip to the next batch

In other words, the return value must carry a loss: if a dictionary is returned, the 'loss' key is required, and if None is returned, that batch is skipped. Example:

    def training_step(self, batch, batch_idx):
        x, y, z = batch
        out = self.encoder(x)
        loss = self.loss(out, x)
        return loss

    # Multiple optimizers (e.g.: GANs)
    def training_step(self, batch, batch_idx, optimizer_idx):
        if optimizer_idx == 0:
            # do training_step with encoder
            ...
        if optimizer_idx == 1:
            # do training_step with decoder
            ...

    # Truncated back-propagation through time
    def training_step(self, batch, batch_idx, hiddens):
        # hiddens are the hidden states from the previous truncated backprop step
        ...
        out, hiddens = self.lstm(data, hiddens)
        ...
        return {'loss': loss, 'hiddens': hiddens}

configure_optimizers: defines the optimizers. It returns one optimizer, several optimizers, or two lists (optimizers, schedulers). For example:

    # most cases
    def configure_optimizers(self):
        opt = Adam(self.parameters(), lr=1e-3)
        return opt

    # multiple optimizer case (e.g.: GAN)
    def configure_optimizers(self):
        generator_opt = Adam(self.model_gen.parameters(), lr=0.01)
        discriminator_opt = Adam(self.model_disc.parameters(), lr=0.02)
        return generator_opt, discriminator_opt

    # example with learning rate schedulers
    def configure_optimizers(self):
        generator_opt = Adam(self.model_gen.parameters(), lr=0.01)
        discriminator_opt = Adam(self.model_disc.parameters(), lr=0.02)
        discriminator_sched = CosineAnnealing(discriminator_opt, T_max=10)
        return [generator_opt, discriminator_opt], [discriminator_sched]

    # example with step-based learning rate schedulers
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_disc.parameters(), lr=0.02)
        gen_sched = {'scheduler': ExponentialLR(gen_opt, 0.99),
                     'interval': 'step'}  # called after each training step
        dis_sched = CosineAnnealing(dis_opt, T_max=10)  # called every epoch
        return [gen_opt, dis_opt], [gen_sched, dis_sched]

    # example with optimizer frequencies
    # see training procedure in `Improved Training of Wasserstein GANs`, Algorithm 1
    # https://arxiv.org/abs/1704.00028
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_disc.parameters(), lr=0.02)
        n_critic = 5
        return (
            {'optimizer': dis_opt, 'frequency': n_critic},
            {'optimizer': gen_opt, 'frequency': 1}
        )

    The components that can be specified are:

    • forward: the same as in a normal nn.Module, used for inference. Internally it is called as y = self(batch).
    • training_step_end: only needed when training on multiple nodes/GPUs and the result involves operations such as softmax that must be computed jointly over all outputs. Similarly there are validation_step_end / test_step_end.
    • training_epoch_end: called at the end of a training epoch; its input is a list whose elements are the values returned by each call to training_step(); it returns None.
    • validation_step(self, batch, batch_idx) / test_step(self, batch, batch_idx): no restriction on the return value; it does not have to output a val_loss.
    • validation_epoch_end / test_epoch_end

    The utility functions are:

    • freeze: freezes all weights for use at prediction time. Only used once training is finished and the model is only being tested afterwards.
    • print: the built-in print function can also be used, but if the program runs on a distributed system it will print multiple times, whereas self.print() prints only once.
    • log: logs to loggers such as TensorBoard. Every logged scalar gets an x-coordinate, which may be a batch index or an epoch index. on_step means the x-coordinate of the logged quantity is the current batch, while on_epoch means the quantity is accumulated over the whole epoch and logged with the current epoch as the x-coordinate.

    The defaults of these flags depend on the LightningModule hook from which self.log is called: in training_step, on_step defaults to True and on_epoch to False; in validation_step/test_step and the epoch-end hooks, on_step defaults to False and on_epoch to True; prog_bar defaults to False and logger to True (* the validation defaults also apply to the test loop).

    parameters:
    name (str) – key name
    value (Any) – the value to log
    prog_bar (bool) – if True, logs to the progress bar
    logger (bool) – if True, logs to the logger
    on_step (Optional[bool]) – if True, logs at this step. None auto-logs for training_step but not validation/test_step
    on_epoch (Optional[bool]) – if True, logs epoch-accumulated metrics. None auto-logs for the val/test step but not training_step
    reduce_fx (Callable) – reduction function over step values at the end of the epoch. torch.mean by default
    tbptt_reduce_fx (Callable) – function to reduce on truncated back-propagation
    tbptt_pad_token (int) – token to use for padding
    enable_graph (bool) – if True, will not auto-detach the graph
    sync_dist (bool) – if True, reduces the metric across GPUs/TPUs
    sync_dist_op (Union[Any, str]) – the op to sync across GPUs/TPUs
    sync_dist_group (Optional[Any]) – the ddp group

    • log_dict: the only difference from the log function is that the name and value arguments are replaced by a dictionary, so several values are logged at once. For example: values = {'loss': loss, 'acc': acc, ..., 'metric_n': metric_n}; self.log_dict(values)
    • save_hyperparameters: saves all hyperparameters passed to __init__. They can later be accessed via self.hparams.argX, and the hyperparameter table is also saved to file (see the example after this list).
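A small sketch showing how save_hyperparameters and log_dict might be used together. The network, loss and accuracy metric here are placeholders chosen for illustration:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import pytorch_lightning as pl


    class LitMLP(pl.LightningModule):
        def __init__(self, hidden_dim=128, lr=1e-3):
            super().__init__()
            # hidden_dim and lr become available as self.hparams.* and are stored in checkpoints
            self.save_hyperparameters()
            self.net = nn.Sequential(
                nn.Flatten(),
                nn.Linear(28 * 28, self.hparams.hidden_dim),
                nn.ReLU(),
                nn.Linear(self.hparams.hidden_dim, 10),
            )

        def training_step(self, batch, batch_idx):
            x, y = batch
            logits = self.net(x)
            loss = F.cross_entropy(logits, y)
            acc = (logits.argmax(dim=-1) == y).float().mean()
            # log several scalars in one call
            self.log_dict({'train_loss': loss, 'train_acc': acc})
            return loss

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)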

    Built-in variables:

    • device: you can use self.device to build device-agnostic tensors. For example: z = torch.rand(2, 3, device=self.device).
    • hparams : Contains all previously saved input hyperparameters.
    • precision : precision. 32 and 16 are common.

    points

If you plan to use DataParallel, then inside training_step you need to call forward through the module itself, i.e. z = self(x).
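For example (a minimal sketch; the loss and batch structure are placeholders):

    def training_step(self, batch, batch_idx):
        x, y = batch
        # call the module itself (self(x)) rather than a submodule's forward directly,
        # so that DataParallel can wrap and scatter the call correctly
        z = self(x)
        loss = F.cross_entropy(z, y)
        return loss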

    template

    class LitModel(pl.LightningModule):
        def __init__(...):
        def forward(...):
        def training_step(...)
        def training_step_end(...)
        def training_epoch_end(...)
        def validation_step(...)
        def validation_step_end(...)
        def validation_epoch_end(...)
        def test_step(...)
        def test_step_end(...)
        def test_epoch_end(...)
        def configure_optimizers(...)
        def any_extra_hook(...)

    Trainer

    Basic use

    model = MyLightningModule()
    trainer = Trainer()
    trainer.fit(model, train_dataloader, val_dataloader)

If validation_step is not defined, the val_dataloader can simply be omitted.

    pseudocode and hooks

    Hooks page: https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html%23hooks

    def fit(...):
        on_fit_start()

        if global_rank == 0:
            # prepare data is called on GLOBAL_ZERO only
            prepare_data()

        for gpu/tpu in gpu/tpus:
            train_on_device(model.copy())

        on_fit_end()

    def train_on_device(model):
        # setup is called PER DEVICE
        setup()
        configure_optimizers()
        on_pretrain_routine_start()

        for epoch in epochs:
            train_loop()

        teardown()

    def train_loop():
        on_train_epoch_start()
        train_outs = []
        for train_batch in train_dataloader():
            on_train_batch_start()

            # ----- train_step methods -------
            out = training_step(batch)
            train_outs.append(out)

            loss = out.loss

            backward()
            on_after_backward()
            optimizer_step()
            on_before_zero_grad()
            optimizer_zero_grad()

            on_train_batch_end(out)

            if should_check_val:
                val_loop()

        # end training epoch
        logs = training_epoch_end(outs)

    def val_loop():
        model.eval()
        torch.set_grad_enabled(False)

        on_validation_epoch_start()
        val_outs = []
        for val_batch in val_dataloader():
            on_validation_batch_start()

            # -------- val step methods -------
            out = validation_step(val_batch)
            val_outs.append(out)

            on_validation_batch_end(out)

        validation_epoch_end(val_outs)
        on_validation_epoch_end()

        # set up for train
        model.train()
        torch.set_grad_enabled(True)

    recommended parameters

    parameter introduction (with video) — https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html%23trainer -flags

    class definition and default parameters — https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html%23trainer-class-api

default_root_dir: the default storage path. All experiment artifacts and weights are stored in this folder. It is recommended to use a separate folder for each model. Each retraining creates a new version_x subfolder.

    max_epochs : Maximum number of training epochs. trainer = Trainer(max_epochs=1000)

    min_epochs : Minimum number of training epochs. Used when there is an Early Stop.

    auto_scale_batch_size : Automatically select an appropriate batch size before doing any training.

    # default used by the Trainer (no scaling of batch size)
    trainer = Trainer(auto_scale_batch_size=None)

    # run batch size scaling, result overrides hparams.batch_size
    trainer = Trainer(auto_scale_batch_size='binsearch')

    # call tune to find the batch size
    trainer.tune(model)

auto_select_gpus: automatically select suitable GPUs. Especially useful when GPUs are in exclusive mode.

    auto_lr_find : Automatically find a suitable initial learning rate. Techniques from the https://arxiv.org/abs/1506.01186 paper were used. Works if and only if the trainer.tune(model) code is executed.

    # run learning rate finder, results override hparams.learning_rate
    trainer = Trainer(auto_lr_find=True)

    # run learning rate finder, results override hparams.my_lr_arg
    trainer = Trainer(auto_lr_find='my_lr_arg')

    # call tune to find the lr
    trainer.tune(model)

precision: numerical precision. The normal value is 32. Using 16 reduces memory consumption and allows larger batches.

    # default used by the Trainer
    trainer = Trainer(precision=32)

    # 16-bit precision
    trainer = Trainer(precision=16, gpus=1)

val_check_interval: how often to run validation. The normal value is 1.0 (once per training epoch); 0.25 means validating 4 times per epoch, and 1000 means validating every 1000 batches.

Use a float to check within a training epoch: the value is then a fraction of an epoch, and validation runs once per that fraction. Use an int to check every n steps (batches): validation runs every n training batches.

    # default used by the Trainer
    trainer = Trainer(val_check_interval=1.0)

    # check validation set 4 times during a training epoch
    trainer = Trainer(val_check_interval=0.25)

    # check validation set every 1000 training batches
    # use this when using iterableDataset and your dataset has no length
    # (ie: production cases with streaming data)
    trainer = Trainer(val_check_interval=1000)

    gpus : Controls the number of GPUs used. When set to None, the cpu is used.

    # default used by the Trainer (ie: train on CPU)
    trainer = Trainer(gpus=None)

    # equivalent
    trainer = Trainer(gpus=0)

    # int: train on 2 gpus
    trainer = Trainer(gpus=2)

    # list: train on GPUs 1, 4 (by bus ordering)
    trainer = Trainer(gpus=[1, 4])
    trainer = Trainer(gpus='1, 4')  # equivalent

    # -1: train on all gpus
    trainer = Trainer(gpus=-1)
    trainer = Trainer(gpus='-1')  # equivalent

    # combine with num_nodes to train on multiple GPUs across nodes
    # uses 8 gpus in total
    trainer = Trainer(gpus=2, num_nodes=4)

    # train only on GPUs 1 and 4 across nodes
    trainer = Trainer(gpus=[1, 4], num_nodes=4)

limit_train_batches: the fraction of the training data to use. Use this if you have too much data or are debugging. The value ranges from 0 to 1. Similarly, there are limit_test_batches and limit_val_batches.

    # default used by the Trainer
    trainer = Trainer(limit_train_batches=1.0)

    # run through only 25% of the training set each epoch
    trainer = Trainer(limit_train_batches=0.25)

    # run through only 10 batches of the training set each epoch
    trainer = Trainer(limit_train_batches=10)

fast_dev_run: a bool. If set to True, only one batch each of train, val and test is executed, and then the run ends. For debugging only.

Setting this argument will disable the tuner, checkpoint callbacks, early stopping callbacks, loggers and logger callbacks like LearningRateLogger, and will run for only 1 epoch.

    # default used by the Trainer
    trainer = Trainer(fast_dev_run=False)

    # runs 1 train, val, test batch and program ends
    trainer = Trainer(fast_dev_run=True)

    # runs 7 train, val, test batches and program ends
    trainer = Trainer(fast_dev_run=7)

    .fit() Function

Trainer.fit(model, train_dataloader=None, val_dataloaders=None, datamodule=None): the first argument must be the model. It can then be followed by either a LightningDataModule or a normal train DataLoader, plus a val DataLoader if a validation step is defined.

    parameters:
    datamodule (Optional[LightningDataModule]) – An instance of LightningDataModule.
    model (LightningModule) – Model to fit.
    train_dataloader (Optional[DataLoader]) – A Pytorch DataLoader with training samples. If the model has a predefined train_dataloader method, this will be skipped.
    val_dataloaders (Union[DataLoader, List[DataLoader], None]) – Either a single Pytorch DataLoader or a list of them, specifying validation samples. If the model has a predefined val_dataloaders method, this will be skipped.

    Other points

    • .test() is not run automatically; it only runs when called directly: trainer.test().
    • .test() automatically loads the best model.
    • model.eval() and torch.no_grad() are called automatically during testing.
    • By default, Trainer() runs on the CPU.
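For example, a minimal sketch reusing the LitClassifier and MNISTDataModule defined elsewhere in this article (argument values are illustrative):

    dm = MNISTDataModule(batch_size=64)
    model = LitClassifier(...)
    trainer = Trainer(gpus=1, max_epochs=10)
    trainer.fit(model, datamodule=dm)

    # .test() must be called explicitly; it loads the best checkpoint saved during
    # fit and runs under model.eval() / torch.no_grad() automatically
    trainer.test(datamodule=dm)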

    Example of use

    1. Manually add command line parameters:

    from argparse import ArgumentParser

    def main(hparams):
        model = LightningModule()
        trainer = Trainer(gpus=hparams.gpus)
        trainer.fit(model)

    if __name__ == '__main__':
        parser = ArgumentParser()
        parser.add_argument('--gpus', default=None)
        args = parser.parse_args()
        main(args)

2. Automatically add all the command-line arguments that the Trainer will use:

    from argparse import ArgumentParser

    def main(args):
        model = LightningModule()
        trainer = Trainer.from_argparse_args(args)
        trainer.fit(model)

    if __name__ == '__main__':
        parser = ArgumentParser()
        parser = Trainer.add_argparse_args(
            # group the Trainer arguments together
            parser.add_argument_group(title="pl.Trainer args")
        )
        args = parser.parse_args()
        main(args)

3. Hybrid: use both the Trainer-related arguments and some custom arguments, such as model hyperparameters:

    from argparse import ArgumentParser
    import pytorch_lightning as pl
    from pytorch_lightning import LightningModule, Trainer

    def main(args):
        model = LightningModule()
        trainer = Trainer.from_argparse_args(args)
        trainer.fit(model)

    if __name__ == '__main__':
        parser = ArgumentParser()
        parser.add_argument('--batch_size', default=32, type=int)
        parser.add_argument('--hidden_dim', type=int, default=128)
        parser = Trainer.add_argparse_args(
            # group the Trainer arguments together
            parser.add_argument_group(title="pl.Trainer args")
        )
        args = parser.parse_args()
        main(args)

    all parameters

    Trainer.__init__(logger=True, checkpoint_callback=True, callbacks=None, default_root_dir=None,
        gradient_clip_val=0, process_position=0, num_nodes=1, num_processes=1, gpus=None,
        auto_select_gpus=False, tpu_cores=None, log_gpu_memory=None, progress_bar_refresh_rate=None,
        overfit_batches=0.0, track_grad_norm=-1, check_val_every_n_epoch=1, fast_dev_run=False,
        accumulate_grad_batches=1, max_epochs=None, min_epochs=None, max_steps=None, min_steps=None,
        limit_train_batches=1.0, limit_val_batches=1.0, limit_test_batches=1.0, limit_predict_batches=1.0,
        val_check_interval=1.0, flush_logs_every_n_steps=100, log_every_n_steps=50, accelerator=None,
        sync_batchnorm=False, precision=32, weights_summary='top', weights_save_path=None,
        num_sanity_val_steps=2, truncated_bptt_steps=None, resume_from_checkpoint=None, profiler=None,
        benchmark=False, deterministic=False, reload_dataloaders_every_epoch=False, auto_lr_find=False,
        replace_sampler_ddp=True, terminate_on_nan=False, auto_scale_batch_size=False,
        prepare_data_per_node=True, plugins=None, amp_backend='native', amp_level='O2',
        distributed_backend=None, move_metrics_to_cpu=False, multiple_trainloader_mode='max_size_cycle',
        stochastic_weight_avg=False)

What logging and the returned loss actually do

    To add a training loop use the training_step method.

    class LitClassifier(pl.LightningModule):
        def __init__(self, model):
            super().__init__()
            self.model = model

        def training_step(self, batch, batch_idx):
            x, y = batch
            y_hat = self.model(x)
            loss = F.cross_entropy(y_hat, y)
            return loss

In training_step, validation_step and test_step alike, the return value is typically a loss, and the returned values are collected into a list.

    Under the hood, Lightning does the following (pseudocode):

    # put model in train mode
    model.train()
    torch.set_grad_enabled(True)

    losses = []
    for batch in train_dataloader:
        # forward
        loss = training_step(batch)
        losses.append(loss.detach())

        # backward
        loss.backward()

        # apply and clear grads
        optimizer.step()
        optimizer.zero_grad()

    Training epoch-level metrics

    If you want to calculate epoch-level metrics and log them, use the .log method.

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.model(x)
        loss = F.cross_entropy(y_hat, y)

        # logs metrics for each training_step,
        # and the average across the epoch, to the progress bar and logger
        self.log('train_loss', loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
        return loss

If the .log() function is used in training_step, the quantity is recorded step by step. Every logged variable is recorded: each step produces a dict, and over an epoch these dicts are collected into a list of dicts.

    The .log object automatically reduces the requested metrics across the full epoch. Here's the pseudocode of what it does under the hood:

    outs = []
    for batch in train_dataloader:
        # forward
        out = training_step(batch)
        outs.append(out)

        # backward
        loss.backward()

        # apply and clear grads
        optimizer.step()
        optimizer.zero_grad()

    epoch_metric = torch.mean(torch.stack([x['train_loss'] for x in outs]))

    Train epoch-level operations

    If you need to do something with all the outputs of each training_step, override training_epoch_end yourself.

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.model(x)
        loss = F.cross_entropy(y_hat, y)
        preds = ...
        return {'loss': loss, 'other_stuff': preds}

    def training_epoch_end(self, training_step_outputs):
        for pred in training_step_outputs:
            # do something with pred
            ...

    The matching pseudocode is:

    outs = []
    for batch in train_dataloader:
        # forward
        out = training_step(batch)
        outs.append(out)

        # backward
        loss.backward()

        # apply and clear grads
        optimizer.step()
        optimizer.zero_grad()

    training_epoch_end(outs)

    DataModule

    Homepage: https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html

    intro

First, this DataModule does not conflict at all with the Dataset classes you wrote before. The former is a wrapper for the latter, and the wrapper can be applied to multiple torch Datasets. In my opinion, its biggest benefit is that it lets you reuse, through a single wrapper class, the repetitive code for train/val/test splits and DataLoader initialization.

Its specific responsibilities are:

    • Download: how to download the data
    • Process: how to process the data
    • Split: how to split the data
    • Train dataloader: the training set DataLoader
    • Val dataloader(s): the validation set DataLoader(s)
    • Test dataloader(s): the test set DataLoader(s)

The functions to implement include:

    prepare_data(self):

    • One-off operations such as downloading and tokenizing.
    • This is the place to prepare the data once and for all.
    • Since it is called on a single process only, do not perform state assignments such as self.x = y in this function.
    • If you only use the data yourself rather than distributing the module to others, this function may not be needed at all, because the data can be preprocessed in advance.

    setup(self, stage=None)

    • Instantiates the datasets (Dataset) and performs related operations, such as counting the number of classes and splitting the train/val/test sets.
    • The stage parameter indicates whether we are in the training period (fit) or the test period (test); the fit stage requires building both the train and val datasets.
    • The setup function does not need a return value; the initialized train/val/test sets can be assigned directly to self.

    train_dataloader/val_dataloader/test_dataloader :

    • Initializes the DataLoaders.
    • Returns a DataLoader object.

    Example

    class MNISTDataModule(pl.LightningDataModule):
        def __init__(self, data_dir: str = './', batch_size: int = 64, num_workers: int = 8):
            super().__init__()
            self.data_dir = data_dir
            self.batch_size = batch_size
            self.num_workers = num_workers
            self.transform = transforms.Compose([
                transforms.ToTensor(),
                transforms.Normalize((0.1307,), (0.3081,))
            ])

            # self.dims is returned when you call dm.size()
            # Setting default dims here because we know them.
            # Could optionally be assigned dynamically in dm.setup()
            self.dims = (1, 28, 28)
            self.num_classes = 10

        def prepare_data(self):
            # download
            MNIST(self.data_dir, train=True, download=True)
            MNIST(self.data_dir, train=False, download=True)

        def setup(self, stage=None):
            # Assign train/val datasets for use in dataloaders
            if stage == 'fit' or stage is None:
                mnist_full = MNIST(self.data_dir, train=True, transform=self.transform)
                self.mnist_train, self.mnist_val = random_split(mnist_full, [55000, 5000])

            # Assign test dataset for use in dataloader(s)
            if stage == 'test' or stage is None:
                self.mnist_test = MNIST(self.data_dir, train=False, transform=self.transform)

        def train_dataloader(self):
            return DataLoader(self.mnist_train, batch_size=self.batch_size, num_workers=self.num_workers)

        def val_dataloader(self):
            return DataLoader(self.mnist_val, batch_size=self.batch_size, num_workers=self.num_workers)

        def test_dataloader(self):
            return DataLoader(self.mnist_test, batch_size=self.batch_size, num_workers=self.num_workers)

    gist

If a self.dims variable is defined in the DataModule, you can later retrieve it by calling dm.size().
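For example, with the MNISTDataModule defined above:

    dm = MNISTDataModule()
    dm.size()        # returns self.dims, i.e. (1, 28, 28)
    dm.num_classes   # ordinary attributes can of course also be read directly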

    Saving and Loading

    Homepage: https://pytorch-lightning.readthedocs.io/en/latest/common/weights_loading.html

    Saving

    ModelCheckpoint Address: https://pytorch-lightning. readthedocs.io/en/latest/extensions/generated/pytorch_lightning.callbacks.ModelCheckpoint.html%23pytorch_lightning.callbacks.ModelCheckpoint

ModelCheckpoint: the callback module for automatic saving. By default, only the latest model and related parameters are saved during training, but users can customize this through the module: for example, monitor a quantity such as val_loss, keep the top 3 models under that metric, and also save the model from the last epoch, and so on. Example:

    from pytorch_lightning.callbacks import ModelCheckpoint

    # saves a file like: my/path/sample-mnist-epoch=02-val_loss=0.32.ckpt
    checkpoint_callback = ModelCheckpoint(
        monitor='val_loss',
        filename='sample-mnist-{epoch:02d}-{val_loss:.2f}',
        save_top_k=3,
        mode='min',
        save_last=True
    )

    trainer = pl.Trainer(gpus=1, max_epochs=3, progress_bar_refresh_rate=20, callbacks=[checkpoint_callback])
    • You can also store a checkpoint manually: trainer.save_checkpoint("example.ckpt").
    • For the ModelCheckpoint callback, if save_weights_only=True, only the weights are saved (equivalent to model.save_weights(filepath)); otherwise the whole training state is saved (equivalent to model.save(filepath)).
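For example, a sketch of the two options mentioned above (the monitor and save_top_k values are illustrative):

    # keep the full training state in each checkpoint (the default behaviour)
    full_ckpt = ModelCheckpoint(monitor='val_loss', save_top_k=3)

    # keep only the model weights, analogous to model.save_weights(filepath)
    weights_ckpt = ModelCheckpoint(monitor='val_loss', save_top_k=3, save_weights_only=True)

    trainer = pl.Trainer(callbacks=[weights_ckpt])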

    Loading

    Load a model, including its weights, biases and hyperparameters:

    model = MyLightningModule.load_from_checkpoint(PATH)

    print(model.learning_rate)
    # prints the learning_rate you used in this checkpoint

    model.eval()
    y_hat = model(x)

    Replace some hyperparameters when loading the model:

    class LitModel(LightningModule):
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.save_hyperparameters()
            self.l1 = nn.Linear(self.hparams.in_dim, self.hparams.out_dim)

    # if you train and save the model like this it will use these values when loading
    # the weights. But you can overwrite this
    LitModel(in_dim=32, out_dim=10)

    # uses in_dim=32, out_dim=10
    model = LitModel.load_from_checkpoint(PATH)

    # uses in_dim=128, out_dim=10
    model = LitModel.load_from_checkpoint(PATH, in_dim=128, out_dim=10)

Fully restoring the training state: loading then includes everything about the model plus all training-related state, such as the epoch, step, LR schedulers, apex state, and so on.

    model = LitModel()
    trainer = Trainer(resume_from_checkpoint='some/path/to/my_checkpoint.ckpt')

    # automatically restores model, epoch, step, LR schedulers, apex, etc...
    trainer.fit(model)

    Callbacks

A Callback is a self-contained program that can be interleaved with the training process without polluting the main research logic.

    Callback is not only called at the end of epoch. pytorch-lightning provides dozens of hooks (interfaces, call locations) to choose from, and you can also customize callbacks to implement any module you want to implement.

The recommended usage: operations that change with the problem and project should be written into the LightningModule itself, while relatively independent, auxiliary functionality that needs to be reused is better defined as a separate Callback module, so it can be plugged in and out conveniently later.
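A minimal sketch of a custom Callback. The hook names are real Lightning hooks; the timing logic itself is just an illustration of a small, pluggable, reusable module:

    import time

    import pytorch_lightning as pl
    from pytorch_lightning.callbacks import Callback


    class EpochTimer(Callback):
        """Reports how long each training epoch takes."""

        def on_train_epoch_start(self, trainer, pl_module):
            self._t0 = time.time()

        def on_train_epoch_end(self, trainer, pl_module, *args):
            pl_module.print(f"epoch {trainer.current_epoch} took {time.time() - self._t0:.1f}s")


    trainer = pl.Trainer(callbacks=[EpochTimer()])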

    Callbacks Recommended

    Built-in Callbacks: https://pytorch-lightning.readthedocs.io/en/latest/extensions/callbacks.html%23built-in-callbacks

EarlyStopping(monitor='early_stop_on', min_delta=0.0, patience=3, verbose=False, mode='min', strict=True): monitors a quantity and stops training early if it shows no improvement for several epochs.

    Parameters:
    monitor (str) – quantity to be monitored. Default: 'early_stop_on'.
    min_delta (float) – minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta will count as no improvement. Default: 0.0.
    patience (int) – number of validation epochs with no improvement after which training will be stopped. Default: 3.
    verbose (bool) – verbosity mode. Default: False.
    mode (str) – one of 'min', 'max'. In 'min' mode, training will stop when the quantity monitored has stopped decreasing; in 'max' mode it will stop when the quantity monitored has stopped increasing.
    strict (bool) – whether to crash the training if monitor is not found in the validation metrics. Default: True.

Example:

    from pytorch_lightning import Trainer
    from pytorch_lightning.callbacks import EarlyStopping

    early_stopping = EarlyStopping('val_loss')
    trainer = Trainer(callbacks=[early_stopping])

ModelCheckpoint: see Saving and Loading above. PrintTableMetricsCallback: prints a summary table of results after each epoch.

    from pl_bolts.callbacks import PrintTableMetricsCallback

    callback = PrintTableMetricsCallback()
    trainer = pl.Trainer(callbacks=[callback])
    trainer.fit(...)

    # ------------------------------
    # at the end of every epoch it will print
    # ------------------------------
    # loss│train_loss│val_loss│epoch
    # ──────────────────────────────
    # 2.2541470527648926│2.2541470527648926│2.2158432006835938│0

    Logging

Logging: the logger is TensorBoard by default, but the mainstream logger frameworks can be specified instead, such as Comet.ml, MLflow, Neptune, or plain CSV files. Multiple loggers can be used at the same time.

    from pytorch_lightning import loggers as pl_loggers

    # Default
    tb_logger = pl_loggers.TensorBoardLogger(
        save_dir=os.getcwd(),
        version=None,
        name='lightning_logs'
    )
    trainer = Trainer(logger=tb_logger)

    # Or use the same format as others
    tb_logger = pl_loggers.TensorBoardLogger('logs/')

    # One Logger
    comet_logger = pl_loggers.CometLogger(save_dir='logs/')
    trainer = Trainer(logger=comet_logger)

    # Save code snapshot
    logger = pl_loggers.TestTubeLogger('logs/', create_git_tag=True)

    # Multiple Loggers
    tb_logger = pl_loggers.TensorBoardLogger('logs/')
    comet_logger = pl_loggers.CometLogger(save_dir='logs/')
    trainer = Trainer(logger=[tb_logger, comet_logger])

By default, logging happens once every 50 training batches; this frequency can be adjusted via a Trainer argument.
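For example, using the Trainer's log_every_n_steps argument (50 is its default, as shown in the full parameter list above):

    # default: write logs once every 50 training batches
    trainer = Trainer(log_every_n_steps=50)

    # log more frequently, e.g. every 10 batches
    trainer = Trainer(log_every_n_steps=10)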

If you want to log non-scalar content, such as images, text or histograms, you can directly call self.logger.experiment.add_xxx() to do what you need.

    def training_step(...):
        ...
        # the logger you used (in this case tensorboard)
        tensorboard = self.logger.experiment
        tensorboard.add_image()
        tensorboard.add_histogram(...)
        tensorboard.add_figure(...)

Viewing the logs: for TensorBoard, run tensorboard --logdir ./lightning_logs. In a Jupyter Notebook, you can use:

    # Start tensorboard.
    %load_ext tensorboard
    %tensorboard --logdir lightning_logs/

    to open TensorBoard inline.

    • Tip: if TensorBoard is running on a machine in the LAN, add the --bind_all flag so it can be accessed by hostname:

      tensorboard --logdir lightning_logs --bind_all
      # then open http://SERVER-NAME:6006/

    Transfer Learning

    Homepage: https://pytorch-lightning.readthedocs.io/en/latest/starter/introduction_guide.html%23transfer-learning

    import torchvision.models as models

    class ImagenetTransferLearning(LightningModule):
        def __init__(self):
            super().__init__()

            # init a pretrained resnet
            backbone = models.resnet50(pretrained=True)
            num_filters = backbone.fc.in_features
            layers = list(backbone.children())[:-1]
            self.feature_extractor = nn.Sequential(*layers)

            # use the pretrained model to classify cifar-10 (10 image classes)
            num_target_classes = 10
            self.classifier = nn.Linear(num_filters, num_target_classes)

        def forward(self, x):
            self.feature_extractor.eval()
            with torch.no_grad():
                representations = self.feature_extractor(x).flatten(1)
            x = self.classifier(representations)
            ...

    About device operation

    LightningModules know what device they are on! Construct tensors on the device directly to avoid CPU->Device transfer.

    # bad
    t = torch.rand(2, 2).cuda()

    # good (self is LightningModule)
    t = torch.rand(2, 2, device=self.device)

    For tensors that need to be model attributes, it is best practice to register them as buffers in the modules' __init__ method:

    # bad
    self.t = torch.rand(2, 2, device=self.device)

    # good
    self.register_buffer("t", torch.rand(2, 2))

The two snippets above come straight from the official tutorial. However, there is a hidden pitfall:

If your pl.LightningModule instantiates an ordinary nn.Module inside it, and that model needs to generate some tensors internally, such as the per-channel mean and std of an image, then passing self.device down from the pl.LightningModule does not help: at the beginning, this self.device is always cpu. So if you initialize those tensors in the nn.Module's __init__() with to(device), or with no device at all, they will stay on the cpu.

However, experiments show that although self.device is still cpu during pl.LightningModule's __init__() stage, it quickly becomes cuda once training_step() is entered. So, for submodules, the best solution is to take a tensor passed into forward, such as x, as a reference, and use the type_as function to place every tensor generated inside the model on the same device as that reference tensor.

    class RDNFuse(nn.Module):
        ...
        def init_norm_func(self, ref):
            self.mean = torch.tensor(np.array(self.mean_sen), dtype=torch.float32).type_as(ref)

        def forward(self, x):
            if not hasattr(self, 'mean'):
                self.init_norm_func(x)

    Points

    pl.seed_everything(1234): fixes the seed for all relevant sources of randomness.

    When using an LR scheduler, you do not need to call .step() yourself; the Trainer handles it automatically.

    related interface: https://pytorch-lightning.readthedocs.io/en/latest/common/optimizers.html%3Fhighlight%3Dscheduler%23

    # Single optimizer
    for epoch in epochs:
        for batch in data:
            loss = model.training_step(batch, batch_idx, ...)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

        for scheduler in schedulers:
            scheduler.step()

    # Multiple optimizers
    for epoch in epochs:
        for batch in data:
            for opt in optimizers:
                disable_grads_for_other_optimizers()
                train_step(opt)
                opt.step()

        for scheduler in schedulers:
            scheduler.step()

How to split the train and val sets: this is not PL-specific, but it is very commonly used. Two examples. The first: random_split(range(10), [3, 7], generator=torch.Generator().manual_seed(42))

The second, in context:

    from torch.utils.data import DataLoader, random_split
    from torchvision.datasets import MNIST

    mnist_full = MNIST(self.data_dir, train=True, transform=self.transform)
    self.mnist_train, self.mnist_val = random_split(mnist_full, [55000, 5000])

    Parameters:
    dataset (https://pytorch.org/docs/stable/data.html%23torch.utils.data.Dataset) – Dataset to be split
    lengths – lengths of splits to be produced
    generator (https://pytorch.org/docs/stable/generated/torch.Generator.html%23torch.Generator) – Generator used for the random permutation.

