This is not a PyTorch beginner tutorial, but rather some reflections after working through one~
Why PyTorch?
There are two main reasons to use PyTorch:
- As a replacement for NumPy, with support for GPU computation;
- As a deep learning library (a favorite of those of us who just call libraries). It supports dynamic graphs, leaving static-graph TensorFlow in the dust~
Tensor
Since PyTorch can serve as a replacement for NumPy, the Tensor is PyTorch's counterpart of NumPy's ndarray~ Tensors support GPU computation out of the box, which gives model training ("alchemy") a big speed boost.
Attributes
Now suppose we have a Tensor variable `x`. The attributes most worth noting are:
- `x.requires_grad`: determines whether a gradient is computed for `x` during backpropagation. It is usually set to `True` for network parameters and `False` for everything else;
- `x.grad`: the gradient of `x`. It usually needs to be cleared at each training step, otherwise the `x.grad` left over from the previous round gets accumulated into the new one;
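A minimal sketch of that accumulation behavior (the variables here are made up for illustration):

```python
import torch

x = torch.ones(2, requires_grad=True)  # leaf tensor tracked by autograd
(x * 2).sum().backward()
print(x.grad)      # tensor([2., 2.])

(x * 2).sum().backward()
print(x.grad)      # tensor([4., 4.]) -- accumulated, not replaced

x.grad.zero_()     # clear the stale gradient before the next step
print(x.grad)      # tensor([0., 0.])
```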
In my view, when building a deep learning model with PyTorch, the Tensor variables we touch come in two kinds:
- those we define ourselves, e.g. the input feature matrix `x`;
- those defined by the network, e.g. the parameters `w` and `b` of a fully connected layer;
The difference between the two kinds (see the sketch after this list):
- for Tensors we define ourselves, `.requires_grad` is usually `False`, so `backward()` will not compute gradients for them;
- for Tensors defined inside the network, `.requires_grad` is usually `True`, so `backward()` will compute their gradients.
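A small sketch of this split, using an assumed `nn.Linear` layer:

```python
import torch
import torch.nn as nn

x = torch.randn(4, 3)      # input we define ourselves: requires_grad is False
fc = nn.Linear(3, 2)       # the layer's w and b are Parameters: requires_grad is True

print(x.requires_grad)           # False
print(fc.weight.requires_grad)   # True

fc(x).sum().backward()
print(x.grad)                # None -- no gradient for the input
print(fc.weight.grad.shape) # torch.Size([2, 3]) -- gradient for the parameters
```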
Methods
Now suppose we have a Tensor variable `x`. Its common methods are:
- `x.item()`: "If you have a one element tensor, use `.item()` to get the value as a Python number";
- `x.numpy()`: returns an `ndarray` with exactly the same shape and values as the Tensor. Note that the two share memory: change the value of one and the other changes with it (much like a pointer);
- `x.to(deviceName)`: "Tensors can be moved onto any device using the `.to()` method";
- `x.backward()`: starting from `x`, computes the gradient of every Tensor involved in computing `x` whose `requires_grad` is `True`. In general `x.backward()` requires `x` to be a scalar Tensor; otherwise you have to follow the rule below: "If Tensor is a scalar (i.e. it holds a one element data), you don't need to specify any arguments to `backward()`, however if it has more elements, you need to specify a `gradient` argument that is a tensor of matching shape."
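These methods in action (a sketch; the values are arbitrary):

```python
import torch

x = torch.tensor([2.5])
print(x.item())              # 2.5, a plain Python float

a = torch.ones(3)
b = a.numpy()                # shares memory with a
a.add_(1)                    # in-place add on the tensor...
print(b)                     # ...changes the ndarray too: [2. 2. 2.]

device = "cuda" if torch.cuda.is_available() else "cpu"
x = x.to(device)             # move the tensor onto the chosen device

# backward() on a non-scalar needs a matching-shape gradient argument:
v = torch.randn(3, requires_grad=True)
y = v * 2
y.backward(torch.ones_like(y))   # same as (v * 2).sum().backward()
print(v.grad)                # tensor([2., 2., 2.])
```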
Function
The official PyTorch description of `Function`:
Tensor and `Function` are interconnected and build up an acyclic graph, that encodes a complete history of computation. Each tensor has a `.grad_fn` attribute that references a `Function` that has created the Tensor (except for Tensors created by the user - their `grad_fn is None`).
In other words, every Tensor is produced by some `Function`, and each Tensor records which `Function` created it.
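A quick look at `grad_fn`:

```python
import torch

x = torch.ones(2, requires_grad=True)   # created by the user
y = x + 2                               # created by an addition Function

print(x.grad_fn)   # None
print(y.grad_fn)   # <AddBackward0 object at 0x...>
```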
Steps to Build a Neural Network
- Define the neural network that has some learnable parameters (or weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far is the output from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule:
`weight = weight - learning_rate * gradient`
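Putting the steps together in one minimal training loop (the network, data, and hyperparameters below are made up for illustration):

```python
import torch
import torch.nn as nn
import torch.optim as optim

net = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)

inputs = torch.randn(32, 10)    # a batch of inputs
targets = torch.randn(32, 1)    # the corresponding targets

for epoch in range(5):                  # iterate over the data
    optimizer.zero_grad()               # clear gradients from the previous step
    outputs = net(inputs)               # process input through the network
    loss = criterion(outputs, targets)  # compute the loss
    loss.backward()                     # propagate gradients back into the parameters
    optimizer.step()                    # w = w - lr * grad (for plain SGD)
```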
Coding Tips
- At test time, wrapping evaluation in `with torch.no_grad()` turns off autograd for the whole network, which speeds things up a lot and also lets you test with a larger `batch_size`. Of course, you can also do without `with torch.no_grad()`. The three constructs to keep straight here are `model.train()`, `model.eval()`, and `with torch.no_grad()`;
- During training, call `optim.zero_grad()` to clear the `grad` from the previous step;
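A sketch of an evaluation pass under these conventions (`net` and `test_loader` are assumed to already exist):

```python
import torch

net.eval()                 # switch layers like Dropout/BatchNorm to eval mode
with torch.no_grad():      # disable autograd: faster, and allows a larger batch_size
    for inputs, targets in test_loader:
        outputs = net(inputs)
        # ... compute metrics here ...
net.train()                # switch back before resuming training
```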
Summary
- `torch.Tensor` - A multi-dimensional array with support for autograd operations like `backward()`. Also holds the gradient w.r.t. the tensor.
- `nn.Module` - Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.
- `nn.Parameter` - A kind of Tensor, that is automatically registered as a parameter when assigned as an attribute to a `Module`.
- `autograd.Function` - Implements forward and backward definitions of an autograd operation. Every `Tensor` operation creates at least a single `Function` node that connects to functions that created a Tensor and encodes its history.
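To see the `nn.Parameter` auto-registration in action, a toy module (the class here is hypothetical):

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    def __init__(self):
        super().__init__()
        # assigned as an attribute, so it is auto-registered as a parameter
        self.alpha = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return self.alpha * x

m = Scale()
print(list(m.parameters()))  # [Parameter containing: tensor([1.], requires_grad=True)]
```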