This is not a PyTorch beginner tutorial, but rather some reflections after working through one~
Why PyTorch?
There are two main reasons to use PyTorch:
- As a replacement for NumPy, with support for GPU computation;
- As a deep learning library (a favorite of those of us who just call libraries). It supports dynamic graphs, leaving static-graph TensorFlow in the dust~
Tensor
Since PyTorch can serve as a replacement for NumPy, the Tensor is PyTorch's counterpart of NumPy's ndarray~ Tensors support GPU computation out of the box, which gives model training ("alchemy") a big speed boost.
Attributes
Now suppose we have a Tensor variable `x`. The attributes most worth noting are:
- `x.requires_grad`: determines whether a gradient is computed for `x` during backpropagation. It is usually set to `True` for network parameters and `False` for everything else;
- `x.grad`: the gradient of `x`. It usually needs to be cleared at each training step, otherwise the `x.grad` left over from the previous round gets accumulated into the new one;
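A minimal sketch of that accumulation behavior (the variables here are made up for illustration):

```python
import torch

x = torch.ones(2, requires_grad=True)  # leaf tensor tracked by autograd
(x * 2).sum().backward()
print(x.grad)      # tensor([2., 2.])

(x * 2).sum().backward()
print(x.grad)      # tensor([4., 4.]) -- accumulated, not replaced

x.grad.zero_()     # clear the stale gradient before the next step
print(x.grad)      # tensor([0., 0.])
```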
In my view, when building a deep learning model with PyTorch, the Tensor variables we touch come in two kinds:
- those we define ourselves, e.g. the input feature matrix `x`;
- those defined by the network, e.g. the parameters `w` and `b` of a fully connected layer;
The difference between the two kinds (see the sketch after this list):
- for Tensors we define ourselves, `.requires_grad` is usually `False`, so `backward()` will not compute gradients for them;
- for Tensors defined inside the network, `.requires_grad` is usually `True`, so `backward()` will compute their gradients.
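A small sketch of this split, using an assumed `nn.Linear` layer:

```python
import torch
import torch.nn as nn

x = torch.randn(4, 3)      # input we define ourselves: requires_grad is False
fc = nn.Linear(3, 2)       # the layer's w and b are Parameters: requires_grad is True

print(x.requires_grad)           # False
print(fc.weight.requires_grad)   # True

fc(x).sum().backward()
print(x.grad)                # None -- no gradient for the input
print(fc.weight.grad.shape) # torch.Size([2, 3]) -- gradient for the parameters
```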
Methods
Now suppose we have a Tensor variable `x`. Its common methods are:
- `x.item()`: "If you have a one element tensor, use `.item()` to get the value as a Python number";
- `x.numpy()`: returns an `ndarray` with exactly the same shape and values as the Tensor. Note that the two share memory: change the value of one and the other changes with it (much like a pointer);
- `x.to(deviceName)`: "Tensors can be moved onto any device using the `.to()` method";
- `x.backward()`: starting from `x`, computes the gradient of every Tensor involved in computing `x` whose `requires_grad` is `True`. In general `x.backward()` requires `x` to be a scalar Tensor; otherwise you have to follow the rule below: "If Tensor is a scalar (i.e. it holds a one element data), you don't need to specify any arguments to `backward()`, however if it has more elements, you need to specify a `gradient` argument that is a tensor of matching shape."
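These methods in action (a sketch; the values are arbitrary):

```python
import torch

x = torch.tensor([2.5])
print(x.item())              # 2.5, a plain Python float

a = torch.ones(3)
b = a.numpy()                # shares memory with a
a.add_(1)                    # in-place add on the tensor...
print(b)                     # ...changes the ndarray too: [2. 2. 2.]

device = "cuda" if torch.cuda.is_available() else "cpu"
x = x.to(device)             # move the tensor onto the chosen device

# backward() on a non-scalar needs a matching-shape gradient argument:
v = torch.randn(3, requires_grad=True)
y = v * 2
y.backward(torch.ones_like(y))   # same as (v * 2).sum().backward()
print(v.grad)                # tensor([2., 2., 2.])
```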
Function
The official PyTorch description of `Function`:
Tensor and `Function` are interconnected and build up an acyclic graph, that encodes a complete history of computation. Each tensor has a `.grad_fn` attribute that references a `Function` that has created the Tensor (except for Tensors created by the user - their `grad_fn is None`).
In other words, every Tensor is produced by some `Function`, and each Tensor records which `Function` created it.
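A quick look at `grad_fn`:

```python
import torch

x = torch.ones(2, requires_grad=True)   # created by the user
y = x + 2                               # created by an addition Function

print(x.grad_fn)   # None
print(y.grad_fn)   # <AddBackward0 object at 0x...>
```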
Steps to Build a Neural Network
- Define the neural network that has some learnable parameters (or weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far is the output from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule:
`weight = weight - learning_rate * gradient`
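Putting the steps together in one minimal training loop (the network, data, and hyperparameters below are made up for illustration):

```python
import torch
import torch.nn as nn
import torch.optim as optim

net = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)

inputs = torch.randn(32, 10)    # a batch of inputs
targets = torch.randn(32, 1)    # the corresponding targets

for epoch in range(5):                  # iterate over the data
    optimizer.zero_grad()               # clear gradients from the previous step
    outputs = net(inputs)               # process input through the network
    loss = criterion(outputs, targets)  # compute the loss
    loss.backward()                     # propagate gradients back into the parameters
    optimizer.step()                    # w = w - lr * grad (for plain SGD)
```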
Coding Tips
- At test time, wrapping evaluation in `with torch.no_grad()` turns off autograd for the whole network, which speeds things up a lot and also lets you test with a larger `batch_size`. Of course, you can also do without `with torch.no_grad()`. The three constructs to keep straight here are `model.train()`, `model.eval()`, and `with torch.no_grad()`;
- During training, call `optim.zero_grad()` to clear the `grad` from the previous step;
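A sketch of an evaluation pass under these conventions (`net` and `test_loader` are assumed to already exist):

```python
import torch

net.eval()                 # switch layers like Dropout/BatchNorm to eval mode
with torch.no_grad():      # disable autograd: faster, and allows a larger batch_size
    for inputs, targets in test_loader:
        outputs = net(inputs)
        # ... compute metrics here ...
net.train()                # switch back before resuming training
```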
Summary
- `torch.Tensor` - A multi-dimensional array with support for autograd operations like `backward()`. Also holds the gradient w.r.t. the tensor.
- `nn.Module` - Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.
- `nn.Parameter` - A kind of Tensor, that is automatically registered as a parameter when assigned as an attribute to a `Module`.
- `autograd.Function` - Implements forward and backward definitions of an autograd operation. Every `Tensor` operation creates at least a single `Function` node that connects to functions that created a Tensor and encodes its history.
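To see the `nn.Parameter` auto-registration in action, a toy module (the class here is hypothetical):

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    def __init__(self):
        super().__init__()
        # assigned as an attribute, so it is auto-registered as a parameter
        self.alpha = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return self.alpha * x

m = Scale()
print(list(m.parameters()))  # [Parameter containing: tensor([1.], requires_grad=True)]
```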