This is not a PyTorch beginner tutorial, but rather some reflections after working through one~
Why PyTorch?
There are two main reasons to use PyTorch:
- As a replacement for NumPy, with support for GPU computation;
- As a deep learning library (the package-importer's favorite). It supports dynamic graphs, which beats static-graph TensorFlow hands down~
Tensor
Since PyTorch can serve as a replacement for NumPy, the Tensor is its replacement for NumPy's ndarray. Tensors support GPU computation very conveniently, which speeds up training considerably.
Attributes
Now suppose we have a Tensor variable x. The common attributes worth noting are:
- x.requires_grad: determines whether a gradient is computed for x during backpropagation. Network parameters are usually set to True, while tensors that are not network parameters are usually set to False;
- x.grad: the gradient of x. At each training step you usually need to zero out the x.grad left over from the previous step, otherwise the gradients accumulate (see the sketch right after this list);
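A minimal sketch of how these two attributes interact (the tensor names here are just for illustration):

```python
import torch

# A tensor that should receive gradients (e.g. a network parameter)
x = torch.ones(3, requires_grad=True)

y = (x * 2).sum()
y.backward()
print(x.grad)      # tensor([2., 2., 2.])

# Without zeroing, the next backward pass accumulates into x.grad
y2 = (x * 2).sum()
y2.backward()
print(x.grad)      # tensor([4., 4., 4.]) -- summed with the previous step

# Clear the gradient before the next step
x.grad.zero_()
```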
In my view, when building deep learning models with PyTorch, the Tensor variables we use mainly come in two kinds:
- Those we define ourselves, e.g. the input feature matrix x;
- Those the network defines, e.g. the weights w and bias b of a fully connected layer.
The difference between these two kinds of Tensors:
- For the Tensors we define, .requires_grad is usually False, so backward() will not compute gradients for them;
- For the Tensors defined inside the network, .requires_grad is usually True, so backward() will compute gradients for them (see the sketch after this list).
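A quick way to see this split, using a toy nn.Linear layer (the shapes and names are arbitrary):

```python
import torch
import torch.nn as nn

# A tensor we define ourselves: the input features
x = torch.randn(4, 10)
print(x.requires_grad)             # False

# Tensors the network defines: the layer's w and b
fc = nn.Linear(10, 2)
print(fc.weight.requires_grad)     # True
print(fc.bias.requires_grad)       # True

out = fc(x).sum()
out.backward()
print(x.grad)                      # None -- no gradient for the input
print(fc.weight.grad.shape)        # torch.Size([2, 10])
```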
Methods
Again suppose we have a Tensor variable x. Its common methods include:
- x.item(): If you have a one element tensor, use .item() to get the value as a Python number;
- x.numpy(): returns an ndarray with the same shape and values as the Tensor. Note that the two share memory: changing one also changes the other (similar to a pointer);
- x.to(deviceName): Tensors can be moved onto any device using the .to() method;
- x.backward(): starting from x, computes the gradient of every Tensor involved in computing x whose requires_grad is True. In general, x.backward() requires x to be a scalar Tensor; otherwise, as the docs put it: if the Tensor is a scalar (i.e. it holds a one element data), you don't need to specify any arguments to backward(), however if it has more elements, you need to specify a gradient argument that is a tensor of matching shape.
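All four methods together in one short sketch (the values and the device check are just for illustration):

```python
import torch

x = torch.ones(1, requires_grad=True)

# .item(): a one-element tensor becomes a Python number
print(x.item())                          # 1.0

# .numpy(): shares memory with the tensor
# (a tensor with requires_grad=True must be detach()-ed first)
a = x.detach().numpy()
a[0] = 5.0
print(x)                                 # tensor([5.], ...) -- changed too

# .to(): move to another device (falls back to CPU if no GPU is present)
device = "cuda" if torch.cuda.is_available() else "cpu"
x_dev = x.to(device)

# .backward(): a scalar needs no argument; a non-scalar needs a
# matching-shape gradient argument
v = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = v * 2
y.backward(gradient=torch.ones_like(y))
print(v.grad)                            # tensor([2., 2., 2.])
```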
Function
The official PyTorch description of Function:
Tensor and Function are interconnected and build up an acyclic graph, that encodes a complete history of computation. Each tensor has a .grad_fn attribute that references a Function that has created the Tensor (except for Tensors created by the user - their grad_fn is None).
In other words, every Tensor is produced by some Function, and each Tensor records which Function created it.
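A minimal check of .grad_fn:

```python
import torch

x = torch.ones(2, requires_grad=True)   # created by the user
y = x + 2                               # created by an add Function
z = y.mean()                            # created by a mean Function

print(x.grad_fn)   # None
print(y.grad_fn)   # <AddBackward0 object at ...>
print(z.grad_fn)   # <MeanBackward0 object at ...>
```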
Steps to Build a Neural Network
- Define the neural network that has some learnable parameters (or weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far is the output from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule:
weight = weight - learning_rate * gradient
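A minimal training-loop sketch walking through these six steps (the model, data, and hyperparameters are placeholders, not a real setup):

```python
import torch
import torch.nn as nn

# 1. Define a network with learnable parameters
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Dummy data: 5 batches of (inputs, targets)
dataset = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(5)]

for inputs, targets in dataset:          # 2. iterate over the dataset
    optimizer.zero_grad()                # clear last step's gradients
    outputs = model(inputs)              # 3. process input through the network
    loss = criterion(outputs, targets)   # 4. compute the loss
    loss.backward()                      # 5. propagate gradients back
    optimizer.step()                     # 6. update the weights
```

Here optimizer.step() with plain SGD applies exactly the update rule above, weight = weight - learning_rate * gradient.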
Code Tips
- At test time, use with torch.no_grad() to turn off autograd for the entire network. This speeds things up considerably and also lets you test with a larger batch_size. (Of course, you can also test without torch.no_grad().) The three mode-related pieces to remember are model.train(), model.eval(), and with torch.no_grad() (see the sketch after this list);
- During training, use optim.zero_grad() to clear the grad left over from the previous step;
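A sketch of the test-time pattern (the model and loader here are dummy placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                              # placeholder model
test_loader = [torch.randn(8, 10) for _ in range(3)]  # placeholder loader

model.eval()                 # put Dropout/BatchNorm layers in eval mode
with torch.no_grad():        # stop autograd: faster, and bigger batches fit
    for inputs in test_loader:
        outputs = model(inputs)
        print(outputs.requires_grad)   # False -- no graph is built

model.train()                # switch back before resuming training
```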
Summary
- torch.Tensor - A multi-dimensional array with support for autograd operations like backward(). Also holds the gradient w.r.t. the tensor.
- nn.Module - Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.
- nn.Parameter - A kind of Tensor, that is automatically registered as a parameter when assigned as an attribute to a Module.
- autograd.Function - Implements forward and backward definitions of an autograd operation. Every Tensor operation creates at least a single Function node that connects to functions that created a Tensor and encodes its history.
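A small illustration of the nn.Parameter auto-registration mentioned above (the class and attribute names are made up):

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        # Assigning an nn.Parameter attribute registers it automatically
        self.w = nn.Parameter(torch.randn(3))
        # A plain tensor attribute is NOT registered as a parameter
        self.not_a_param = torch.randn(3)

    def forward(self, x):
        return x * self.w

m = Toy()
print([name for name, _ in m.named_parameters()])   # ['w']
```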