Using torch.nn.Module.register_buffer()

Published: 2021-08-26

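When writing a deep learning model with the torch.nn.Module class, we usually define many Parameters inside it (for example, through layers such as nn.Linear). These Parameters are updated as the model trains.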

But what if we need to define some constants that never change?

For example, the source code of the CasEE paper needs a constant called type_indices:

class TypeCls(nn.Module):
    def __init__(self, config):
        super(TypeCls, self).__init__()
        self.type_emb = nn.Embedding(config.type_num, config.hidden_size)
        self.type_indices = torch.arange(0, config.type_num, 1).long()
        self.dropout = nn.Dropout(config.decoder_dropout)

        self.config = config
        self.Predictor = AdaptiveAdditionPredictor(config.hidden_size, dropout_rate=config.decoder_dropout)

    def forward(self, text_rep, mask):
        type_emb = self.type_emb(self.type_indices)
        pred = self.Predictor(type_emb, text_rep, mask)  # [b, c]
        p_type = torch.sigmoid(pred)
        return p_type, type_emb

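The first instinct is probably to declare it directly, as I did above. In a CPU-only environment this is fine and runs without problems. But once the model runs on GPUs (a multi-GPU setup in my case), it fails with the following error: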

"RuntimeError: Input, output and indices must be on the current device"

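The cause is that self.type_indices is a plain tensor attribute. Unlike the Module's Parameters, it is not distributed to the GPUs; it exists only on the CPU, hence the error.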
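
To make this concrete, here is a minimal sketch. PlainAttr is a hypothetical, stripped-down stand-in for TypeCls (it is not from the CasEE source), and the sizes are arbitrary. It shows that a tensor assigned as a plain attribute is not tracked by the module at all: it is neither a buffer nor part of state_dict. The GPU check only runs if CUDA is available.

```python
import torch
import torch.nn as nn


class PlainAttr(nn.Module):
    """Hypothetical stand-in for TypeCls: the index tensor is a plain attribute."""

    def __init__(self, type_num=4, hidden_size=8):
        super().__init__()
        self.type_emb = nn.Embedding(type_num, hidden_size)
        # Plain attribute: nn.Module does not track this tensor.
        self.type_indices = torch.arange(type_num)

    def forward(self):
        return self.type_emb(self.type_indices)


model = PlainAttr()
buffer_names = [name for name, _ in model.named_buffers()]
state_keys = list(model.state_dict().keys())
print(buffer_names)  # 'type_indices' is not listed
print(state_keys)    # only the embedding weight is listed

if torch.cuda.is_available():
    gpu_model = PlainAttr().cuda()
    # The embedding weight has moved to the GPU; the plain attribute has not.
    print(gpu_model.type_emb.weight.device, gpu_model.type_indices.device)
```

Because the module does not know about type_indices, nothing that walks the module's registered tensors can move it.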

This is where torch.nn.Module.register_buffer() comes in:

class TypeCls(nn.Module):
    def __init__(self, config):
        super(TypeCls, self).__init__()
        self.type_emb = nn.Embedding(config.type_num, config.hidden_size)
        self.register_buffer('type_indices', torch.arange(0, config.type_num, 1).long())
        self.dropout = nn.Dropout(config.decoder_dropout)

        self.config = config
        self.Predictor = AdaptiveAdditionPredictor(config.hidden_size, dropout_rate=config.decoder_dropout)

    def forward(self, text_rep, mask):
        type_emb = self.type_emb(self.type_indices)
        pred = self.Predictor(type_emb, text_rep, mask)  # [b, c]
        p_type = torch.sigmoid(pred)
        return p_type, type_emb
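
A small sketch of what registering changes, again using hypothetical stand-in classes (WithBuffer and NonPersistent are illustrative names, not from the CasEE source). The buffer shows up in named_buffers() and state_dict(), but not in named_parameters(), so an optimizer built from model.parameters() will not update it. The second class assumes a PyTorch version whose register_buffer accepts the persistent argument; with persistent=False the buffer is still tracked but left out of state_dict, which may suit a constant that can be rebuilt from the config.

```python
import torch
import torch.nn as nn


class WithBuffer(nn.Module):
    def __init__(self, type_num=4, hidden_size=8):
        super().__init__()
        self.type_emb = nn.Embedding(type_num, hidden_size)
        # Registered buffer: tracked by the module, but not a Parameter.
        self.register_buffer("type_indices", torch.arange(type_num))

    def forward(self):
        return self.type_emb(self.type_indices)


class NonPersistent(nn.Module):
    def __init__(self, type_num=4):
        super().__init__()
        # Still a buffer, but excluded from state_dict (assumes `persistent` is supported).
        self.register_buffer("type_indices", torch.arange(type_num), persistent=False)


model = WithBuffer()
buffer_names = [name for name, _ in model.named_buffers()]
param_names = [name for name, _ in model.named_parameters()]
state_keys = set(model.state_dict().keys())
out = model()

np_model = NonPersistent()
np_buffer_names = [name for name, _ in np_model.named_buffers()]
np_state_keys = list(np_model.state_dict().keys())
```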

Part of what lets self.type_indices reach multiple GPUs is the following method of Module:

def _replicate_for_data_parallel(self):
    replica = self.__new__(type(self))
    replica.__dict__ = self.__dict__.copy()

    # replicas do not have parameters themselves, the replicas reference the original
    # module.
    replica._parameters = OrderedDict()
    replica._buffers = replica._buffers.copy()
    replica._modules = replica._modules.copy()
    replica._is_replica = True

    return replica

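The line replica._buffers = replica._buffers.copy() gives every replica its own _buffers dict, so a registered buffer such as self.type_indices is carried along to each replica. As far as I can tell, the replication code in torch.nn.parallel then fills that dict with per-GPU copies of the buffers. A plain tensor attribute, by contrast, is only shared through the shallow __dict__ copy and stays on the CPU.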
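
The same bookkeeping can be observed without any GPU. The sketch below is an illustration under stated assumptions, not the DataParallel code path itself: it uses a dtype conversion as a CPU-only stand-in for device movement, since .double(), .to() and .cuda() all walk the module's registered parameters and buffers. Scales, tracked and untracked are hypothetical names.

```python
import torch
import torch.nn as nn


class Scales(nn.Module):
    def __init__(self):
        super().__init__()
        # Stored in self._buffers, so module-wide conversions see it.
        self.register_buffer("tracked", torch.ones(3, dtype=torch.float32))
        # Stored only in self.__dict__, so module-wide conversions skip it.
        self.untracked = torch.ones(3, dtype=torch.float32)


m = Scales().double()  # converts registered floating-point tensors to float64

tracked_dtype = m.tracked.dtype
untracked_dtype = m.untracked.dtype
in_buffers = "tracked" in m._buffers
in_dict = "untracked" in m.__dict__
print(tracked_dtype, untracked_dtype)
```

Only the registered buffer is converted; the plain attribute keeps its original dtype, just as it keeps its original device when the model is moved or replicated.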

For more details on how to use this method, see: