PyTorch Tutorial 2.5: Automatic Differentiation - 电子发烧友网

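Recall from Section 2.4 that calculating derivatives is the crucial step in all of the optimization algorithms that we will use to train deep networks. While the calculations are straightforward, working them out by hand can be tedious and error-prone, and this problem only grows as our models become more complex.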

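Fortunately, all modern deep learning frameworks take this work off our plates by offering automatic differentiation (often shortened to autograd). As we pass data through each successive function, the framework builds a computational graph that tracks how each value depends on the others. To calculate derivatives, automatic differentiation works backwards through this graph, applying the chain rule. The computational algorithm for applying the chain rule in this fashion is called backpropagation.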
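To make the idea of a computational graph concrete, here is a minimal sketch of reverse-mode differentiation in plain Python. The `Value` class and its attributes are hypothetical names invented for this illustration; this is not how PyTorch or any other framework is implemented, only a toy version of the same principle: record how each value was produced, then walk the graph backwards with the chain rule.

```python
class Value:
    """A scalar node in a toy computational graph (illustrative only)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # nodes this value was computed from
        self.local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    __radd__ = __add__
    __rmul__ = __mul__

    def backward(self):
        # Order the graph so that every node comes after its parents;
        # iterating in reverse then visits each node before its parents.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # dy/dy
        for node in reversed(order):
            # Chain rule: pass this node's gradient on to each parent.
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += local * node.grad


# y = 2 * x^T x for x = [0, 1, 2, 3], built from scalar nodes
x = [Value(v) for v in (0.0, 1.0, 2.0, 3.0)]
y = 2 * sum((xi * xi for xi in x), Value(0.0))
y.backward()
grads = [xi.grad for xi in x]
print(y.data, grads)
```

A single backward pass fills in the gradient for every input at once, which is why this approach suits functions with many inputs and one scalar output, such as a loss.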

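While autograd libraries have become a hot topic over the past decade, they have a long history. In fact, the earliest references to autograd date back more than half a century (Wengert, 1964). The core ideas behind modern backpropagation date to a PhD thesis from 1980 (Speelpenning, 1980) and were further developed in the late 1980s (Griewank, 1989). While backpropagation has become the default method for computing gradients, it is not the only option. For instance, the Julia programming language employs forward propagation (Revels et al., 2016). Before exploring methods, let's first master the autograd package.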
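For contrast with backpropagation, the following is a rough sketch of forward-mode differentiation using dual numbers, written in plain Python. The `Dual` class and the `forward_gradient` helper are hypothetical names made up for this example, and the sketch does not claim to mirror how Julia or any particular library implements the idea. Each number carries its derivative (the tangent) along with its value, so derivatives are propagated in the same direction as the computation.

```python
class Dual:
    """A value paired with its derivative with respect to one chosen input."""

    def __init__(self, value, tangent=0.0):
        self.value = value
        self.tangent = tangent

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.tangent + other.tangent)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u v' + u' v
        return Dual(self.value * other.value,
                    self.value * other.tangent + self.tangent * other.value)

    __radd__ = __add__
    __rmul__ = __mul__


def f(x):
    """y = 2 * x^T x, written with + and * only."""
    total = Dual(0.0)
    for xi in x:
        total = total + xi * xi
    return 2 * total


def forward_gradient(func, point):
    """Build the gradient one coordinate at a time (one pass per input)."""
    grad = []
    for i in range(len(point)):
        # Seed input i with tangent 1 and all other inputs with tangent 0.
        inputs = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(point)]
        grad.append(func(inputs).tangent)
    return grad


forward_grad = forward_gradient(f, [0.0, 1.0, 2.0, 3.0])
print(forward_grad)
```

Note the trade-off this sketch exposes: forward mode needs one pass per input coordinate, whereas reverse mode needs one pass per output. For a scalar loss over many parameters, that difference is the usual reason backpropagation is preferred.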

# PyTorch
import torch

# MXNet
from mxnet import autograd, np, npx

npx.set_np()

# JAX
from jax import numpy as jnp

# TensorFlow
import tensorflow as tf

2.5.1. A Simple Function

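Let's assume that we are interested in differentiating the function y = 2x⊤x with respect to the column vector x. To start, we assign x an initial value.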
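Before turning to code, it may help to work the derivative out by hand so that there is something to compare the framework's answer against. The derivation below is a short sketch that expands the inner product into a sum over components; the index symbols i, j, and n are introduced here only for illustration.

```latex
\begin{aligned}
y &= 2\,\mathbf{x}^\top \mathbf{x} = 2 \sum_{i=1}^{n} x_i^2, \\
\frac{\partial y}{\partial x_j} &= 4 x_j \quad (j = 1, \dots, n), \\
\nabla_{\mathbf{x}}\, y &= 4\,\mathbf{x}.
\end{aligned}
```

For x = [0, 1, 2, 3], this gives y = 28 and a gradient of [0, 4, 8, 12].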

x = torch.arange(4.0)
x

tensor([0., 1., 2., 3.])

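Before we calculate the gradient of y with respect to x, we need a place to store it. In general, we avoid allocating new memory every time we take a derivative, because deep learning requires computing derivatives with respect to the same parameters thousands or millions of times in succession, and we might otherwise risk running out of memory. Note that the gradient of a scalar-valued function with respect to a vector x is itself vector-valued and has the same shape as x.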

# Can also create x = torch.arange(4.0, requires_grad=True)
x.requires_grad_(True)
x.grad  # The gradient is None by default

x = np.arange(4.0)
x

array([0., 1., 2., 3.])

Before we calculate the gradient of y with respect to x, we need a place to store it. In general, we avoid allocating new memory every time we take a derivative because deep learning requires successively computing derivatives with respect to the same parameters thousands or millions of times, and we might risk running out of memory. Note that the gradient of a scalar-valued function with respect to a vector x is vector-valued and has the same shape as x.

# We allocate memory for a tensor's gradient by invoking `attach_grad`
x.attach_grad()
# After we calculate a gradient taken with respect to `x`, we will be able to
# access it via the `grad` attribute, whose values are initialized with 0s
x.grad

array([0., 0., 0., 0.])

x = jnp.arange(4.0)
x


Array([0., 1., 2., 3.], dtype=float32)

x = tf.range(4, dtype=tf.float32)
x

<tf.Tensor: shape=(4,), dtype=float32, numpy=array([0., 1., 2., 3.], dtype=float32)>

x = tf.Variable(x)

We now calculate our function of x and assign the result to y.

y = 2 * torch.dot(x, x)
y

tensor(28., grad_fn=<MulBackward0>)

We can now take the gradient of y with respect to x by calling its backward method. Next, we can access the gradient via x's grad attribute.

y.backward()
x.grad

tensor([ 0., 4., 8., 12.])
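One way to sanity-check a gradient like this one is to compare it against a numerical approximation. The sketch below uses central finite differences in plain Python, with no framework involved; the helper name `numerical_gradient` and the step size `eps` are assumptions chosen for this illustration, not part of any library.

```python
def f(x):
    """y = 2 * x^T x for a list of floats."""
    return 2.0 * sum(v * v for v in x)


def numerical_gradient(func, x, eps=1e-4):
    """Approximate each partial derivative with a central difference."""
    grad = []
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += eps
        minus[i] -= eps
        grad.append((func(plus) - func(minus)) / (2 * eps))
    return grad


x = [0.0, 1.0, 2.0, 3.0]
approx = numerical_gradient(f, x)
expected = [4.0 * v for v in x]  # analytic gradient of 2 * x^T x is 4x
max_error = max(abs(a - e) for a, e in zip(approx, expected))
print(approx, max_error)
```

The approximation should agree with [0, 4, 8, 12] up to floating-point rounding. Finite differences need two function evaluations per input, so they are useful as a test but far too slow to replace autograd during training.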

						

# Our code is inside an `autograd.record` scope to build the computational
# graph
with autograd.record():
    y = 2 * np.dot(x, x)
y

array(28.)

We can now take the gradient of y with respect to x by calling its backward method. Next, we can access the gradient via x’s grad attribute.

y.backward()
x.grad


array([ 0., 4., 8., 12.])

y = lambda x: 2 * jnp.dot(x, x)
y(x)

Array(28., dtype=float32)

We can now take the gradient of y with respect to x by passing through the grad transform.

from jax import grad

# The `grad` transform returns a Python function that
# computes the gradient of the original function
x_grad = grad(y)(x)
x_grad

Array([ 0., 4., 8., 12.], dtype=float32)

# Record all computations onto a tape
with tf.GradientTape() as t:
    y = 2 * tf.tensordot(x, x, axes=1)
y

<tf.Tensor: shape=(), dtype=float32, numpy=28.0>

We can now calculate the gradient of y with respect to x by calling the gradient method.

x_grad = t.gradient(y, x)
x_grad

<tf.Tensor: shape=(4,), dtype=float32, numpy=array([
