1 Overview

Goal of this tutorial

  • How to implement a learning system using PyTorch
  • Understand the basics of neural networks and deep learning

Requirements

  • Algebra and probability
  • Python

Human Intelligence

Inference

What to eat for dinner? A decision: inference based on available information (finances, personal preferences).

Prediction

Concrete entities -> abstraction

Machine Learning

Using algorithms to perform inference or prediction.

Supervised learning

Train the model with a labeled dataset.

Conventional algorithms:

  • Exhaustive search
  • Greedy algorithms
  • Divide and conquer
  • Dynamic programming

Machine learning algorithms:

Use the dataset to find the algorithm (the mapping from input to output) itself.

Deep Learning

MLP (Multilayer Perceptron)

How to develop a learning system?

Rule-based systems

The program is designed by hand; the rule set keeps growing, which makes it hard to maintain.

Example: finding an antiderivative

  • Build a knowledge base (antiderivatives of elementary functions)
  • Define rules
  • Apply trigonometric transformations

Classic machine learning

Features are extracted by hand: the input is turned into a vector.

A mapping from the feature vector to the output is then learned: y = f(x)

Representation learning

Curse of dimensionality: the more input features (the higher the dimension), the more samples are needed.

Linear mapping: project N dimensions down to 3.

Manifold

Example: the Milky Way, where three dimensions can be mapped down to two.

Deep learning

The raw features are simpler; additional layers are needed to extract higher-level features.

Rule-based systems vs. representation learning

Traditional machine learning strategy

Classification, clustering, regression, and dimensionality reduction.

New challenges

  • Limits of hand-designed features.
  • SVMs cannot handle big datasets well.
  • More and more applications need to handle unstructured data (images, text, audio).

Brief history of neural networks

From neuroscience to mathematics & engineering

Back Propagation

  • A computational graph can propagate derivatives.
  • Partial derivatives are computed with the chain rule.
  • The derivative with respect to b sums the partial derivatives over all paths, as the sketch below shows.
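A minimal autograd sketch of this path-summing behavior (the values are illustrative): b reaches the output along two paths, and PyTorch adds the partial derivatives of both.

import torch

b = torch.tensor(3.0, requires_grad=True)
# b feeds the output along a linear path and a quadratic path,
# so autograd sums the partial derivatives of both paths
y = b * 2 + b * b   # dy/db = 2 + 2b = 8 at b = 3
y.backward()
print(b.grad)       # tensor(8.)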

2 Linear Model

Machine Learning


Supervised Learning

During training, both x and y are known.

Overfitting: the error on the training set is very small, but background and noise are learned too (which we do not want).

Generalization: the model can correctly recognize images it has never seen.

Dataset split

Model design

  • What would be the best model for the data?

  • Linear model?


To simplify the model


Linear Regression


Compute Loss


Loss function & Cost function

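Written out (for the linear model $\hat{y} = x \cdot w$): the loss is per sample, and the cost (MSE) averages the loss over the training set of $N$ samples:

$$\mathrm{loss} = (\hat{y} - y)^2, \qquad \mathrm{cost} = \frac{1}{N} \sum_{n=1}^{N} (\hat{y}_n - y_n)^2$$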

Compute Cost


Find the best weight by exhaustive search

Enumerate values of w, compute the loss curve, and locate its lowest point.

How to draw the graph


Here the losses are only summed, not averaged.

Taking the mean of the sum gives the MSE, as the sketch below shows.
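A minimal sketch of this exhaustive search, assuming the course's toy dataset where y = 2x:

import numpy as np
import matplotlib.pyplot as plt

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def forward(x, w):
    return x * w

w_list, mse_list = [], []
for w in np.arange(0.0, 4.1, 0.1):
    # sum the squared errors over the dataset, then average to get the MSE
    l_sum = sum((forward(x, w) - y) ** 2 for x, y in zip(x_data, y_data))
    w_list.append(w)
    mse_list.append(l_sum / len(x_data))

plt.plot(w_list, mse_list)
plt.xlabel('w')
plt.ylabel('MSE')
plt.show()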

3 Gradient Descent

Divide-and-conquer idea

First do a sparse search, e.g. evaluate 16 points.

This does not work either, because the actual cost function may have several local minima.

Convex function: for any two points on the graph, every point of the connecting segment lies on or above the function.
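In symbols, f is convex if for any two points $x_1, x_2$ in its domain and any $\lambda \in [0, 1]$:

$$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$$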

Optimization Problem


Gradient Descent Algorithm

A greedy idea

The update takes the negative of the gradient:
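The update rule, with learning rate $\alpha$ (the minus sign moves the weight against the gradient, i.e. downhill):

$$w := w - \alpha \frac{\partial\, \mathrm{cost}}{\partial w}$$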

Non-convex functions

Draw a segment between two arbitrary points: it is not guaranteed that every point of the segment lies above the function.

Gradient descent is only guaranteed to find a local optimum, not the global optimum.

Saddle point

The gradient equals 0, so after an update at a saddle point the parameters keep their original values.

Implementation

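A minimal sketch of the batch gradient descent loop for the linear model y = x * w, with the analytic gradient of the MSE cost (dataset and hyperparameters are the toy values used above):

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0  # initial guess

def forward(x):
    return x * w

def cost(xs, ys):
    # mean squared error over the whole training set
    return sum((forward(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(xs, ys):
    # d(cost)/dw = mean of 2x(xw - y)
    return sum(2 * x * (x * w - y) for x, y in zip(xs, ys)) / len(xs)

for epoch in range(100):
    cost_val = cost(x_data, y_data)
    w -= 0.01 * gradient(x_data, y_data)
    print('epoch:', epoch, 'w =', w, 'cost =', cost_val)

print('predict (after training)', 4, forward(4))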

Exponentially weighted average

Makes the cost curve smoother.
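One common form of this smoothing (stated here as an assumption, with smoothing factor $\beta \in (0, 1)$): the plotted value $c'_i$ mixes the current cost $c_i$ with the previous smoothed value:

$$c'_i = \beta\, c_i + (1 - \beta)\, c'_{i-1}$$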

Possible cause of divergence during training

The learning rate is too large.

Stochastic Gradient Descent

Randomly pick a single sample and differentiate its loss with respect to the weight.

Reasons for using stochastic gradient descent:

A single sample carries random noise, which can help the updates get past saddle points.

Implementation of SGD

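A minimal SGD sketch, reusing x_data and y_data from the sketches above; the weight is updated after every individual sample:

w = 1.0

def forward(x):
    return x * w

def loss(x, y):
    return (forward(x) - y) ** 2

def gradient(x, y):
    # gradient of one sample's loss w.r.t. w: 2x(xw - y)
    return 2 * x * (x * w - y)

for epoch in range(100):
    for x, y in zip(x_data, y_data):
        grad = gradient(x, y)
        w -= 0.01 * grad      # update immediately after each sample
        print('\tgrad:', x, y, grad)
        l = loss(x, y)
    print('progress:', epoch, 'w =', w, 'loss =', l)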

Problem

Gradient descent can be computed in parallel, but stochastic gradient descent cannot: the next weight update depends on the weight produced by the previous sample's update.

Compromise: Mini-Batch, i.e. mini-batch stochastic gradient descent.

Each step computes the gradient on a group of samples and then updates the weights, as sketched below.
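A minimal mini-batch sketch under the same toy setup; batch_size here is an illustrative choice:

batch_size = 2
w = 1.0

for epoch in range(100):
    for i in range(0, len(x_data), batch_size):
        xs, ys = x_data[i:i + batch_size], y_data[i:i + batch_size]
        # one (parallelizable) gradient computation per batch, then one update
        grad = sum(2 * x * (x * w - y) for x, y in zip(xs, ys)) / len(xs)
        w -= 0.01 * grad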

4 Back Propagation

Compute the gradient in a simple network

What about a complicated network?

With many parameters, computing analytic expressions becomes complicated.

Computational Graph


What is the problem with a two-layer neural network?

Consecutive linear transformations simplify into one, so the model's complexity cannot increase:
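Concretely, two stacked linear layers collapse into a single linear layer:

$$\hat{y} = W_2 (W_1 x + b_1) + b_2 = (W_2 W_1)\, x + (W_2 b_1 + b_2) = W x + b$$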

Adding a nonlinear function between the layers prevents this collapse.

The composition of functions and the Chain Rule

Chain Rule

1. Create the computational graph (forward pass)

2. Compute the local gradients

3. Take the gradient handed back from the successive node

4. Use the chain rule to compute the gradient

Example

Computational Graph of Linear Model

The residual term is r = y_hat - y.

When there is a bias, the gradient with respect to b must be computed as well.

Tensor in PyTorch

  • In PyTorch, Tensor is the key component for constructing the dynamic computational graph.
  • It contains data and grad, which store the value of the node and the gradient w.r.t. (with respect to) the loss, respectively.

Implementation of linear model with PyTorch

import torch

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = torch.Tensor([1.0])
# the gradient with respect to w needs to be computed
w.requires_grad = True

If autograd mechanics are required, the member variable requires_grad of the Tensor has to be set to True.

Define the linear model and loss function

# In the multiplication below, x is automatically converted to a tensor,
# and the computation then happens between tensors
def forward(x):
    return x * w

def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) ** 2

Training loop

# 4 is the value passed to forward() as its argument, i.e. x = 4
print("predict (before training)", 4, forward(4).item())

for epoch in range(100):
    for x, y in zip(x_data, y_data):
        # Forward: compute the loss
        l = loss(x, y)
        # Backward: compute grad for every Tensor whose requires_grad is set to True
        l.backward()
        print('\tgrad:', x, y, w.grad.item())
        # The grad is used to update the weight.
        # grad is itself a tensor: w.grad.data gives its values without building
        # a computational graph, while grad.item() extracts a Python scalar
        w.data = w.data - 0.01 * w.grad.data
        # NOTICE: the grad computed by .backward() will be accumulated,
        # so after the update, remember to set the grad to ZERO!!!
        w.grad.data.zero_()
    print("progress:", epoch, l.item())
print("predict (after training)", 4, forward(4).item())

When accumulating the loss across samples, take its scalar value:

# computing with tensors builds a computational graph, which occupies memory
total += l         # wrong: keeps the whole graph alive
total += l.item()  # the loss is a tensor, so take its numeric value

Forward in PyTorch


Backward in PyTorch


Update weight in PyTorch


5 Linear Regression with PyTorch

PyTorch Fashion


Prepare dataset


Broadcasting

In matrix addition, the smaller operand is expanded to match the other operand's shape, as the example below shows.
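A small illustration of broadcasting (the values are arbitrary):

import torch

A = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])   # shape (2, 3)
b = torch.tensor([10.0, 20.0, 30.0])  # shape (3,)
# b is expanded to shape (2, 3) before the element-wise addition
print(A + b)
# tensor([[11., 22., 33.],
#         [14., 25., 36.]])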

Design model

Construct the computational graph.

Affine model (linear unit)

The dimensions of x and y must be known in order to determine the dimensions of w and b.
# Our model class should inherit from nn.Module, the base class for all neural network modules.
class LinearModel(torch.nn.Module):
    # The member methods __init__() and forward() have to be implemented
    def __init__(self):
        # call the parent class constructor
        super(LinearModel, self).__init__()
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        # nn.Linear implements the magic method __call__(), which makes an instance
        # of the class callable just like a function; normally forward() is then
        # invoked. Pythonic!!!
        y_pred = self.linear(x)
        return y_pred

# Create an instance of class LinearModel
model = LinearModel()

torch.nn.Linear(1, 1)

Constructs the object; the arguments determine the shapes of the weight and bias.

Rows correspond to the number of samples; columns correspond to the dimension of each sample.

  • A 2D tensor such as [[1, 2, 3], [4, 5, 6]] has two rows of three elements each, so its shape is (2, 3).
  • The dimension of a sample equals the number of elements in its feature vector.

in_features is the dimension of the input, as the shape check below illustrates.
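A quick shape check, with illustrative sizes:

import torch

linear = torch.nn.Linear(3, 2)   # in_features=3, out_features=2
x = torch.randn(4, 3)            # 4 samples, each a 3-dimensional vector
y = linear(x)
print(y.shape)                   # torch.Size([4, 2]): rows = samples, columns = output dim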

Callable objects

class Foobar:
    def __init__(self):
        pass

    # *args collects any number of positional arguments,
    # **kwargs any number of keyword arguments
    def __call__(self, *args, **kwargs):
        print("Hello" + str(args[0]))

foobar = Foobar()
foobar(1, 2, 3)  # prints Hello1

# A function with an explicit signature such as def func(a, b, c, x, y)
# can also be written as:
def func(*args, **kwargs):
    print(args)    # the tuple (1, 2, 4, 3)
    print(kwargs)  # the dict {'x': 3, 'y': 5}

func(1, 2, 4, 3, x=3, y=5)

# Inside nn.Module, __call__() essentially invokes forward():
#     def __call__(self, *args, **kwargs):
#         return self.forward(*args, **kwargs)

Construct Loss and Optimizer

# MSELoss also inherits from nn.Module
# its arguments are y_hat and y
criterion = torch.nn.MSELoss(size_average=False)  # in newer PyTorch: reduction='sum'
# The optimizer does not inherit from nn.Module, so it builds no computational graph.
# The Linear module inside model holds the weights; parameters(), inherited from the
# base class, collects the weights of all contained members.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

Training Cycle

for epoch in range(100):
    # Forward: predict
    y_pred = model(x_data)
    # Forward: loss
    loss = criterion(y_pred, y_data)
    # print calls loss.__str__() automatically and builds no computational graph
    print(epoch, loss)
    # NOTICE: the grad computed by backward() will be accumulated,
    # so before backward, remember to set the grad to ZERO!!!
    optimizer.zero_grad()
    # Backward: autograd
    loss.backward()
    # Update
    optimizer.step()

Test Model

# Output weight and bias
print('w=', model.linear.weight.item())
print('b=', model.linear.bias.item())

# Test the model
x_test = torch.Tensor([[4.0]])
y_test = model(x_test)
print('y_pred=', y_test.data)

Linear Regression


6 Logistic Regression

Classification - The MNIST Dataset

A classification problem: compute the probability of each class and pick the one with the largest probability.
# torchvision ships common datasets
import torchvision

train_set = torchvision.datasets.MNIST(root='../dataset/mnist', train=True, download=True)
test_set = torchvision.datasets.MNIST(root='../dataset/mnist', train=False, download=True)

Classification - The CIFAR-10 dataset


Regression vs. Classification

How to map: R -> [0, 1]

Map a real value into the interval [0, 1].

Logistic Function

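The most common choice is the sigmoid:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$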

A saturating function: beyond a certain range, the derivative becomes smaller and smaller.

Sigmoid functions

They all have horizontal limits, are monotonically increasing, and are saturating functions.

Logistic Regression Model

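The model applies the logistic function on top of the linear unit:

$$\hat{y} = \sigma(x \cdot w + b)$$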

Loss function for Binary Classification


Cross-entropy

Measures how different two distributions are:
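For a target distribution $P$ and a predicted distribution $Q$:

$$H(P, Q) = -\sum_{x} P(x) \log Q(x)$$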

BCE Loss (binary cross-entropy loss)
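For a single sample with label $y \in \{0, 1\}$ and prediction $\hat{y}$:

$$\mathrm{loss} = -\left( y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \right)$$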

Mini-Batch Loss function for Binary Classification

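Averaged over a mini-batch of $N$ samples:

$$\mathrm{loss} = -\frac{1}{N} \sum_{n=1}^{N} \left( y_n \log \hat{y}_n + (1 - y_n) \log (1 - \hat{y}_n) \right)$$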

Implementation of Logistic Regression

import torch
import torch.nn.functional as F

class LogisticRegressionModel(torch.nn.Module):
    def __init__(self):
        super(LogisticRegressionModel, self).__init__()
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        # a sigmoid on top of the linear unit (torch.sigmoid in newer PyTorch)
        y_pred = F.sigmoid(self.linear(x))
        return y_pred
# Swap MSE for BCE
criterion = torch.nn.BCELoss(size_average=False)  # in newer PyTorch: reduction='sum'

Overall procedure

Result of Logistic Regression

import numpy as np
import matplotlib.pyplot as plt

# generate a 1D array of 200 evenly spaced values in the range 0 to 10
x = np.linspace(0, 10, 200)
# reshape the tensor with view: (200, 1) means a 2D tensor
# with 200 rows and 1 column
x_t = torch.Tensor(x).view((200, 1))
y_t = model(x_t)
y = y_t.data.numpy()
plt.plot(x, y)
# c='r' draws the line in red
plt.plot([0, 10], [0.5, 0.5], c='r')
plt.xlabel('Hours')
plt.ylabel('Probability of Pass')
# plt.grid() adds grid lines to the current figure
plt.grid()
plt.show()