PyTorch_Practice_1
1 Overview
Goal of this tutorial
- How to implement a learning system using PyTorch
- Understand the basics of neural networks and deep learning
Requirements
- Algebra and probability
- Python
Human Intelligence
Inference
What to eat for dinner? This is a decision: infer an answer from the information already available (budget, personal preference).
data:image/s3,"s3://crabby-images/82cba/82cbab1f661fc816ec636385ae167045944130d3" alt="image-20230730142434853"
Prediction
Concrete entity -> abstract concept
data:image/s3,"s3://crabby-images/3d199/3d199257feeb84898383fe5f3c8e79691b9fbeb0" alt="image-20230730142756467"
data:image/s3,"s3://crabby-images/806c7/806c798b386e88ff3c98d1b5d3056a8b4f71036d" alt="image-20230730142807343"
Machine Learning
Use algorithms to perform inference or prediction.
data:image/s3,"s3://crabby-images/a72de/a72de47693a3e0ac08037984e04731334c85fea6" alt="image-20230730142950891"
data:image/s3,"s3://crabby-images/7aafd/7aafd55716c13f35dd70fea134b4bbc053e447e9" alt="image-20230730143053679"
Supervised learning
Train the model on a labeled dataset.
data:image/s3,"s3://crabby-images/5dda9/5dda9c12dc0a088eaa10940638204e61296f97d7" alt="image-20230730143134256"
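Conventional algorithms:
- Exhaustive (brute-force) search
- Greedy
- Divide and conquer
- Dynamic programming
Machine learning algorithms:
The algorithm is derived from a dataset instead of being designed by hand.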
Deep Learning
MLP (multilayer perceptron)
data:image/s3,"s3://crabby-images/27c73/27c734f7ca4df97ad032f538c692ada800a65c01" alt="image-20230730143713331"
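How to develop a learning system?
Rule-based systems
The program is designed by hand. The number of rules keeps growing, which makes the system hard to maintain.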
data:image/s3,"s3://crabby-images/cb889/cb8890f8a72230bb75bfc2e565d5a465db297f07" alt="image-20230730144243250"
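Example: finding an antiderivative
- Build a knowledge base (functions and their antiderivatives)
- Define rules
- Trigonometric transformations
Classic machine learning
Features are extracted by hand: the input is turned into a vector.
Then a mapping from the vector to the output is built: y=f(x)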
data:image/s3,"s3://crabby-images/c7d63/c7d63147b6cd412ee9cde327b8d32b4477137962" alt="image-20230730144600338"
Representation learning
data:image/s3,"s3://crabby-images/140e2/140e28d1df23c58cb47ed863cd5147e4508a1656" alt="image-20230730145357381"
Curse of dimensionality: the more input features there are (the higher the dimension), the more samples are needed.
data:image/s3,"s3://crabby-images/0e344/0e344ee4cb873626e7f59462ee2d84cd1f1392fa" alt="image-20230730144848619"
Linear mapping: map N dimensions to 3 dimensions
data:image/s3,"s3://crabby-images/66900/6690097dcebdd0200787612b04cab55e01a4d8cc" alt="image-20230730145120814"
Manifold
Example: the Milky Way
Map three dimensions to two
data:image/s3,"s3://crabby-images/851da/851da119d98722b995ce85ba2fc1467e5299c02e" alt="image-20230730145342647"
Deep learning
The input features are simpler; additional layers are needed to extract features.
data:image/s3,"s3://crabby-images/adb6f/adb6f9afb1ecff841fd7825a1e8d3831c0145434" alt="image-20230730145525280"
Rule-based system VS Representation Learning
data:image/s3,"s3://crabby-images/ac81d/ac81d8d2f6dc6c503e2c50d47abe6a6b59ea6fa0" alt="image-20230730145938708"
Traditional machine learning strategy
Classification, clustering, regression, dimensionality reduction
data:image/s3,"s3://crabby-images/e913a/e913a5f1004074f7963469888948bbb63415dc81" alt="image-20230730150024271"
New challenge
- Limits of hand-designed features.
- SVM cannot handle big data sets well.
- More and more applications need to handle unstructured data (images, text, audio).
data:image/s3,"s3://crabby-images/a786a/a786a92ccfd7c0d8e2730ae5918a83b22ef15e13" alt="image-20230730150428109"
Brief history of neural networks
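From neuroscience to mathematics & engineering
Back Propagation
- Derivatives can be propagated through the computational graph
- Partial derivatives are computed with the chain rule
- The derivative with respect to b is the sum of the partial derivatives along every path from b to the output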
data:image/s3,"s3://crabby-images/df6b7/df6b7c1968d92dc2ca0a9f09b74ee34998b75e91" alt="image-20230730151352342"
data:image/s3,"s3://crabby-images/44f1b/44f1b1c46003351b0218c936434963dda08ee5e3" alt="image-20230730151439223"
data:image/s3,"s3://crabby-images/77cab/77cab08f7606c3088333a20f23f6d1f2830a77c5" alt="image-20230730151532329"
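The rule that the derivative with respect to b sums over all paths can be checked with PyTorch autograd. This is a minimal sketch on a hypothetical graph, e = (a + b) * (b + 1); the expression and the input values are assumptions for illustration and may differ from the figures above.

```python
import torch

# Hypothetical graph: c = a + b, d = b + 1, e = c * d.
# b feeds both c and d, so two paths lead from b to e.
a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

c = a + b   # c = 3
d = b + 1   # d = 2
e = c * d   # e = 6

e.backward()  # propagate derivatives from e back to the leaves

# Path through c: de/dc * dc/db = d * 1
# Path through d: de/dd * dd/db = c * 1
path_sum = d.item() + c.item()

print(a.grad.item(), b.grad.item(), path_sum)
```

The value stored in b.grad matches the hand-computed sum over the two paths, while a.grad only receives the contribution of the single path through c.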
2 Linear Model
Machine Learning
data:image/s3,"s3://crabby-images/9cffa/9cffa3c1bf70006d2f565248ffa99cf76191cdd9" alt="image-20230730153954181"
Supervised Learning
During training, both x and y are known.
data:image/s3,"s3://crabby-images/04895/04895cee971900a7adffbd55e86a3baf64cc2a10" alt="image-20230730154226138"
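Overfitting: the training error is very small, but the model has also learned the background and the noise, which is undesirable.
Generalization: images the model has never seen are still recognized correctly.
Dataset split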
data:image/s3,"s3://crabby-images/0f416/0f4163250d4a54f86e87ff39791ffdcbdeadc742" alt="image-20230730162544297"
Model design
What would be the best model for the data?
Linear model?
data:image/s3,"s3://crabby-images/f61d4/f61d4fd7009ab80b0b33421982ba91ef217ac677" alt="image-20230730163018247"
To simplify the model
data:image/s3,"s3://crabby-images/8ff86/8ff8676e3617b2bf4e9b3df0eb0b19e90cc33086" alt="image-20230730163737406"
Linear Regression
data:image/s3,"s3://crabby-images/a1c7e/a1c7e56bc02ad98da7979bb3ea4c4ee25b91f553" alt="image-20230730163841407"
data:image/s3,"s3://crabby-images/d8f49/d8f49b50996463f8d289933af2fb5718037799b3" alt="image-20230730164143505"
Compute Loss
data:image/s3,"s3://crabby-images/200b8/200b872e595691b312a73eeb950bc1f2d1d48b2c" alt="image-20230730164430801"
Loss function & Cost function
data:image/s3,"s3://crabby-images/d130b/d130b3bc2968d5a69157ebd186966755051ce12c" alt="image-20230730164741291"
Compute Cost
data:image/s3,"s3://crabby-images/f2f69/f2f69c392e03af3192e7ad23e30cde15bd961bfa" alt="image-20230730164811433"
Exhaustive search for the best weight
Sweep w over a range of values to trace the cost curve, then pick the lowest point.
data:image/s3,"s3://crabby-images/f92bf/f92bfd28ffb090ba82442956f6f710fbaa7ce58d" alt="image-20230730164956525"
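The sweep over w can be sketched in plain Python. The three data points, the bias-free model y_hat = x * w, and the candidate range 0.0 to 4.0 are assumptions chosen for illustration.

```python
# Toy data (an assumption): the points lie exactly on y = 2x.
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]


def forward(x, w):
    """Linear model without bias."""
    return x * w


def cost(w):
    """Mean squared error over the whole dataset for a given w."""
    total = sum((forward(x, w) - y) ** 2 for x, y in zip(x_data, y_data))
    return total / len(x_data)


# Sweep w from 0.0 to 4.0 in steps of 0.1 and record the cost of each.
candidates = [i / 10 for i in range(41)]
costs = [cost(w) for w in candidates]

# The best weight is the candidate with the smallest cost.
best_w = candidates[costs.index(min(costs))]
print(best_w, min(costs))
```

Plotting costs against candidates (for example with matplotlib) gives the bowl-shaped curve whose lowest point is the best weight.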
How to draw the graph
data:image/s3,"s3://crabby-images/23376/2337646f312f9cc358ce638f7d30bbdd09a3d6c8" alt="image-20230730165102750"
data:image/s3,"s3://crabby-images/549d6/549d681fc7739ac78030981faad6183885a2b672" alt="image-20230730165122542"
data:image/s3,"s3://crabby-images/c9d05/c9d057085940020b529dd67dda39cbef0a5e3f47" alt="image-20230730165154163"
data:image/s3,"s3://crabby-images/5cf06/5cf06dc4ba7bc5d7127a06aac175ba35ba68122d" alt="image-20230730165211232"
data:image/s3,"s3://crabby-images/495e3/495e3479f31a17f152f697f48513fc76383f7c9a" alt="image-20230730165252969"
data:image/s3,"s3://crabby-images/59e32/59e32354b5014f968117356a785483c8b4d62dd9" alt="image-20230730165321237"
Here the losses are only summed, not averaged.
data:image/s3,"s3://crabby-images/38bf9/38bf9e40abb92f89b23207caacff2b66cc666c8a" alt="image-20230730165448085"
Dividing the sum by the number of samples gives the MSE.
data:image/s3,"s3://crabby-images/347d7/347d7f947e0302fdc72d95a7872c24bd9f2c9d9b" alt="image-20230730165511957"
data:image/s3,"s3://crabby-images/2c9ec/2c9ec88a0a6be204454b7534583436558d5a000f" alt="image-20230730165629240"
3 Gradient Descent
Divide-and-conquer idea
Search sparsely first, starting with 16 points, then refine the search around the best one.
data:image/s3,"s3://crabby-images/dd53a/dd53a749fd4043ae677a74129043bbd1acee6e35" alt="image-20230730172042768"
This does not work either, because a real cost function may have several local minima.
data:image/s3,"s3://crabby-images/72ac1/72ac18f9dcc5b29ae916bc8956a2a369d02c37dc" alt="image-20230730172127573"
Convex function: for any two points on its graph, the line segment joining them lies on or above the graph.
data:image/s3,"s3://crabby-images/0947b/0947bf811ffafe879a7e1bf968309eea4501a19c" alt="image-20230730172206692"
Optimization Problem
data:image/s3,"s3://crabby-images/afa3e/afa3e94d6f423fca3c6af20048c7944ee691c056" alt="image-20230730172328444"
Gradient Descent Algorithm
Greedy idea
The update moves against the gradient: w = w - α · ∂cost/∂w, where α is the learning rate.
data:image/s3,"s3://crabby-images/7fade/7fade96cee3862f4c53d6086fe497fca85c66162" alt="image-20230730172801477"
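Non-convex function
For a line drawn between two points on the graph, the segment is not guaranteed to lie on or above the graph.
Gradient descent is only guaranteed to find a local optimum, not the global optimum.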
data:image/s3,"s3://crabby-images/ebe0e/ebe0ed3090d85903fc5108040b1fb041b823bf77" alt="image-20230730173116492"
Saddle point
The gradient is 0 there, so once a saddle point is reached the update leaves the parameter unchanged.
data:image/s3,"s3://crabby-images/8d903/8d9030de823259d19ac81581614688eef8647c0c" alt="image-20230730173400310"
data:image/s3,"s3://crabby-images/2a8d0/2a8d03483407b58252918393861b31a8bd65d643" alt="image-20230730173754071"
Implementation
data:image/s3,"s3://crabby-images/a61e4/a61e4e0fab915b821067f8a5b013888aa5b24771" alt="image-20230730173909588"
data:image/s3,"s3://crabby-images/d05c4/d05c4005ef6583069cfe908b4acb440fe5243d63" alt="image-20230730173922793"
data:image/s3,"s3://crabby-images/7b97b/7b97b912c3ad4e434590406b3c3160f33939bc21" alt="image-20230730173938890"
data:image/s3,"s3://crabby-images/a90a4/a90a46225059844d0139afa715228087c488adcb" alt="image-20230730174042028"
data:image/s3,"s3://crabby-images/7753f/7753f57c10140ced5cf1b8cf7ade9462e33a2c20" alt="image-20230730174105837"
data:image/s3,"s3://crabby-images/9b8c6/9b8c628ba2e772036cfb3a0f5b4436e91962a6d9" alt="image-20230730174149032"
data:image/s3,"s3://crabby-images/6bf9b/6bf9bbbd7a6c1d99011204f6a99834522a239545" alt="image-20230730174354104"
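A minimal sketch of batch gradient descent for the bias-free linear model, written in plain Python. The data, the initial guess w = 1.0, the learning rate 0.01, and the 100 epochs are assumptions for illustration; the code in the figures above may differ in detail.

```python
# Toy data (an assumption): the points lie exactly on y = 2x.
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]


def cost(w):
    """Mean squared error over all samples."""
    return sum((x * w - y) ** 2 for x, y in zip(x_data, y_data)) / len(x_data)


def gradient(w):
    """d(cost)/dw = mean of 2 * x * (x * w - y) over all samples."""
    return sum(2 * x * (x * w - y) for x, y in zip(x_data, y_data)) / len(x_data)


w = 1.0    # initial guess
lr = 0.01  # learning rate

for epoch in range(100):
    # Step against the gradient, scaled by the learning rate.
    w -= lr * gradient(w)

print(w, cost(w))
```

Every update uses the gradient averaged over the full dataset, so the cost decreases smoothly as long as the learning rate is small enough.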
Exponentially weighted average
Smooths the cost curve.
data:image/s3,"s3://crabby-images/6071d/6071dd619d31274552b2f37807e06046a4caf2c5" alt="image-20230730174649178"
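One common form of the exponentially weighted average is sketched below. The recurrence s_i = beta * s_(i-1) + (1 - beta) * c_i and the default beta = 0.9 are assumptions; the figure above may weight the terms differently.

```python
def smooth(values, beta=0.9):
    """Exponentially weighted average; a larger beta gives a smoother curve."""
    smoothed = []
    running = values[0]  # start from the first observation
    for v in values:
        running = beta * running + (1 - beta) * v
        smoothed.append(running)
    return smoothed


# A hypothetical noisy cost curve that jumps between 1 and 3.
noisy = [1.0, 3.0, 1.0, 3.0]
smoothed = smooth(noisy)
print(smoothed)
```

The smoothed sequence varies far less than the raw one, which makes the overall trend of a noisy training curve easier to read.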
Possible cause of diverging training
The learning rate is too large.
Stochastic Gradient Descent
Pick one sample at random and differentiate its loss with respect to the weight.
data:image/s3,"s3://crabby-images/55943/559439b7daae3a44ed68a90d82f67d4cf1d77390" alt="image-20230730175043472"
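Why use stochastic gradient descent:
The gradient of a single sample carries random noise, which can help the update move past saddle points.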
Implementation of SGD
data:image/s3,"s3://crabby-images/a0c67/a0c6764c1ef765b3ac18bb3be098c82739620251" alt="image-20230730175158377"
data:image/s3,"s3://crabby-images/233f9/233f9871162ee45eaf5e9e009b879cb2a1bb7ca3" alt="image-20230730175211466"
data:image/s3,"s3://crabby-images/66383/6638306290b244e0dbae6fdaa931a74ef1e07ccb" alt="image-20230730175258213"
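A minimal sketch of stochastic gradient descent in plain Python: each update uses the gradient of one randomly chosen sample. The data, the seed, the learning rate, and the number of steps are assumptions for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Toy data (an assumption): the points lie exactly on y = 2x.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def gradient(x, y, w):
    """Gradient of the single-sample loss (x * w - y) ** 2 with respect to w."""
    return 2 * x * (x * w - y)


w = 1.0    # initial guess
lr = 0.01  # learning rate

for step in range(300):
    x, y = random.choice(samples)   # pick one sample at random
    w -= lr * gradient(x, y, w)     # update from that sample alone

print(w)
```

Because each step depends on the weight produced by the previous step, the updates in this loop are inherently sequential.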
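Problem
Gradient descent can compute the per-sample gradients in parallel. Stochastic gradient descent cannot, because each update needs the weight produced by the previous sample's update.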
data:image/s3,"s3://crabby-images/bdfaf/bdfaf796d4fb872779d43fa7ac41ae17e430953e" alt="image-20230730175747932"
Compromise: mini-batch stochastic gradient descent
Each step computes the gradient on a small group of samples and then updates the weight.
data:image/s3,"s3://crabby-images/1845d/1845d0b379e2d15cf92c183f59f13997ed48f28b" alt="image-20230730180036885"
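The mini-batch compromise can be sketched as follows. The four data points, the batch size of 2, and the fixed (unshuffled) batch order are assumptions made to keep the example short; in practice the data is usually shuffled every epoch.

```python
# Toy data (an assumption): the points lie exactly on y = 2x.
x_data = [1.0, 2.0, 3.0, 4.0]
y_data = [2.0, 4.0, 6.0, 8.0]
batch_size = 2


def batch_gradient(xs, ys, w):
    """Gradient of the mean squared error over one mini-batch."""
    return sum(2 * x * (x * w - y) for x, y in zip(xs, ys)) / len(xs)


w = 1.0    # initial guess
lr = 0.01  # learning rate

for epoch in range(100):
    # Walk over the dataset one mini-batch at a time.
    for start in range(0, len(x_data), batch_size):
        xs = x_data[start:start + batch_size]
        ys = y_data[start:start + batch_size]
        w -= lr * batch_gradient(xs, ys, w)

print(w)
```

The samples inside one batch can be processed in parallel, while the updates between batches keep some of the noise that plain SGD provides.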
4 Back Propagation
Compute gradient in simple network
data:image/s3,"s3://crabby-images/0a291/0a291d1d3151999c832318114515b1e7c09b22f7" alt="image-20230730195406535"
What about the complicated network?
With many parameters, deriving the gradient analytically becomes very complicated.
data:image/s3,"s3://crabby-images/0a7ab/0a7abe548aec67526d387cbe64c6695ef65c20b6" alt="image-20230730195632279"
Computational Graph
data:image/s3,"s3://crabby-images/f1acd/f1acdf2e7c4f89474825e1e48869ae289933b391" alt="image-20230730200053893"
data:image/s3,"s3://crabby-images/1e186/1e18643cbe689e54d48b7198e718e50dff064e1b" alt="image-20230730200148115"
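What is the problem with a two-layer neural network?
Stacked linear transformations simplify to a single linear transformation, so the model's complexity does not increase.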
data:image/s3,"s3://crabby-images/2801b/2801b14ef0ca2270d714dff172a974fb7e6325c1" alt="image-20230730200416988"
Add a nonlinear function after each linear layer.
data:image/s3,"s3://crabby-images/3addf/3addfce8dff44a5bee352f4e086edb6b58c8bf93" alt="image-20230730200559282"
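The collapse of two linear layers into one can be verified numerically. In this sketch the shapes, the random weights, and the choice of sigmoid as the nonlinearity are all assumptions for illustration.

```python
import torch

torch.manual_seed(0)

x = torch.randn(5, 3)                            # 5 samples, 3 features
W1, b1 = torch.randn(3, 4), torch.randn(4)       # first linear layer
W2, b2 = torch.randn(4, 2), torch.randn(2)       # second linear layer

# Two stacked linear layers without any nonlinearity.
two_layers = (x @ W1 + b1) @ W2 + b2

# The same mapping written as a single linear layer.
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = x @ W + b

same = torch.allclose(two_layers, one_layer, atol=1e-4)

# With a nonlinearity in between, the collapse no longer holds.
with_sigmoid = torch.sigmoid(x @ W1 + b1) @ W2 + b2
differs = not torch.allclose(with_sigmoid, one_layer, atol=1e-4)

print(same, differs)
```

Without the nonlinearity the second layer adds parameters but no expressive power; with it, the composition is no longer equivalent to a single linear map.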
The composition of functions and Chain Rule
data:image/s3,"s3://crabby-images/41632/416328b9d54c901c966c3c0d7ede747ff2825717" alt="image-20230730200720960"
Chain Rule
1. Create Computational Graph (Forward)
data:image/s3,"s3://crabby-images/4c40e/4c40e5cc720c0fbf199924f67cc840d6da42aaa1" alt="image-20230730200838805"
2. Local Gradient
data:image/s3,"s3://crabby-images/f01d3/f01d37dd4f38e2486b5eae0c12dca4d719a5990d" alt="image-20230730200908028"
3. Given gradient from successive node
data:image/s3,"s3://crabby-images/f986c/f986c2da4f04f300cc3a49ba44896e3807db62f5" alt="image-20230730201133808"
4. Use chain rule to compute the gradient
data:image/s3,"s3://crabby-images/b8261/b826187b6445f1e73b074d0b1444549f347f6ede" alt="image-20230730201220910"
Example
data:image/s3,"s3://crabby-images/35be2/35be2f828b9d120175b545655a218061593dea25" alt="image-20230730201348872"
data:image/s3,"s3://crabby-images/4828d/4828dfa753e685773b4bc2beb3421c1346b9e1fc" alt="image-20230730201431096"
Computational Graph of Linear Model
Residual term: r = y_hat - y
data:image/s3,"s3://crabby-images/4b522/4b522c5690c49f8300d2cb388b7648fca2b967da" alt="image-20230730202056353"
When the model has a bias, the gradient with respect to b must be computed as well.
data:image/s3,"s3://crabby-images/cb7f5/cb7f59ed4c16187b15f607b1065b4bba7d9f45cf" alt="image-20230730202305630"
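The forward and backward passes through the linear model's graph can be traced by hand. The sample (x, y) = (1, 2) and the initial values w = 1, b = 0 are assumptions for illustration; the loss is the squared residual.

```python
# One hypothetical training sample and initial parameters.
x, y = 1.0, 2.0
w, b = 1.0, 0.0

# Forward pass: evaluate every node of the graph.
y_hat = x * w + b   # prediction
r = y_hat - y       # residual
loss = r ** 2       # squared error

# Backward pass: multiply local gradients along each path (chain rule).
dloss_dr = 2 * r      # d(r^2)/dr
dr_dyhat = 1.0        # d(y_hat - y)/dy_hat
dyhat_dw = x          # d(x * w + b)/dw
dyhat_db = 1.0        # d(x * w + b)/db

dloss_dw = dloss_dr * dr_dyhat * dyhat_dw
dloss_db = dloss_dr * dr_dyhat * dyhat_db

print(loss, dloss_dw, dloss_db)
```

Both gradients reuse the same upstream factor dloss_dr * dr_dyhat; only the last local gradient differs between w and b.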
Tensor in PyTorch
- In PyTorch, Tensor is the key component for constructing the dynamic computational graph.
- It contains data and grad, which store the value of the node and the gradient of the loss with respect to it, respectively.
data:image/s3,"s3://crabby-images/6d88e/6d88e581c65478b593edab0c950241cf1c5d6549" alt="image-20230730202548110"
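A short sketch of the two fields on a leaf tensor. The expression (3w - 6)^2 is a hypothetical loss chosen only so the gradient is easy to check by hand.

```python
import torch

# A leaf tensor that autograd should track.
w = torch.tensor([1.0], requires_grad=True)

value = w.data          # the stored value of the node
grad_before = w.grad    # no gradient exists before any backward pass

# Hypothetical loss: (3w - 6)^2, so d(loss)/dw = 2 * (3w - 6) * 3.
loss = (w * 3.0 - 6.0).pow(2).sum()
loss.backward()

grad_after = w.grad     # gradient of the loss with respect to w
print(value, grad_before, grad_after)
```

At w = 1 the hand-computed derivative is 2 * (-3) * 3 = -18, which is the value autograd stores in w.grad.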
Implementation of linear model with PyTorch
If autograd is required, the requires_grad attribute of the Tensor has to be set to True.
Define the linear model and loss function
data:image/s3,"s3://crabby-images/bbd7a/bbd7a4a9e55763d9afd15e6354754e91d9d28041" alt="image-20230730202918889"
data:image/s3,"s3://crabby-images/39d94/39d94283495b18b69795e78b335a2c98d0fd087f" alt="image-20230730203201806"
Training process
data:image/s3,"s3://crabby-images/b0dbf/b0dbf4a76ea7a7e16d12f1874997a728570fded8" alt="image-20230730210510986"
Forward in PyTorch
data:image/s3,"s3://crabby-images/2f79f/2f79f8fbd670b3c53369095585a44b04903abd2d" alt="image-20230730210639060"
Backward in PyTorch
data:image/s3,"s3://crabby-images/34346/34346d7c8681077c037a30effe8485e2f59003e6" alt="image-20230730210723319"
Update weight in PyTorch
data:image/s3,"s3://crabby-images/ce62c/ce62ca5e46e4658126dc926b85c3fea456c415e4" alt="image-20230730210748563"
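The forward, backward, and update steps can be put together in one loop. This is a minimal sketch under assumed data, learning rate, and epoch count; it performs the update inside torch.no_grad(), which is one way to keep the update itself out of the computational graph.

```python
import torch

# Toy data (an assumption): the points lie exactly on y = 2x.
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

# The weight is a tensor that autograd tracks.
w = torch.tensor([1.0], requires_grad=True)


def forward(x):
    """Linear model without bias; builds the graph because w requires grad."""
    return x * w


def loss(x, y):
    """Squared error for a single sample."""
    return (forward(x) - y) ** 2


lr = 0.01
for epoch in range(100):
    for x, y in zip(x_data, y_data):
        l = loss(x, y)      # forward: build the graph and compute the loss
        l.backward()        # backward: store d(loss)/dw in w.grad
        with torch.no_grad():
            w -= lr * w.grad    # update the value without recording the step
        w.grad.zero_()      # gradients accumulate, so reset them each step

print(w.item())
```

Forgetting to zero the gradient would add each new gradient to the previous ones, which is a common cause of wrong updates.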
5 Linear Regression with PyTorch
PyTorch Fashion
data:image/s3,"s3://crabby-images/39ffe/39ffe9949db42c93db11ef19a63a126fe1cb37db" alt="image-20230730214020125"
Prepare dataset
data:image/s3,"s3://crabby-images/2086e/2086e26919007f3d161b3ced41099324d4daba85" alt="image-20230730214757663"
Broadcasting
When tensors of different shapes are added, the smaller one is expanded to match the larger one.
data:image/s3,"s3://crabby-images/8415c/8415cbe54205bae86bc49008dff449b4b0261f9f" alt="image-20230730214337060"
data:image/s3,"s3://crabby-images/f80ff/f80ffb43f9210ef1325b9c680faf163726013d21" alt="image-20230730214506696"
data:image/s3,"s3://crabby-images/9ed25/9ed25a3d857102b08ec061a738e54e75639daad1" alt="image-20230730214708765"
data:image/s3,"s3://crabby-images/03532/0353274a7a3a72571985e9a24fd77408da6ccfcf" alt="image-20230730214729480"
data:image/s3,"s3://crabby-images/3c445/3c44561b506d03d942e64d2c84fd9981529b5daf" alt="image-20230730215522431"
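Broadcasting in the affine model can be seen directly on small tensors. The values below are assumptions for illustration: a batch of three one-dimensional samples, a 1x1 weight, and a single bias.

```python
import torch

x = torch.tensor([[1.0], [2.0], [3.0]])   # shape (3, 1): three samples
w = torch.tensor([[2.0]])                 # shape (1, 1): weight matrix
b = torch.tensor([1.0])                   # shape (1,): bias

# x @ w has shape (3, 1); b is broadcast across the three rows.
y_hat = x @ w + b
print(y_hat.shape, y_hat.tolist())
```

The bias is never copied by hand: the addition expands it to the shape of x @ w automatically.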
Design model
Build the computational graph
Affine model (linear unit)
The dimensions of x and y must be known in order to determine the dimensions of w and b.
data:image/s3,"s3://crabby-images/d2093/d2093843bfd951b31dfe8c423dc9a723fc21065e" alt="image-20230730215325358"
torch.nn.Linear(1, 1)
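Constructs the object. The arguments are in_features and out_features; the object holds two member tensors, weight and bias.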
data:image/s3,"s3://crabby-images/497a1/497a14b3850c28ef1e793cb1e4ee74b6786b7093" alt="image-20230730220532687"
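Rows index the samples; columns hold the features of each sample.
- A two-dimensional tensor can be written as [[1, 2, 3], [4, 5, 6]]: it has two rows of three elements each, so its shape is (2, 3).
- The dimension of a sample equals the number of elements in the sample's feature vector.
in_features is the dimension of each input sample.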
data:image/s3,"s3://crabby-images/7015e/7015e32cc2e48f8e1aef9e33b2a421cf7ca97ee2" alt="image-20230730221032174"
data:image/s3,"s3://crabby-images/1330a/1330a93e4482c2d1e1952fc1fbc0ec0feb51d108" alt="image-20230730221127138"
Callable object
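A minimal sketch of a model class built on torch.nn.Module. The class name LinearModel and the three input values are assumptions for illustration; calling the model object runs its forward method.

```python
import torch


class LinearModel(torch.nn.Module):
    """Affine model y_hat = x @ W^T + b with one input and one output feature."""

    def __init__(self):
        super().__init__()
        # in_features=1, out_features=1; weight and bias are created inside.
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)


model = LinearModel()

# Three samples (rows), one feature each (columns).
x = torch.tensor([[1.0], [2.0], [3.0]])

# Calling the object dispatches to forward() through Module.__call__.
y_pred = model(x)
print(y_pred.shape, model.linear.weight.shape, model.linear.bias.shape)
```

Only forward needs to be written by hand; the backward pass is derived by autograd from the operations used in forward.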
Construct Loss and Optimizer
data:image/s3,"s3://crabby-images/fe9c5/fe9c5aa29894835147053f4fb3e6e6d14af015b9" alt="image-20230731095925453"
data:image/s3,"s3://crabby-images/5938c/5938cecbc943eba9ceb2bb9b6b08f1c2e0931d9a" alt="image-20230731100535397"
data:image/s3,"s3://crabby-images/8bdb4/8bdb43b64a308a92b545ea5a435c2afaaae50912" alt="image-20230731100547121"
Training Cycle
data:image/s3,"s3://crabby-images/13ec3/13ec3fce984f8e7aa4f41d2d0aef4640cbb9d4ca" alt="image-20230731101009568"
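The loss, the optimizer, and the training cycle fit together as sketched below. The data, the mean-reduced MSELoss, the learning rate 0.05, and the 1000 epochs are assumptions for illustration and may differ from the settings in the figures above.

```python
import torch

torch.manual_seed(0)

# Toy data (an assumption): three samples on the line y = 2x.
x_data = torch.tensor([[1.0], [2.0], [3.0]])
y_data = torch.tensor([[2.0], [4.0], [6.0]])

model = torch.nn.Linear(1, 1)       # affine model with weight and bias
criterion = torch.nn.MSELoss()      # mean squared error
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

for epoch in range(1000):
    y_pred = model(x_data)              # forward pass
    loss = criterion(y_pred, y_data)    # compute the loss
    optimizer.zero_grad()               # clear gradients from the last step
    loss.backward()                     # backward pass
    optimizer.step()                    # update weight and bias

# Test the trained model on an unseen input.
x_test = torch.tensor([[4.0]])
y_test = model(x_test)
print(model.weight.item(), model.bias.item(), y_test.item())
```

The four lines inside the loop (forward, loss, backward, step) are the pattern that the later models reuse unchanged.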
Test Model
data:image/s3,"s3://crabby-images/38204/382044d5aa3bd1391bf4fa4c6e2a3d801fb64e6e" alt="image-20230731101308097"
data:image/s3,"s3://crabby-images/1f4ea/1f4ea043c8b6958d5af771e0972e60a04f15c4bd" alt="image-20230731101337662"
Linear Regression
data:image/s3,"s3://crabby-images/f696a/f696a43be8776a6a16e95a34aae958a4cf49984b" alt="image-20230731101559182"
6 Logistic Regression
Classification - The MNIST Dataset
Classification problem: compute the probability of each class and pick the class with the highest probability.
data:image/s3,"s3://crabby-images/fec6f/fec6f1b4c51123f57a93ff0fe3e3cfe532c361f3" alt="image-20230731105257387"
Classification - The CIFAR-10 dataset
data:image/s3,"s3://crabby-images/637e6/637e6d74247a40b42e3a5464f50c331c680e465c" alt="image-20230731110043063"
Regression vs Classification
data:image/s3,"s3://crabby-images/cf382/cf382a9617adf2ae68df4286443753877e7516d0" alt="image-20230731112122200"
How to map: R -> [0, 1]
Map real values to the interval [0, 1].
Logistic Function
data:image/s3,"s3://crabby-images/3a48f/3a48f9e2379e685bd8894a8d87bee8b51b2c926e" alt="image-20230731112346368"
Saturating function: beyond a certain range, the derivative becomes smaller and smaller.
data:image/s3,"s3://crabby-images/0f530/0f530240fe8a03a2610749a491bbb593c93e780f" alt="image-20230731112611568"
Sigmoid functions
They all have finite limits, increase monotonically, and saturate.
data:image/s3,"s3://crabby-images/c27d9/c27d9f4852908a3337065f034280d87bf4699278" alt="image-20230731112757873"
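The logistic function and its saturation can be checked numerically. This is a plain-Python sketch of the standard definition 1 / (1 + e^(-x)); the probe points are arbitrary choices.

```python
import math


def logistic(x):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logistic_grad(x):
    """Derivative of the logistic function: s * (1 - s)."""
    s = logistic(x)
    return s * (1.0 - s)


for x in (-10.0, -5.0, 0.0, 5.0, 10.0):
    print(x, logistic(x), logistic_grad(x))
```

The derivative peaks at x = 0 and shrinks toward zero on both sides, which is the saturating behavior described above.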
Logistic Regression Model
data:image/s3,"s3://crabby-images/615fe/615fe958956e93e7dbd96f1d299342648c1d6d36" alt="image-20230731113023172"
Loss function for Binary Classification
data:image/s3,"s3://crabby-images/ac01d/ac01dfadeacf461a2afad3625fa52378f16827a2" alt="image-20230731113842382"
Cross-entropy
Measures how much two distributions differ.
data:image/s3,"s3://crabby-images/fb1f6/fb1f64a930bfc23c3711fb189b1ef27a76faf17b" alt="image-20230731113523984"
data:image/s3,"s3://crabby-images/85bb7/85bb77c1ff8ef271f1cb7283ff913f1130778687" alt="image-20230731113630554"
BCE Loss (binary cross-entropy loss)
data:image/s3,"s3://crabby-images/41c00/41c00edc89d1eb7a7cff38b8d32faeaa7ebe131f" alt="image-20230731113819447"
Mini-Batch Loss function for Binary Classification
data:image/s3,"s3://crabby-images/a125c/a125c5ef107514038f25495589d72ce1ad232453" alt="image-20230731114150846"
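The mini-batch BCE loss can be computed by hand and compared with torch.nn.BCELoss. The predictions and labels below are hypothetical values chosen for illustration; the built-in loss is used with its default mean reduction.

```python
import torch

# Hypothetical predicted probabilities and binary labels for three samples.
y_pred = torch.tensor([0.8, 0.3, 0.6])
y = torch.tensor([1.0, 0.0, 1.0])

# BCE per sample: -(y * log(y_pred) + (1 - y) * log(1 - y_pred)),
# averaged over the mini-batch.
manual = -(y * torch.log(y_pred) + (1 - y) * torch.log(1 - y_pred)).mean()

# Built-in loss with the default reduction (mean over the batch).
builtin = torch.nn.BCELoss()(y_pred, y)

print(manual.item(), builtin.item())
```

For a label of 1 only the log(y_pred) term contributes, and for a label of 0 only the log(1 - y_pred) term does, so a confident wrong prediction is penalized heavily.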
Implementation of Logistic Regression
data:image/s3,"s3://crabby-images/16784/167848cd9c3d31ae678e0be6cd6bda09c346551e" alt="image-20230731114523877"
data:image/s3,"s3://crabby-images/29612/296123ee3a11b8d9e6161a04bedf18d2500b15d7" alt="image-20230731114703208"
Overall process
data:image/s3,"s3://crabby-images/22551/2255188503b7ada20eb92210ad234de60b525e81" alt="image-20230731114848085"
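The whole logistic regression workflow can be sketched end to end. The dataset (three one-dimensional samples with binary labels), the class name, the learning rate, and the epoch count are assumptions for illustration and may differ from the figures above.

```python
import torch

torch.manual_seed(0)

# Toy data (an assumption): label 1 only for the largest input.
x_data = torch.tensor([[1.0], [2.0], [3.0]])
y_data = torch.tensor([[0.0], [0.0], [1.0]])


class LogisticRegressionModel(torch.nn.Module):
    """Linear unit followed by the logistic (sigmoid) function."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))


model = LogisticRegressionModel()
criterion = torch.nn.BCELoss()   # binary cross-entropy, mean over the batch
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(5000):
    y_pred = model(x_data)              # forward pass
    loss = criterion(y_pred, y_data)    # compute the loss
    optimizer.zero_grad()               # clear old gradients
    loss.backward()                     # backward pass
    optimizer.step()                    # update parameters

# Predicted probability of class 1 for a small and a large input.
with torch.no_grad():
    p_low = model(torch.tensor([[1.0]])).item()
    p_high = model(torch.tensor([[4.0]])).item()

print(p_low, p_high)
```

Compared with linear regression, only two things change: the model applies a sigmoid to the linear output, and the loss is BCE instead of MSE.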
Result of Logistic Regression
data:image/s3,"s3://crabby-images/65971/65971fa503799e742adbacd158b35676040b1775" alt="image-20230731115936459"