Last updated: the evening of October 19, 2020
Assignment 1: Multi-Layer Perceptron with MNIST Dataset
In this assignment, you are required to train two MLPs to classify images from the MNIST hand-written digit database using PyTorch.
The process will be broken down into the following steps:
- Load and visualize the data.
- Define a neural network. (30 marks)
- Train the models. (30 marks)
- Evaluate the performance of our trained models on the test dataset. (20 marks)
- Analyze your results. (20 marks)
import torch
import numpy as np
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'  # work around the duplicate libiomp5.dylib initialization error on macOS
Load and Visualize the Data
Downloading may take a few moments, and you should see your progress as the data is loading. You may also choose to change the batch_size
if you want to load more data at a time.
This cell will create DataLoaders for each of our datasets.
from torchvision import datasets
import torchvision.transforms as transforms
# number of subprocesses to use for data loading
num_workers = 0
# how many samples per batch to load
batch_size = 20
# convert data to torch.FloatTensor
transform = transforms.ToTensor()
# choose the training and test datasets
train_data = datasets.MNIST(root='data', train=True,
                            download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False,
                           download=True, transform=transform)
# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers)
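Note that the training loader above iterates over the data in a fixed order. Shuffling the training set each epoch is a common choice for SGD; a minimal alternative (my own suggestion, not part of the original run) would be:

# optional: shuffle the training data every epoch, which usually helps SGD
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           shuffle=True, num_workers=num_workers)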
Visualize a Batch of Training Data
The first step in a classification task is to take a look at the data, make sure it is loaded in correctly, then make any initial observations about patterns in that data.
import matplotlib.pyplot as plt
%matplotlib inline
# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images.numpy()
# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    ax = fig.add_subplot(2, 20 // 2, idx + 1, xticks=[], yticks=[])
    # np.squeeze() removes the singleton (size-1) channel dimension so imshow gets a 2-D array
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    # print out the correct label for each image
    # .item() gets the value contained in a Tensor
    ax.set_title(str(labels[idx].item()))
View an Image in More Detail
img = np.squeeze(images[1])
fig = plt.figure(figsize = (12,12))
ax = fig.add_subplot(111)
ax.imshow(img, cmap='gray')
width, height = img.shape
thresh = img.max()/2.5
for x in range(width):
    for y in range(height):
        # round the pixel value to 2 decimal places (leave exact zeros as 0)
        val = round(img[x][y], 2) if img[x][y] != 0 else 0
        ax.annotate(str(val), xy=(y, x),
                    horizontalalignment='center',
                    verticalalignment='center',
                    color='white' if img[x][y] < thresh else 'black')
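Because transforms.ToTensor() rescales the raw 0-255 pixel values into the range [0, 1], the values annotated above are floats between 0 and 1. A quick sanity check (my own addition):

# confirm the value range produced by ToTensor(); images was converted to a numpy array above
print(images.min(), images.max())  # expected to lie within [0.0, 1.0]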
Define the Network Architecture (30 marks)
- Input: a 784-dim Tensor of pixel values for each image.
- Output: a 10-dim Tensor of class scores, one per digit class, for each input image.
You need to implement three models:
- a vanilla multi-layer perceptron. (10 marks)
- a multi-layer perceptron with regularization (dropout or L2 or both). (10 marks)
- the corresponding loss functions and optimizers. (10 marks)
Build model_1
from torch import nn
from torch import optim
import torch
import numpy as np
from torch.nn import init
torch.manual_seed(1)  # set the random seed for reproducibility
## Define the MLP architecture
class VanillaMLP(nn.Module):
    def __init__(self):
        super(VanillaMLP, self).__init__()
        # a manually implemented fully connected layer, kept for reference:
        # self.w1 = nn.Parameter(torch.randn(784, hidden_features))
        # self.b1 = nn.Parameter(torch.randn(hidden_features))
        # self.w2 = nn.Parameter(torch.randn(hidden_features, 10))
        # self.b2 = nn.Parameter(torch.randn(10))
        # self.relu = nn.ReLU()  # an nn.BatchNorm1d(1024) layer could also be added here
        self.layer1 = nn.Sequential(nn.Linear(784, 1024), nn.ReLU())
        self.layer2 = nn.Sequential(nn.Linear(1024, 512), nn.ReLU())
        self.layer3 = nn.Sequential(nn.Linear(512, 256), nn.ReLU())
        self.layer4 = nn.Sequential(nn.Linear(256, 100), nn.ReLU())
        self.layer5 = nn.Sequential(nn.Linear(100, 10))
        # init.xavier_normal_(self.layer1[0].weight)  # initialize weights from a normal distribution (nn.Linear defaults to a uniform distribution)
        # init.xavier_normal_(self.layer2[0].weight)

    def forward(self, x):
        # flatten image input
        x = x.view(-1, 28 * 28)
        # the manual fully connected layers would be called like this:
        # x = x.mm(self.w1)
        # h = x + self.b1.expand_as(x)
        # h = self.relu(h)  # apply the activation function
        # h = h.mm(self.w2)
        # x = h + self.b2.expand_as(h)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.layer5(x)
        return x
# initialize the MLP
model_1 = VanillaMLP()
# specify loss function
# implement your codes here
loss1 = nn.CrossEntropyLoss()
# specify your optimizer
# implement your codes here
optimizer_1 = optim.SGD(params=model_1.parameters(), lr=0.005)
optimizer_1.zero_grad()  # clear any existing gradients
optimizer_1
SGD (
Parameter Group 0
dampening: 0
lr: 0.005
momentum: 0
nesterov: False
weight_decay: 0
)
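Before training, it can be useful to sanity-check the forward pass. The sketch below (my own addition, not part of the assignment) pushes one dummy batch through model_1 and confirms the output has the expected [batch_size, 10] shape:

# sanity check: a dummy batch of 20 images should produce 20 x 10 class scores
dummy = torch.randn(20, 1, 28, 28)
print(model_1(dummy).shape)  # expected: torch.Size([20, 10])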
Build model_2
## Define the MLP architecture
class RegularizedMLP(nn.Module):
    def __init__(self):
        super(RegularizedMLP, self).__init__()
        # implement your codes here
        # self.layer1 = nn.Sequential(nn.Linear(784, hidden_features), nn.ReLU(), nn.Dropout(0.5))
        # self.layer2 = nn.Sequential(nn.Linear(hidden_features, 10))
        self.layer1 = nn.Sequential(nn.Linear(784, 1024), nn.ReLU(), nn.Dropout(0.5))
        self.layer2 = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Dropout(0.5))
        self.layer3 = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Dropout(0.5))
        self.layer4 = nn.Sequential(nn.Linear(256, 100), nn.ReLU(), nn.Dropout(0.5))
        self.layer5 = nn.Sequential(nn.Linear(100, 10))

    def forward(self, x):
        # flatten image input
        x = x.view(-1, 28 * 28)
        # implement your codes here
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.layer5(x)
        return x
# initialize the MLP
model_2 = RegularizedMLP()
# specify loss function
# implement your codes here
loss2 = nn.CrossEntropyLoss()
# specify your optimizer
# implement your codes here
optimizer_2 = optim.SGD(params=model_2.parameters(), lr=0.005)
optimizer_2.zero_grad()  # clear any existing gradients
optimizer_2
SGD (
Parameter Group 0
dampening: 0
lr: 0.005
momentum: 0
nesterov: False
weight_decay: 0
)
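model_2 uses Dropout only. The task also allows L2 regularization, which PyTorch's SGD applies through its weight_decay argument; a minimal sketch of that variant (the 1e-4 coefficient is an illustrative assumption, not a tuned value):

# alternative: add L2 regularization (weight decay) instead of, or in addition to, Dropout
optimizer_2 = optim.SGD(params=model_2.parameters(), lr=0.005, weight_decay=1e-4)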
Train the Network (30 marks)
Train your models in the following two cells.
The following loop trains for 30 epochs; feel free to change this number. For now, we suggest somewhere between 20-50 epochs. As you train, take a look at how the values for the training loss decrease over time. We want it to decrease while also avoiding overfitting the training data.
The key parts in the training process are left for you to implement.
Train model_1
# number of epochs to train the model
n_epochs = 20  # suggest training between 20-50 epochs

model_1.train()  # prep model for training: sets the training flag of this module and all submodules to True

for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0
    total_correct = 0
    for data, target in train_loader:
        y_pre = model_1(data)
        l = loss1(y_pre, target)  # CrossEntropyLoss already averages over the batch
        optimizer_1.zero_grad()
        l.backward()
        optimizer_1.step()  # perform a single optimization step
        # implement your code here
        train_loss += l.item()  # accumulate the mean loss of this batch
        total_correct += (y_pre.argmax(dim=1) == target).sum().item()  # accumulate the number of correctly classified samples
    # print training statistics
    # calculate average loss and accuracy over an epoch
    train_loss = train_loss / len(train_loader.dataset)
    train_acc = 100. * total_correct / len(train_loader.dataset)
    print('Epoch: {} \tTraining Loss: {:.6f} \tTraining Acc: {:.2f}%'.format(
        epoch + 1,
        train_loss,
        train_acc
    ))
Epoch: 1 Training Loss: 0.113190 Training Acc: 22.09%
Epoch: 2 Training Loss: 0.046626 Training Acc: 74.66%
Epoch: 3 Training Loss: 0.019871 Training Acc: 88.53%
Epoch: 4 Training Loss: 0.014905 Training Acc: 91.53%
Epoch: 5 Training Loss: 0.011858 Training Acc: 93.19%
Epoch: 6 Training Loss: 0.009501 Training Acc: 94.55%
Epoch: 7 Training Loss: 0.007791 Training Acc: 95.55%
Epoch: 8 Training Loss: 0.006547 Training Acc: 96.28%
Epoch: 9 Training Loss: 0.005593 Training Acc: 96.83%
Epoch: 10 Training Loss: 0.004822 Training Acc: 97.28%
Epoch: 11 Training Loss: 0.004186 Training Acc: 97.69%
Epoch: 12 Training Loss: 0.003650 Training Acc: 98.00%
Epoch: 13 Training Loss: 0.003196 Training Acc: 98.28%
Epoch: 14 Training Loss: 0.002803 Training Acc: 98.48%
Epoch: 15 Training Loss: 0.002459 Training Acc: 98.71%
Epoch: 16 Training Loss: 0.002155 Training Acc: 98.91%
Epoch: 17 Training Loss: 0.001881 Training Acc: 99.08%
Epoch: 18 Training Loss: 0.001634 Training Acc: 99.24%
Epoch: 19 Training Loss: 0.001413 Training Acc: 99.38%
Epoch: 20 Training Loss: 0.001215 Training Acc: 99.51%
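To visualize how the training loss decreases over time, the per-epoch losses could be collected and plotted. A minimal sketch, assuming each epoch's train_loss had also been appended to a list named epoch_losses inside the loop above (which the loop does not do as written):

# plot the per-epoch training loss curve (assumes epoch_losses was collected during training)
plt.plot(range(1, len(epoch_losses) + 1), epoch_losses)
plt.xlabel('epoch')
plt.ylabel('average training loss')
plt.show()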
Train model_2
# number of epochs to train the model
n_epochs = 30  # suggest training between 20-50 epochs

model_2.train()  # prep model for training

for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0
    total_correct = 0
    for data, target in train_loader:
        y_pre = model_2(data)
        l = loss2(y_pre, target)  # CrossEntropyLoss already averages over the batch
        optimizer_2.zero_grad()
        l.backward()
        optimizer_2.step()  # perform a single optimization step
        # implement your code here
        train_loss += l.item()  # accumulate the mean loss of this batch
        total_correct += (y_pre.argmax(dim=1) == target).sum().item()  # accumulate the number of correctly classified samples
    # print training statistics
    # calculate average loss and accuracy over an epoch
    train_loss = train_loss / len(train_loader.dataset)
    train_acc = 100. * total_correct / len(train_loader.dataset)
    print('Epoch: {} \tTraining Loss: {:.6f} \tTraining Acc: {:.2f}%'.format(
        epoch + 1,
        train_loss,
        train_acc
    ))
Epoch: 1 Training Loss: 0.114623 Training Acc: 14.26%
Epoch: 2 Training Loss: 0.089865 Training Acc: 37.94%
Epoch: 3 Training Loss: 0.047813 Training Acc: 66.14%
Epoch: 4 Training Loss: 0.033164 Training Acc: 79.35%
Epoch: 5 Training Loss: 0.024138 Training Acc: 86.14%
Epoch: 6 Training Loss: 0.018936 Training Acc: 89.40%
Epoch: 7 Training Loss: 0.015476 Training Acc: 91.53%
Epoch: 8 Training Loss: 0.013406 Training Acc: 92.89%
Epoch: 9 Training Loss: 0.011899 Training Acc: 93.57%
Epoch: 10 Training Loss: 0.010539 Training Acc: 94.33%
Epoch: 11 Training Loss: 0.009646 Training Acc: 94.91%
Epoch: 12 Training Loss: 0.008601 Training Acc: 95.45%
Epoch: 13 Training Loss: 0.008043 Training Acc: 95.69%
Epoch: 14 Training Loss: 0.007412 Training Acc: 96.11%
Epoch: 15 Training Loss: 0.006955 Training Acc: 96.30%
Epoch: 16 Training Loss: 0.006423 Training Acc: 96.61%
Epoch: 17 Training Loss: 0.006151 Training Acc: 96.65%
Epoch: 18 Training Loss: 0.005697 Training Acc: 96.97%
Epoch: 19 Training Loss: 0.005397 Training Acc: 97.13%
Epoch: 20 Training Loss: 0.005209 Training Acc: 97.30%
Epoch: 21 Training Loss: 0.004915 Training Acc: 97.39%
Epoch: 22 Training Loss: 0.004662 Training Acc: 97.48%
Epoch: 23 Training Loss: 0.004268 Training Acc: 97.67%
Epoch: 24 Training Loss: 0.004211 Training Acc: 97.69%
Epoch: 25 Training Loss: 0.004022 Training Acc: 97.78%
Epoch: 26 Training Loss: 0.003985 Training Acc: 97.87%
Epoch: 27 Training Loss: 0.003757 Training Acc: 97.95%
Epoch: 28 Training Loss: 0.003597 Training Acc: 98.03%
Epoch: 29 Training Loss: 0.003284 Training Acc: 98.17%
Epoch: 30 Training Loss: 0.003382 Training Acc: 98.12%
Test the Trained Network (20 marks)
Test the performance of the trained models on the test data. In addition to the total test accuracy, you should calculate the accuracy for each class.
Test model_1
# initialize lists to monitor test loss and accuracy
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

model_1.eval()  # prep model for *evaluation*: sets training=False, which fixes the BatchNorm and Dropout layers

for data, target in test_loader:
    y_pre = model_1(data)
    l = loss1(y_pre, target)
    # implement your code here
    test_loss += l.item()  # accumulate the mean loss of this batch
    for i in range(batch_size):
        if y_pre.argmax(dim=1)[i] == target[i]:  # prediction is correct
            class_correct[target[i]] += 1  # per-class count of correctly classified samples (the label is the index)
        class_total[target[i]] += 1  # per-class count of all samples (the label is the index)

# calculate and print avg test loss
test_loss = test_loss / len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of class %d: %.2f%%' % (i, 100 * class_correct[i] / class_total[i]))
    else:
        print('Test Accuracy of class %d: N/A (no test examples)' % (i))

print('\nTest Accuracy (Overall): %.2f%%' % (100. * np.sum(class_correct) / np.sum(class_total)))
Test Loss: 0.004451
Test Accuracy of class 0: 98.67%
Test Accuracy of class 1: 98.77%
Test Accuracy of class 2: 97.19%
Test Accuracy of class 3: 98.61%
Test Accuracy of class 4: 98.47%
Test Accuracy of class 5: 96.64%
Test Accuracy of class 6: 96.45%
Test Accuracy of class 7: 96.11%
Test Accuracy of class 8: 96.20%
Test Accuracy of class 9: 97.52%
Test Accuracy (Overall): 97.49%
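Beyond per-class accuracy, a confusion matrix shows which digits are mistaken for which. A minimal sketch using only PyTorch (my own addition, not required by the assignment):

# build a 10x10 confusion matrix for model_1: rows are true labels, columns are predictions
confusion = torch.zeros(10, 10, dtype=torch.long)
with torch.no_grad():
    for data, target in test_loader:
        preds = model_1(data).argmax(dim=1)
        for t, p in zip(target, preds):
            confusion[t, p] += 1
print(confusion)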
Test model_2
# initialize lists to monitor test loss and accuracy
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

model_2.eval()  # prep model for *evaluation*

for data, target in test_loader:
    y_pre = model_2(data)
    l = loss2(y_pre, target)
    # implement your code here
    test_loss += l.item()  # accumulate the mean loss of this batch
    for i in range(batch_size):
        if y_pre.argmax(dim=1)[i] == target[i]:  # prediction is correct
            class_correct[target[i]] += 1  # per-class count of correctly classified samples (the label is the index)
        class_total[target[i]] += 1  # per-class count of all samples (the label is the index)

# calculate and print avg test loss
test_loss = test_loss / len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of class %d: %.2f%%' % (i, 100 * class_correct[i] / class_total[i]))
    else:
        print('Test Accuracy of class %d: N/A (no test examples)' % (i))

print('\nTest Accuracy (Overall): %.2f%%' % (100. * np.sum(class_correct) / np.sum(class_total)))
Test Loss: 0.003905
Test Accuracy of class 0: 98.98%
Test Accuracy of class 1: 99.47%
Test Accuracy of class 2: 98.16%
Test Accuracy of class 3: 98.42%
Test Accuracy of class 4: 98.17%
Test Accuracy of class 5: 97.98%
Test Accuracy of class 6: 97.60%
Test Accuracy of class 7: 97.86%
Test Accuracy of class 8: 97.02%
Test Accuracy of class 9: 96.93%
Test Accuracy (Overall): 98.08%
Experiments
num | epochs | hidden units | lr | Train Loss | Train Acc (%) | Test Acc (%) | Notes
---|---|---|---|---|---|---|---
1 | 30 | 20 | 0.2 | 0.016939 | 90.97 | 86.76 | manually implemented fully connected layers
2 | 30 | 20 | 0.2 | 0.008375 | 95.4 | 93.9 | manual implementation + ReLU
3 | 30 | 20 | 0.2 | 0.004658 | 97.2 | 94.59 | using nn.Linear
4 | 30 | 20 | 0.2 | 0.005426 | 96.72 | 95.54 | added a BatchNorm layer
5 | 30 | 20 | 0.2 | 0.005731 | 96.68 | 95.19 | added uniform initialization
6 | 30 | 20 | 0.2 | 0.033589 | 77.92 | 92.08 | Dropout + BN
7 | 30 | 20 | 0.2 | 0.034423 | 77.33 | 91.96 | Dropout only
8 | 30 | 20 | 0.1 | 0.003467 | 97.92 | 95.36 | no Dropout
9 | 30 | 20 | 0.1 | 0.030022 | 79.61 | 92.57 | Dropout only
10 | 40 | 20 | 0.01 | 0.006123 | 96.46 | 95.53 | no Dropout
11 | 40 | 20 | 0.01 | 0.028218 | 81.17 | 93.53 | with Dropout
12 | 40 | 50 | 0.01 | 0.002915 | 98.41 | 97.31 | no Dropout
13 | 40 | 50 | 0.01 | 0.012396 | 92.38 | 96.16 | with Dropout
14 | 40 | 50 | 0.005 | 0.005171 | 97.09 | 96.53 | no Dropout
15 | 40 | 50 | 0.005 | 0.014154 | 91.52 | 95.48 | with Dropout
16 | 50 | 50 | 0.005 | 0.004808 | 97.62 | 96.97 | no Dropout
17 | 50 | 50 | 0.005 | 0.01167 | 92.77 | 96.21 | with Dropout
18 | 20 | 1024/512/256/100 | 0.005 | 0.000003 | 100 | 98.55 | BN, no Dropout
19 | 20 | 1024/512/256/100 | 0.005 | 0.005303 | 97.16 | 97.71 | Dropout, no BN
20 | 20 | 1024/512/256/100 | 0.005 | 0.001215 | 99.51 | 97.49 | ReLU only
21 | 30 | 1024/512/256/100 | 0.005 | 0.003382 | 98.12 | 98.08 | with Dropout
Analyze Your Result (20 marks)
Compare the performance of your models with the following analysis. Both English and Chinese answers are acceptable.
1. Does your vanilla MLP overfit to the training data? (5 marks)
Answer: Yes.
2. If yes, how do you observe it? If no, why? (5 marks)
Answer: The training accuracy gets close to 100%, while the accuracy on the test set stays noticeably lower, which indicates overfitting.
3. Does the regularized model help prevent overfitting? (5 marks)
Answer: When I used Dropout, the training accuracy dropped, and the test accuracy improved by about 5% compared to the same model without Dropout. The regularized model effectively avoids overfitting.
4. Generally compare the performance of the two models. (5 marks)
Answer:
The vanilla model converges quickly and its training accuracy is close to 100%. The regularized model converges more slowly, but overfitting is avoided.
In addition, I found that using Dropout and BatchNorm together did not work as well as expected.