Last updated: October 19, 2020 (evening)

Assignment 1: Multi-Layer Perceptron with MNIST Dataset

In this assignment, you are required to train two MLPs to classify images from the MNIST handwritten digit database using PyTorch.

The process will be broken down into the following steps:

  1. Load and visualize the data.
  2. Define a neural network. (30 marks)
  3. Train the models. (30 marks)
  4. Evaluate the performance of our trained models on the test dataset. (20 marks)
  5. Analyze your results. (20 marks)
import torch
import numpy as np

import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'  # work around the duplicate libiomp5.dylib initialization error on macOS

Load and Visualize the Data

Downloading may take a few moments, and you should see your progress as the data is loading. You may also choose to change the batch_size if you want to load more data at a time.

This cell will create DataLoaders for each of our datasets.

from torchvision import datasets
import torchvision.transforms as transforms

# number of subprocesses to use for data loading
num_workers = 0
# how many samples per batch to load
batch_size = 20

# convert data to torch.FloatTensor
transform = transforms.ToTensor()

# choose the training and test datasets
train_data = datasets.MNIST(root='data', train=True,
                                   download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False,
                                  download=True, transform=transform)

# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers)
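Note that neither loader shuffles the data. For training it is usually preferable to reshuffle every epoch; a minimal variant of the same call (the results reported below were produced without shuffling):

# optional: reshuffle the training set at every epoch
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           shuffle=True, num_workers=num_workers)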

Visualize a Batch of Training Data

The first step in a classification task is to take a look at the data, make sure it is loaded in correctly, then make any initial observations about patterns in that data.

import matplotlib.pyplot as plt
%matplotlib inline
    
# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = next(dataiter)  # use the built-in next(); the .next() method is removed in recent PyTorch
images = images.numpy()

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    ax = fig.add_subplot(2, 20 // 2, idx+1, xticks=[], yticks=[])  # integer division: newer matplotlib rejects float subplot counts
    ax.imshow(np.squeeze(images[idx]), cmap='gray')  # np.squeeze() drops the singleton channel dimension (1x28x28 -> 28x28)
    # print out the correct label for each image
    # .item() gets the value contained in a Tensor
    ax.set_title(str(labels[idx].item()))

[Figure: the batch of 20 training images, each titled with its label]

View an Image in More Detail

img = np.squeeze(images[1])

fig = plt.figure(figsize = (12,12)) 
ax = fig.add_subplot(111)
ax.imshow(img, cmap='gray')
width, height = img.shape
thresh = img.max()/2.5
for x in range(width):
    for y in range(height):
        val = round(img[x][y], 2) if img[x][y] != 0 else 0  # round the pixel value to 2 decimal places
        ax.annotate(str(val), xy=(y,x),
                    horizontalalignment='center',
                    verticalalignment='center',
                    color='white' if img[x][y]<thresh else 'black')

[Figure: a single digit image with its normalized pixel values annotated on each pixel]


Define the Network Architecture (30 marks)

  • Input: a 784-dim Tensor of pixel values for each image.
  • Output: a 10-dim Tensor of class scores, one per digit class, for an input image.

You need to implement three models:

  1. a vanilla multi-layer perceptron. (10 marks)
  2. a multi-layer perceptron with regularization (dropout or L2 or both). (10 marks)
  3. the corresponding loss functions and optimizers. (10 marks)

Build model_1

from torch import nn
from torch import optim
import torch
import numpy as np
from torch.nn import init
torch.manual_seed(1)  # set the random seed for reproducibility

## Define the MLP architecture
class VanillaMLP(nn.Module):
    def __init__(self):
        super(VanillaMLP, self).__init__()
        
# Manual implementation of the fully connected layers (kept for reference)
#         self.w1 = nn.Parameter(torch.randn(784,hidden_features))
#         self.b1 = nn.Parameter(torch.randn(hidden_features))
#         self.w2 = nn.Parameter(torch.randn(hidden_features,10))
#         self.b2 = nn.Parameter(torch.randn(10))
#         self.relu = nn.ReLU()
        
        self.layer1 = nn.Sequential(nn.Linear(784, 1024),nn.ReLU())
        self.layer2 = nn.Sequential(nn.Linear(1024, 512),nn.ReLU())
        self.layer3 = nn.Sequential(nn.Linear(512, 256),nn.ReLU())
        self.layer4 = nn.Sequential(nn.Linear(256, 100),nn.ReLU())
        self.layer5 = nn.Sequential(nn.Linear(100, 10))
#         init.xavier_normal_(self.layer1[0].weight)  # Xavier-normal weight init (nn.Linear defaults to a uniform init)
#         init.xavier_normal_(self.layer2[0].weight)

    def forward(self, x):
        # flatten image input
        x = x.view(-1, 28 * 28)

# Forward pass through the manual fully connected layers (kept for reference)
#         x = x.mm(self.w1)
#         h = x + self.b1.expand_as(x)
#         h = self.relu(h)  # apply the activation
#         h = h.mm(self.w2)
#         x = h + self.b2.expand_as(h)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.layer5(x)
        
        return x

# initialize the MLP
model_1 = VanillaMLP() 

# specify loss function
# implement your codes here
loss1 = nn.CrossEntropyLoss()

# specify your optimizer
# implement your codes here
optimizer_1 = optim.SGD(params=model_1.parameters(), lr=0.005)  # a separate optimizer per model, so the later cell for model_2 does not overwrite it
optimizer_1.zero_grad()  # clear any existing gradients
optimizer_1
SGD (
Parameter Group 0
    dampening: 0
    lr: 0.005
    momentum: 0
    nesterov: False
    weight_decay: 0
)
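The printout shows momentum: 0. A possible variant, not used for the runs reported below, is SGD with momentum, which typically speeds up convergence on MNIST-sized problems:

# hypothetical alternative: SGD with momentum (not used for the results shown below)
optimizer_1 = optim.SGD(params=model_1.parameters(), lr=0.005, momentum=0.9)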

Build model_2

## Define the MLP architecture
class RegularizedMLP(nn.Module):
    def __init__(self):
        super(RegularizedMLP, self).__init__()
        
        # implement your codes here
        #self.layer1 = nn.Sequential(nn.Linear(784, hidden_features),nn.ReLU(),nn.Dropout(0.5))
        #self.layer2 = nn.Sequential(nn.Linear(hidden_features, 10))
        self.layer1 = nn.Sequential(nn.Linear(784, 1024),nn.ReLU(),nn.Dropout(0.5))
        self.layer2 = nn.Sequential(nn.Linear(1024, 512),nn.ReLU(),nn.Dropout(0.5))
        self.layer3 = nn.Sequential(nn.Linear(512, 256),nn.ReLU(),nn.Dropout(0.5))
        self.layer4 = nn.Sequential(nn.Linear(256, 100),nn.ReLU(),nn.Dropout(0.5))
        self.layer5 = nn.Sequential(nn.Linear(100, 10))
        
    def forward(self, x):
        # flatten image input
        x = x.view(-1, 28 * 28)

        # implement your codes here
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.layer5(x)
        
        return x

# initialize the MLP
model_2 = RegularizedMLP()

# specify loss function
# implement your codes here
loss2 = nn.CrossEntropyLoss()
# specify your optimizer
# implement your codes here
optimizer_2 = optim.SGD(params=model_2.parameters(), lr=0.005)
optimizer_2.zero_grad()  # clear any existing gradients
optimizer_2
SGD (
Parameter Group 0
    dampening: 0
    lr: 0.005
    momentum: 0
    nesterov: False
    weight_decay: 0
)
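model_2 regularizes with Dropout only, so weight_decay stays at 0 above. Since the assignment also allows L2 regularization, a sketch of that variant (not used for the results reported below) would set weight_decay on the optimizer, which adds an L2 penalty on the weights during the update:

# hypothetical alternative: L2 regularization via weight decay (the 1e-4 value is an assumption)
optimizer_2 = optim.SGD(params=model_2.parameters(), lr=0.005, weight_decay=1e-4)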

Train the Network (30 marks)

Train your models in the following two cells.

The cells below train model_1 for 20 epochs and model_2 for 30; feel free to change these numbers, but we suggest somewhere between 20 and 50 epochs. As you train, take a look at how the training loss decreases over time. We want it to decrease while also avoiding overfitting the training data.

The key parts in the training process are left for you to implement.
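One way to watch for overfitting while training is to evaluate on the test set (or a held-out validation split) at the end of every epoch and compare the curve with the training accuracy. A minimal helper, assuming the models and loaders defined above (this was not part of the original runs):

def evaluate(model, loader):
    """Return the classification accuracy (%) of `model` over `loader`."""
    model.eval()  # disable dropout, use running BatchNorm statistics
    correct, total = 0, 0
    with torch.no_grad():  # no gradients needed for evaluation
        for data, target in loader:
            pred = model(data).argmax(dim=1)
            correct += (pred == target).sum().item()
            total += target.size(0)
    model.train()  # restore training mode for the next epoch
    return 100. * correct / total

# example: call at the end of each epoch, e.g. evaluate(model_1, test_loader)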

Train model_1

# number of epochs to train the model
n_epochs = 20  # suggest training between 20-50 epochs

model_1.train() # prep model for training: sets training=True on this module and all of its submodules

for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0
    total_correct = 0
    
    for data, target in train_loader:
        y_pre = model_1(data)
        l = loss1(y_pre, target)  # CrossEntropyLoss already returns the mean loss over the batch
        optimizer_1.zero_grad()
        l.backward()
        optimizer_1.step()  # apply the parameter update
        
        # implement your code here
        train_loss += l.item()  # accumulate the mean loss of this batch
        total_correct += (y_pre.argmax(dim=1) == target).sum().item()  # number of correctly classified samples in this batch
        
    # print training statistics 
    # calculate average loss and accuracy over an epoch
    # (note: summing batch means and dividing by the dataset size makes the reported
    #  loss roughly 1/batch_size of the usual per-sample cross-entropy)
    train_loss = train_loss / len(train_loader.dataset)
    train_acc = 100. * total_correct / len(train_loader.dataset)
    
    print('Epoch: {} \tTraining Loss: {:.6f} \tTraining Acc: {:.2f}%'.format(
        epoch+1, 
        train_loss,
        train_acc
        ))
Epoch: 1     Training Loss: 0.113190     Training Acc: 22.09%
Epoch: 2     Training Loss: 0.046626     Training Acc: 74.66%
Epoch: 3     Training Loss: 0.019871     Training Acc: 88.53%
Epoch: 4     Training Loss: 0.014905     Training Acc: 91.53%
Epoch: 5     Training Loss: 0.011858     Training Acc: 93.19%
Epoch: 6     Training Loss: 0.009501     Training Acc: 94.55%
Epoch: 7     Training Loss: 0.007791     Training Acc: 95.55%
Epoch: 8     Training Loss: 0.006547     Training Acc: 96.28%
Epoch: 9     Training Loss: 0.005593     Training Acc: 96.83%
Epoch: 10     Training Loss: 0.004822     Training Acc: 97.28%
Epoch: 11     Training Loss: 0.004186     Training Acc: 97.69%
Epoch: 12     Training Loss: 0.003650     Training Acc: 98.00%
Epoch: 13     Training Loss: 0.003196     Training Acc: 98.28%
Epoch: 14     Training Loss: 0.002803     Training Acc: 98.48%
Epoch: 15     Training Loss: 0.002459     Training Acc: 98.71%
Epoch: 16     Training Loss: 0.002155     Training Acc: 98.91%
Epoch: 17     Training Loss: 0.001881     Training Acc: 99.08%
Epoch: 18     Training Loss: 0.001634     Training Acc: 99.24%
Epoch: 19     Training Loss: 0.001413     Training Acc: 99.38%
Epoch: 20     Training Loss: 0.001215     Training Acc: 99.51%

Train model_2

# number of epochs to train the model
n_epochs = 30  # suggest training between 20-50 epochs

model_2.train() # prep model for training

for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0
    total_correct = 0
    
    for data, target in train_loader:
        y_pre = model_2(data)
        l = loss2(y_pre, target)  # CrossEntropyLoss already returns the mean loss over the batch
        optimizer_2.zero_grad()
        l.backward()
        optimizer_2.step()  # apply the parameter update
        
        # implement your code here
        train_loss += l.item()  # accumulate the mean loss of this batch
        total_correct += (y_pre.argmax(dim=1) == target).sum().item()  # number of correctly classified samples in this batch
        
    # print training statistics 
    # calculate average loss and accuracy over an epoch
    train_loss = train_loss / len(train_loader.dataset)
    train_acc = 100. * total_correct / len(train_loader.dataset)
    
    print('Epoch: {} \tTraining Loss: {:.6f} \tTraining Acc: {:.2f}%'.format(
        epoch+1, 
        train_loss,
        train_acc
        ))
Epoch: 1     Training Loss: 0.114623     Training Acc: 14.26%
Epoch: 2     Training Loss: 0.089865     Training Acc: 37.94%
Epoch: 3     Training Loss: 0.047813     Training Acc: 66.14%
Epoch: 4     Training Loss: 0.033164     Training Acc: 79.35%
Epoch: 5     Training Loss: 0.024138     Training Acc: 86.14%
Epoch: 6     Training Loss: 0.018936     Training Acc: 89.40%
Epoch: 7     Training Loss: 0.015476     Training Acc: 91.53%
Epoch: 8     Training Loss: 0.013406     Training Acc: 92.89%
Epoch: 9     Training Loss: 0.011899     Training Acc: 93.57%
Epoch: 10     Training Loss: 0.010539     Training Acc: 94.33%
Epoch: 11     Training Loss: 0.009646     Training Acc: 94.91%
Epoch: 12     Training Loss: 0.008601     Training Acc: 95.45%
Epoch: 13     Training Loss: 0.008043     Training Acc: 95.69%
Epoch: 14     Training Loss: 0.007412     Training Acc: 96.11%
Epoch: 15     Training Loss: 0.006955     Training Acc: 96.30%
Epoch: 16     Training Loss: 0.006423     Training Acc: 96.61%
Epoch: 17     Training Loss: 0.006151     Training Acc: 96.65%
Epoch: 18     Training Loss: 0.005697     Training Acc: 96.97%
Epoch: 19     Training Loss: 0.005397     Training Acc: 97.13%
Epoch: 20     Training Loss: 0.005209     Training Acc: 97.30%
Epoch: 21     Training Loss: 0.004915     Training Acc: 97.39%
Epoch: 22     Training Loss: 0.004662     Training Acc: 97.48%
Epoch: 23     Training Loss: 0.004268     Training Acc: 97.67%
Epoch: 24     Training Loss: 0.004211     Training Acc: 97.69%
Epoch: 25     Training Loss: 0.004022     Training Acc: 97.78%
Epoch: 26     Training Loss: 0.003985     Training Acc: 97.87%
Epoch: 27     Training Loss: 0.003757     Training Acc: 97.95%
Epoch: 28     Training Loss: 0.003597     Training Acc: 98.03%
Epoch: 29     Training Loss: 0.003284     Training Acc: 98.17%
Epoch: 30     Training Loss: 0.003382     Training Acc: 98.12%

Test the Trained Network (20 marks)

Test the performance of the trained models on the test data. In addition to the overall test accuracy, you should also calculate the accuracy for each class.
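An equivalent way to get these per-class numbers is to accumulate a 10x10 confusion matrix and read the per-class accuracies off its diagonal; a sketch under the same loaders and trained models (not part of the original notebook):

# hypothetical alternative: per-class accuracy from a confusion matrix
confusion = np.zeros((10, 10), dtype=np.int64)  # rows: true label, columns: prediction
model_1.eval()
with torch.no_grad():
    for data, target in test_loader:
        preds = model_1(data).argmax(dim=1)
        for t, p in zip(target.numpy(), preds.numpy()):
            confusion[t, p] += 1

per_class_acc = 100. * confusion.diagonal() / confusion.sum(axis=1)  # accuracy of each digit class
overall_acc = 100. * confusion.diagonal().sum() / confusion.sum()    # overall test accuracy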

Test model_1

# initialize lists to monitor test loss and accuracy
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

model_1.eval() # prep model for *evaluation*: sets training=False, which freezes BatchNorm statistics and disables dropout

with torch.no_grad():  # gradients are not needed during evaluation
    for data, target in test_loader:
        y_pre = model_1(data)
        l = loss1(y_pre, target)
        # implement your code here
        
        test_loss += l.item()  # accumulate the mean loss of this batch
        
        for i in range(len(target)):  # len(target) is safer than batch_size if the last batch is smaller
            if y_pre.argmax(dim=1)[i] == target[i]:  # prediction is correct
                class_correct[target[i]] += 1  # per-class count of correctly classified samples (the label is the index)
            class_total[target[i]] += 1  # per-class count of samples (the label is the index)
        

# calculate and print avg test loss
test_loss = test_loss / len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of class %d: %.2f%%' % (i, 100 * class_correct[i] / class_total[i]))
    else:
        print('Test Accuracy of class %d: N/A (no test examples)' % (i))

print('\nTest Accuracy (Overall): %.2f%%' % (100. * np.sum(class_correct) / np.sum(class_total)))
Test Loss: 0.004451

Test Accuracy of class 0: 98.67%
Test Accuracy of class 1: 98.77%
Test Accuracy of class 2: 97.19%
Test Accuracy of class 3: 98.61%
Test Accuracy of class 4: 98.47%
Test Accuracy of class 5: 96.64%
Test Accuracy of class 6: 96.45%
Test Accuracy of class 7: 96.11%
Test Accuracy of class 8: 96.20%
Test Accuracy of class 9: 97.52%

Test Accuracy (Overall): 97.49%

Test model_2

# initialize lists to monitor test loss and accuracy
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

model_2.eval() # prep model for *evaluation*

with torch.no_grad():  # gradients are not needed during evaluation
    for data, target in test_loader:
        y_pre = model_2(data)
        l = loss2(y_pre, target)
        # implement your code here
        
        test_loss += l.item()  # accumulate the mean loss of this batch
        
        for i in range(len(target)):  # len(target) is safer than batch_size if the last batch is smaller
            if y_pre.argmax(dim=1)[i] == target[i]:  # prediction is correct
                class_correct[target[i]] += 1  # per-class count of correctly classified samples (the label is the index)
            class_total[target[i]] += 1  # per-class count of samples (the label is the index)


# calculate and print avg test loss
test_loss = test_loss / len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of class %d: %.2f%%' % (i, 100 * class_correct[i] / class_total[i]))
    else:
        print('Test Accuracy of class %d: N/A (no test examples)' % (i))

print('\nTest Accuracy (Overall): %.2f%%' % (100. * np.sum(class_correct) / np.sum(class_total)))
Test Loss: 0.003905

Test Accuracy of class 0: 98.98%
Test Accuracy of class 1: 99.47%
Test Accuracy of class 2: 98.16%
Test Accuracy of class 3: 98.42%
Test Accuracy of class 4: 98.17%
Test Accuracy of class 5: 97.98%
Test Accuracy of class 6: 97.60%
Test Accuracy of class 7: 97.86%
Test Accuracy of class 8: 97.02%
Test Accuracy of class 9: 96.93%

Test Accuracy (Overall): 98.08%

Experiments

| # | Epochs | Hidden units per layer | LR | Train loss | Train acc (%) | Test acc (%) | Notes |
|---|--------|------------------------|-----|-----------|---------------|--------------|-------|
| 1 | 30 | 20 | 0.2 | 0.016939 | 90.97 | 86.76 | Manually implemented fully connected layers |
| 2 | 30 | 20 | 0.2 | 0.008375 | 95.4 | 93.9 | Manual implementation + ReLU |
| 3 | 30 | 20 | 0.2 | 0.004658 | 97.2 | 94.59 | Using nn.Linear |
| 4 | 30 | 20 | 0.2 | 0.005426 | 96.72 | 95.54 | Added BatchNorm layers |
| 5 | 30 | 20 | 0.2 | 0.005731 | 96.68 | 95.19 | Added uniform weight initialization |
| 6 | 30 | 20 | 0.2 | 0.033589 | 77.92 | 92.08 | Dropout + BN |
| 7 | 30 | 20 | 0.2 | 0.034423 | 77.33 | 91.96 | Dropout only |
| 8 | 30 | 20 | 0.1 | 0.003467 | 97.92 | 95.36 | No Dropout |
| 9 | 30 | 20 | 0.1 | 0.030022 | 79.61 | 92.57 | Dropout only |
| 10 | 40 | 20 | 0.01 | 0.006123 | 96.46 | 95.53 | No Dropout |
| 11 | 40 | 20 | 0.01 | 0.028218 | 81.17 | 93.53 | With Dropout |
| 12 | 40 | 50 | 0.01 | 0.002915 | 98.41 | 97.31 | No Dropout |
| 13 | 40 | 50 | 0.01 | 0.012396 | 92.38 | 96.16 | With Dropout |
| 14 | 40 | 50 | 0.005 | 0.005171 | 97.09 | 96.53 | No Dropout |
| 15 | 40 | 50 | 0.005 | 0.014154 | 91.52 | 95.48 | With Dropout |
| 16 | 50 | 50 | 0.005 | 0.004808 | 97.62 | 96.97 | No Dropout |
| 17 | 50 | 50 | 0.005 | 0.01167 | 92.77 | 96.21 | With Dropout |
| 18 | 20 | 1024/512/256/100 | 0.005 | 0.000003 | 100 | 98.55 | With BN, no Dropout |
| 19 | 20 | 1024/512/256/100 | 0.005 | 0.005303 | 97.16 | 97.71 | No BN, with Dropout |
| 20 | 20 | 1024/512/256/100 | 0.005 | 0.001215 | 99.51 | 97.49 | ReLU only (model_1 above) |
| 21 | 30 | 1024/512/256/100 | 0.005 | 0.003382 | 98.12 | 98.08 | With Dropout (model_2 above) |

Analyze Your Result (20 marks)

Compare the performance of your models with the following analysis. Both English and Chinese answers are acceptable.

1. Does your vanilla MLP overfit to the training data? (5 marks)

Answer: Yes.

2. If yes, how do you observe it? If no, why? (5 marks)

Answer: The training accuracy reaches 99.51% after 20 epochs, while the accuracy on the test set is noticeably lower (97.49% overall). This gap between training and test accuracy is how the overfitting shows up.
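One way to quantify this gap, assuming the evaluate helper sketched in the training section, is to run the trained model over both loaders and compare:

# train vs. test accuracy of the trained vanilla MLP; a large gap suggests overfitting
train_acc = evaluate(model_1, train_loader)  # about 99.5% in the run above
test_acc = evaluate(model_1, test_loader)    # about 97.5% in the run above
print('gap: {:.2f} percentage points'.format(train_acc - test_acc))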

3. Does the regularized model help prevent overfitting? (5 marks)

Answer: Yes. With Dropout, the training accuracy drops (98.12% vs. 99.51% for the vanilla model), while the test accuracy improves (98.08% vs. 97.49%). The regularized model therefore reduces overfitting.

4. Generally compare the performance of the two models. (5 marks)

Answer:
The vanilla model converges quickly and its training accuracy approaches 100%, but it overfits. The regularized model converges more slowly, yet it generalizes better and avoids most of the overfitting.
In addition, I found that combining Dropout and BatchNorm did not work as well as expected.
