Welcome to CS With James
I was a little sad that there was not much to talk about with VGGNet and GoogLeNet. Both of them work great, but GoogLeNet is so complex that it is hard to code, and VGGNet is basically just an upgraded version of AlexNet. Now, though, I am very happy that we have DenseNet.
DenseNet not only works better than most other network designs out there, it is also very easy to code, and compared to other networks it has fewer parameters, so it takes less time to train. To me it feels like the ultimate form of a neural network.
Moreover, there is a mistake from my previous post that I have to fix. I said that 1k training epochs is not much compared to the 300k epochs used in the real competition, but I found out that it is not 300k epochs. They actually train the network for about 300 epochs; I don't know where that 'k' came from. It was my mistake, and I will train the DenseNet for only 300 epochs.
Now, let me talk about DenseNet. It originated from ResNet, which carries features from earlier layers forward, so that during training the error signal flows back more easily and the network trains more efficiently. However, ResNet only connects a few layers at a time through its shortcuts, while DenseNet connects "all" the layers within a block.
Once you look at this image, it will help you understand what it means to connect (concatenate) the layers. Keeping the low-level features helps the network classify the image. This structure is called a Dense Block, and DenseNet is made of a few Dense Blocks joined by Transition Layers.
So this is the full form of DenseNet. Between the Dense Blocks there are Transition Layers, which reduce the size of the feature maps using a pooling layer. Right before the output layer there is a stage that is not exactly the same as a Transition Layer, but it does something similar (pooling the feature maps down before the classifier).
DenseNet is very easy to code because once you have coded the Dense Block and the Transition Layer, you can just stack one on top of the other to build the whole network. Here is how I did it: I made the Dense Block and the Transition Layer into functions, and each time I call a function it adds that block on top of the existing layers.
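To show the stacking idea without depending on any particular framework, here is a minimal NumPy sketch of the same structure. The `conv_stub` function is a hypothetical stand-in for a real BN-ReLU-Conv layer; only the concatenation and pooling shapes are the point, not the actual math:

```python
import numpy as np

def conv_stub(x, num_filter):
    # Hypothetical stand-in for BN -> ReLU -> 3x3 Conv:
    # it just produces `num_filter` new channels, same spatial size.
    n, h, w, _ = x.shape
    return np.zeros((n, h, w, num_filter))

def dense_block(x, num_layers, num_filter):
    # Each layer's output is concatenated onto everything before it,
    # so the channel count grows by `num_filter` per layer.
    for _ in range(num_layers):
        out = conv_stub(x, num_filter)
        x = np.concatenate([x, out], axis=-1)  # channel axis
    return x

def transition_layer(x):
    # Stand-in for 1x1 Conv + 2x2 average pooling:
    # halve the spatial size (channel reduction omitted here).
    return x[:, ::2, ::2, :]

x = np.zeros((1, 32, 32, 16))       # e.g. a CIFAR-10 feature map
x = dense_block(x, num_layers=4, num_filter=12)
print(x.shape)                      # (1, 32, 32, 64): 16 + 4*12 channels
x = transition_layer(x)
print(x.shape)                      # (1, 16, 16, 64)
```

Calling `dense_block` and `transition_layer` alternately, just like this, is all it takes to assemble the whole network.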
There are two parameters you have to know to make the DenseNet work well.
- Growth Rate (K): I used num_filter in my code to make it easier to understand, but the original paper calls it the growth rate and uses K to represent it. It is simply how many output filters you want each convolutional layer to produce. In VGGNet there were up to 512 output filters, but in DenseNet we can use about 12 to 24 for K (the paper uses 12), which decreases the amount of computation dramatically.
- L: I had a hard time figuring this one out, because even the paper does not explain this parameter very clearly. After reading the paper many times and doing some research, I am using it as the number of convolutional layers in each Dense Block (in the paper itself, L seems to refer to the total depth of the network). I might be wrong, so if you know for sure, please let me know.
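Under my reading of these two parameters, the channel count coming out of each Dense Block is easy to work out by hand, which is a useful sanity check when coding. A quick sketch (the function name and the example input of 16 channels are my own, not from the paper):

```python
def channels_after_block(c_in, L, K):
    # Each of the L conv layers adds K feature maps via
    # concatenation, so the block outputs c_in + L * K channels.
    return c_in + L * K

# With the paper's K = 12 and, say, L = 10 layers per block:
print(channels_after_block(16, L=10, K=12))  # 136
```

Because the growth is additive (L * K new channels) rather than multiplicative, the feature maps stay narrow even deep in the network, which is where the parameter savings come from.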
They decreased the number of parameters to train by reducing the number of filters in each layer. Compared to a ResNet with about 10M parameters, a DenseNet with about 1M parameters gets a very similar result; the error rate difference is within about 1%.
Moreover, most layers normally use a bias, a constant added after the convolution to help fit the data. In DenseNet, however, they dropped the bias and rely on the BatchNormalization layer instead, which makes training faster, and by using BatchNormalization they could also help prevent overfitting. This is really smart, and it shows they paid attention to the small details.
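The reason the bias becomes redundant is that BatchNormalization subtracts the per-channel mean and then adds its own learned offset (beta), so any constant the convolution adds gets cancelled and re-learned anyway. Here is a tiny NumPy sketch of that cancellation (my own illustration, not the paper's code):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize per channel over the batch and spatial axes,
    # then apply the learned scale (gamma) and offset (beta).
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.random.randn(8, 4, 4, 3)       # fake conv output (NHWC)
bias = 5.0                            # any constant bias the conv might add
y_no_bias = batch_norm(x)
y_biased = batch_norm(x + bias)       # the mean subtraction removes it
print(np.allclose(y_no_bias, y_biased))  # True
```

Since the shifted input just shifts the mean by the same constant, the normalized outputs are identical, so keeping a conv bias in front of BatchNorm only wastes parameters.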
So, here is the result. I trained my DenseNet on the CIFAR-10 dataset with L=10, K=20, and 300 epochs.
loss: 0.0024 – acc: 0.9993 – val_loss: 1.0791 – val_acc: 0.8551
loss: 0.0027 – acc: 0.9991 – val_loss: 1.1468 – val_acc: 0.8472
loss: 0.0050 – acc: 0.9983 – val_loss: 1.2931 – val_acc: 0.8339
Test loss: 1.2931 – Test accuracy: 0.8339
So it ended up with an accuracy of 83.39%, which is way better than the previous networks I trained.