Dropout in Deep Learning

Jan 2, 2024 · Junhong Liu · 4 min read

1. Introduction

1.1. Why we need Dropout

In machine learning, a model with many parameters but few training samples is prone to overfitting. Overfitting is a common problem when training neural networks: the model achieves a small loss and high prediction accuracy on the training data, but performs poorly on the test data.

Dropout is an effective way to mitigate overfitting.

1.2. What is Dropout

Dropout was proposed by Hinton et al. (2012) in "Improving neural networks by preventing co-adaptation of feature detectors". It prevents overfitting by suppressing the co-adaptation of feature detectors (hidden-layer nodes). The term "dropout" refers to dropping out units (hidden and visible) in a neural network.

(Figure: dropout1)

In "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", Srivastava et al. (2014) were inspired by a theory of the role of sex in evolution (Livnat et al., 2010). They argued that "the role of sexual reproduction is not just to allow useful new genes to spread throughout the population, but also to facilitate this process by reducing complex co-adaptations that would reduce the chance of a new gene improving the fitness of an individual. Similarly, each hidden unit in a neural network trained with dropout must learn to work with a randomly chosen sample of other units."

2. Model Description

(Figure: dropout2)

Consider a neural network with $L$ hidden layers. Let $l \in \{1, \ldots, L\}$ index the hidden layers of the network. Let $\vec{z}^{(l)}$ denote the vector of inputs into layer $l$ and $\vec{y}^{(l)}$ the vector of outputs from layer $l$. $\vec{w}^{(l)}$ and $\vec{b}^{(l)}$ are the weights and biases at layer $l$.


The feed-forward operation of a standard neural network can be described as (for hidden unit $i$):

$$ \begin{aligned} z^{(l)}_i &= \sum_j{w^{(l)}_{ij} \cdot y^{(l-1)}_j} + b^{(l)}_i \\ y^{(l)}_i &= f(z^{(l)}_i) \end{aligned} $$

where $f$ is an activation function and $y^{(0)} = x$ is the original input.
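
As a minimal NumPy sketch of this step (the layer sizes, example weights, and the choice of tanh as the activation $f$ are illustrative assumptions):

import numpy as np

def dense_forward(y_prev, W, b, f=np.tanh):
    # z^(l) = W y^(l-1) + b, then y^(l) = f(z^(l))
    z = W @ y_prev + b
    return f(z)

# Toy example: 3 inputs -> 2 hidden units
x = np.asarray([1.0, 2.0, 3.0])
W = np.full((2, 3), 0.1)
b = np.zeros(2)
print(dense_forward(x, W, b))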


With dropout, the vector of outputs from layer $l$ is first thinned by a random binary mask (dropout is normally not applied to the original input):

$$ \begin{aligned} r^{(l)}_j &\sim \mathsf{Bernoulli}(p) \\ \tilde{y}^{(l)}_j &= r^{(l)}_j \cdot y^{(l)}_j \end{aligned} $$

Thus, the feed-forward operation becomes:

$$ \begin{aligned} z^{(l)}_i &= \sum_j{w^{(l)}_{ij} \cdot \tilde{y}^{(l-1)}_j} + b^{(l)}_i \\ y^{(l)}_i &= f(z^{(l)}_i) \end{aligned} $$

where $f$ is an activation function and $\tilde{y}^{(0)} = x$ is the original input. For any layer $l$, the $r^{(l)}_j$ are independent Bernoulli random variables, each of which equals 1 with probability $p$.
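
A minimal sketch of the same step with the Bernoulli mask applied to the previous layer's output (the layer sizes, example values, and $p = 0.8$ are illustrative assumptions):

import numpy as np

def dropout_forward(y_prev, W, b, p=0.5, f=np.tanh):
    rng = np.random.default_rng()
    r = rng.binomial(1, p, size=y_prev.shape)  # r_j ~ Bernoulli(p); 1 = keep, 0 = drop
    y_tilde = r * y_prev                       # thinned output of layer l-1
    z = W @ y_tilde + b                        # same feed-forward step as before
    return f(z)

y_prev = np.asarray([1.0, 2.0, 3.0])  # output of some hidden layer l-1
W = np.full((2, 3), 0.1)
b = np.zeros(2)
print(dropout_forward(y_prev, W, b, p=0.8))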

Dropout amounts to sampling a sub-network from a larger network. During learning, the derivatives of the loss function are backpropagated through the sub-network. At test time, the weights are scaled as $\vec{w}^{(l)}_{test} = p \cdot \vec{w}^{(l)}$, and the resulting network is used without dropout.

(Figure: dropout3)

Scaling the weights by $p$ at test time is essential: since $\mathbb{E}[r^{(l)}_j] = p$, each unit's expected output during training is $p \cdot y^{(l)}_j$, so the scaled weights make the outputs at training and test time have the same mathematical expectation.
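
A quick numerical check of this equivalence, with made-up weights and outputs for a single unit (the values and retention probability are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
p = 0.5                            # probability of retaining a unit
y = np.asarray([1.0, 2.0, 3.0])    # outputs of layer l-1
w = np.asarray([0.1, 0.2, 0.3])    # weights into one unit of layer l

# Training time: average the masked pre-activation over many sampled sub-networks.
samples = [w @ (rng.binomial(1, p, size=y.shape) * y) for _ in range(100000)]
print(np.mean(samples))            # approximately 0.7 = p * (w @ y)

# Test time: use the full output but scale the weights by p.
print((p * w) @ y)                 # exactly 0.7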

2.1. Code

# coding:utf-8
import numpy as np

# Implementation of the dropout function
def dropout(x, level):
    if level < 0. or level >= 1:  # level is the probability of dropping a unit; must be in [0, 1).
        raise ValueError('Dropout level must be in interval [0, 1)')
    retain_prob = 1. - level

    # Generate a 0/1 mask with the same shape as x via a binomial (Bernoulli) draw.
    # A 0 means the corresponding neuron is dropped (blocked); a 1 means it is kept.
    random_tensor = np.random.binomial(n=1, p=retain_prob, size=x.shape)
    print("random_tensor:")
    print(random_tensor)

    x = x * random_tensor
    print("x:")
    print(x)

    return x

# run this code
x = np.asarray([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=np.float32)
dropout(x, 0.4)
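
Note that this implementation does not rescale during training, so the weights must be multiplied by retain_prob at test time, as described above. Many modern implementations instead use "inverted dropout", which rescales the kept activations during training so the network can be used unchanged at test time. A minimal variant of the function above (same interface, hypothetical name):

def inverted_dropout(x, level):
    if level < 0. or level >= 1:
        raise ValueError('Dropout level must be in interval [0, 1)')
    retain_prob = 1. - level
    mask = np.random.binomial(n=1, p=retain_prob, size=x.shape)
    # Dividing by retain_prob keeps the expected value of x unchanged,
    # so no test-time weight scaling is required.
    return x * mask / retain_prob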
References

[1] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, ‘Dropout: A Simple Way to Prevent Neural Networks from Overfitting’, Journal of Machine Learning Research, vol. 15, no. 56, pp. 1929–1958, 2014.

[2] https://zhuanlan.zhihu.com/p/38200980