dropout_layer

What is a Dropout Layer?

A Dropout Layer is one of the regularization techniques used in deep learning to prevent overfitting. During training, the layer randomly selects a subset of neurons and sets their outputs to zero. This keeps the network from relying too heavily on any particular neuron and encourages the weights to generalize across the whole dataset.

A Dropout Layer is typically added after a fully connected layer or a convolutional layer. During training, the outputs of the dropped neurons do not contribute to the forward pass or to the weight updates; once training is complete, all neurons are used to compute the output. This has an effect similar to ensemble learning, reducing overfitting and improving generalization performance.

Whether to use a Dropout Layer, and the dropout ratio, are hyperparameters. They can be tuned according to the complexity of the network and the size of the dataset, and they have a large influence on the network's generalization performance and training speed.
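
As a point of reference, the following is a minimal, self-contained sketch of the "inverted dropout" idea used throughout this file; it only assumes the C standard library, and the trial count and probability are arbitrary. Because surviving values are scaled by 1/(1 - p), the expected value of each activation is unchanged by dropout, which is why no rescaling is needed at test time.

/* Minimal inverted-dropout sketch (illustration only, not Darknet code). */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const float p = 0.5f;                  /* dropout probability          */
    const float scale = 1.0f / (1.0f - p); /* same role as l.scale below   */
    const int trials = 100000;
    double sum = 0;
    for (int t = 0; t < trials; ++t) {
        float a = 1.0f;                    /* activation before dropout    */
        float r = (float)rand() / (float)RAND_MAX;
        if (r < p) a = 0;                  /* dropped                      */
        else       a *= scale;             /* kept and rescaled            */
        sum += a;
    }
    /* The empirical mean stays close to the original activation (1.0). */
    printf("mean after dropout = %f\n", sum / trials);
    return 0;
}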

forward_dropout_layer

void forward_dropout_layer(dropout_layer l, network net)
{
    int i;
    if (!net.train) return;
    for(i = 0; i < l.batch * l.inputs; ++i){
        float r = rand_uniform(0, 1);
        l.rand[i] = r;
        if(r < l.probability) net.input[i] = 0;
        else net.input[i] *= l.scale;
    }
}

Function name: forward_dropout_layer

Inputs:

  • dropout_layer l: the dropout layer struct

  • network net: the network struct

Operation:

  • Forward-propagation function that applies the dropout layer within the neural network.

  • The dropout layer sets a random subset of the input values to zero.

  • If the network is in training mode, a uniform random number is drawn for each input value; if it is smaller than the dropout probability p (l.probability), that input is set to 0. Otherwise the input is multiplied by the scale factor l.scale = 1/(1-p).

  • Each random draw is stored in l.rand so that the same mask can be reused during backpropagation, and the scaling by 1/(1-p) compensates for the fraction of inputs that were zeroed.

Description:

  • A dropout layer is one of the regularization methods for preventing overfitting, and it is particularly effective in deep neural networks.

  • During training, the dropout layer randomly selects some of the input values and sets them to zero. This keeps the model from relying too heavily on specific features and improves its generalization ability.

  • At test time the dropout layer does nothing: the function returns immediately when net.train is false. Because the surviving inputs were already scaled by 1/(1-p) during training (inverted dropout), no additional rescaling is needed at inference time. (A minimal sketch of this behavior follows the list below.)

  • A dropout layer can be used after both fully connected layers and convolutional layers.
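
To make the in-place behavior concrete, here is a small self-contained sketch that mimics forward_dropout_layer with simplified stand-in structs (toy_dropout_layer and toy_network are illustrative types, not the real Darknet structs, which contain many more fields). It shows that the input buffer is modified in place during training and left untouched in test mode.

#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-ins for illustration only; the real Darknet structs
 * contain many more fields. */
typedef struct { int batch, inputs; float probability, scale; float *rand; } toy_dropout_layer;
typedef struct { int train; float *input; } toy_network;

static void toy_forward_dropout(toy_dropout_layer l, toy_network net)
{
    if (!net.train) return;                      /* no-op outside training     */
    for (int i = 0; i < l.batch * l.inputs; ++i) {
        float r = (float)rand() / (float)RAND_MAX;
        l.rand[i] = r;                           /* remember mask for backward */
        if (r < l.probability) net.input[i] = 0;
        else                   net.input[i] *= l.scale;
    }
}

int main(void)
{
    float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float mask[4];
    toy_dropout_layer l = {1, 4, 0.5f, 2.0f, mask};  /* scale = 1/(1 - 0.5) */
    toy_network net = {1, x};                        /* train = 1           */
    toy_forward_dropout(l, net);
    for (int i = 0; i < 4; ++i) printf("%.1f ", x[i]);
    printf("\n");   /* dropped entries are 0, the rest are doubled */
    return 0;
}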

backward_dropout_layer

void backward_dropout_layer(dropout_layer l, network net)
{
    int i;
    if(!net.delta) return;
    for(i = 0; i < l.batch * l.inputs; ++i){
        float r = l.rand[i];
        if(r < l.probability) net.delta[i] = 0;
        else net.delta[i] *= l.scale;
    }
}

Function name: backward_dropout_layer

Inputs:

  • dropout_layer l: the dropout layer struct

  • network net: the network struct

Operation:

  • Performs the backward pass (backpropagation) of the dropout layer.

  • Gradients are propagated only for the inputs that survived dropout in the forward pass; the gradients of the dropped inputs are set to zero.

Description:

  • The function returns immediately if net.delta is NULL, i.e. when there is no gradient buffer to propagate into; in practice the backward pass is only run during training.

  • For inputs that were dropped in the forward pass (l.rand[i] < l.probability), the gradient is set to 0 rather than propagated.

  • For the remaining inputs, the gradient is multiplied by the same scale factor l.scale = 1/(1-p) that was applied in the forward pass (see the sketch after this list).
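
The essential detail is that the forward and backward passes share the same random draws through l.rand. The following self-contained sketch (illustrative names only, not the Darknet types) shows that exactly the positions dropped in the forward pass receive zero gradient in the backward pass:

#include <stdio.h>
#include <stdlib.h>

#define N 8

int main(void)
{
    const float p = 0.5f, scale = 1.0f / (1.0f - p);
    float input[N], delta[N], mask[N];          /* mask plays the role of l.rand */
    for (int i = 0; i < N; ++i) { input[i] = 1.0f; delta[i] = 1.0f; }

    /* forward: draw once, remember the draw, zero or rescale the input */
    for (int i = 0; i < N; ++i) {
        mask[i] = (float)rand() / (float)RAND_MAX;
        if (mask[i] < p) input[i] = 0;
        else             input[i] *= scale;
    }

    /* backward: reuse the SAME draws, so exactly the dropped positions
     * get zero gradient and the kept positions are rescaled identically */
    for (int i = 0; i < N; ++i) {
        if (mask[i] < p) delta[i] = 0;
        else             delta[i] *= scale;
    }

    for (int i = 0; i < N; ++i)
        printf("x=%.1f  dx=%.1f\n", input[i], delta[i]);  /* zeros line up */
    return 0;
}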

resize_dropout_layer

void resize_dropout_layer(dropout_layer *l, int inputs)
{
    /* Note: the CPU buffer is reallocated using the stored l->inputs,
     * while the GPU buffer below is sized from the new inputs argument. */
    l->rand = realloc(l->rand, l->inputs*l->batch*sizeof(float));
    #ifdef GPU
    cuda_free(l->rand_gpu);

    l->rand_gpu = cuda_make_array(l->rand, inputs*l->batch);
    #endif
}

Function name: resize_dropout_layer

Inputs:

  • dropout_layer *l: pointer to the dropout layer struct

  • int inputs: the new number of inputs

Operation:

  • Resizes the dropout layer's random-mask buffer when the layer's input size changes (for example when the network is resized).

  • The per-element random-draw buffer rand is reallocated with realloc.

  • When compiled with GPU support, the CUDA buffer is freed, reallocated, and the rand array is copied to the device.

Description:

  • The rand array of the dropout_layer pointed to by l is reallocated to l->inputs * l->batch * sizeof(float) bytes. Note that the new inputs argument is only used for the GPU buffer; the CPU realloc uses the stored l->inputs, and l->inputs itself is not updated by this function (see the sketch after this list).

  • If GPU support is enabled, the previously allocated CUDA memory is freed and a new device array of inputs * l->batch elements is allocated.

  • cuda_make_array allocates the device buffer and copies the current contents of the rand array into it.
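
If one wanted the CPU buffer (and the layer's bookkeeping) to follow the new size as well, a resize routine along the following lines would do it. This is a hypothetical variant for illustration only, not the code that ships with Darknet, and it assumes it is compiled inside the Darknet tree so that dropout_layer, realloc, cuda_free, and cuda_make_array are available.

/* Hypothetical variant (not the Darknet source): updates l->inputs and
 * l->outputs and sizes both buffers from the new inputs argument. */
void resize_dropout_layer_updated(dropout_layer *l, int inputs)
{
    l->inputs = inputs;
    l->outputs = inputs;
    l->rand = realloc(l->rand, inputs * l->batch * sizeof(float));
    #ifdef GPU
    cuda_free(l->rand_gpu);
    l->rand_gpu = cuda_make_array(l->rand, inputs * l->batch);
    #endif
}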

make_dropout_layer

dropout_layer make_dropout_layer(int batch, int inputs, float probability)
{
    dropout_layer l = {0};
    l.type = DROPOUT;
    l.probability = probability;
    l.inputs = inputs;
    l.outputs = inputs;
    l.batch = batch;
    l.rand = calloc(inputs*batch, sizeof(float));
    l.scale = 1./(1.-probability);
    l.forward = forward_dropout_layer;
    l.backward = backward_dropout_layer;
    #ifdef GPU
    l.forward_gpu = forward_dropout_layer_gpu;
    l.backward_gpu = backward_dropout_layer_gpu;
    l.rand_gpu = cuda_make_array(l.rand, inputs*batch);
    #endif
    fprintf(stderr, "dropout       p = %.2f               %4d  ->  %4d\n", probability, inputs, inputs);
    return l;
}

Function name: make_dropout_layer

Inputs:

  • batch (int): batch size

  • inputs (int): input size

  • probability (float): dropout probability

Operation:

  • Creates and initializes a dropout layer.

Description:

  • Declares a dropout_layer struct and zero-initializes it.

  • Sets the layer type to DROPOUT.

  • Sets the dropout probability, the input size, the output size, and the batch size.

  • Since the input and output sizes are the same, l.outputs is set to the same value as l.inputs.

  • Allocates the random-number array l.rand with inputs * batch elements, zero-initialized via calloc.

  • Computes the scaling factor l.scale = 1 / (1 - probability).

  • Sets the forward and backward function pointers to forward_dropout_layer and backward_dropout_layer.

  • When compiled with GPU support, also sets forward_dropout_layer_gpu and backward_dropout_layer_gpu, and creates the device array l.rand_gpu from l.rand.

  • Prints the layer's information (dropout probability and input/output sizes) to stderr.

  • Returns the constructed dropout_layer struct (a usage sketch follows below).
