gru_layer

What is a GRU layer?

The GRU (Gated Recurrent Unit) layer is a kind of recurrent neural network (RNN) layer used to process long sequences.

GRU๋Š” ๊ธฐ๋ณธ์ ์œผ๋กœ LSTM (Long Short-Term Memory)๊ณผ ์œ ์‚ฌํ•œ ์•„์ด๋””์–ด๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. LSTM๊ณผ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ, GRU๋„ RNN ๊ณ„์—ด์˜ ๋ ˆ์ด์–ด๋กœ์„œ ์‹œํ€€์Šค ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ LSTM๊ณผ๋Š” ๋‹ฌ๋ฆฌ, GRU๋Š” ๊ฒŒ์ดํŠธ ๋ฉ”์ปค๋‹ˆ์ฆ˜์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ธฐ์–ต์„ ๋ณดํ˜ธํ•˜๊ณ , ์ด์ „ ์ƒํƒœ์—์„œ ์ •๋ณด๋ฅผ ๊ฐ€์ ธ์˜ค๋Š” ๋ฐฉ๋ฒ•์„ ๊ฐ„๋‹จํ™”ํ•˜์—ฌ ๋” ์ ์€ ๊ณ„์‚ฐ์œผ๋กœ ์žฅ๊ธฐ์ ์ธ ์ƒํƒœ๋ฅผ ์œ ์ง€ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•ฉ๋‹ˆ๋‹ค.

The GRU has a simpler structure than the LSTM and needs fewer parameters: it uses three pairs of weight matrices where the LSTM uses four. It tends to train faster than the LSTM and to produce models that generalize better on small datasets.

GRU ๋ ˆ์ด์–ด๋Š” 2๊ฐœ์˜ ๊ฒŒ์ดํŠธ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ธฐ์–ต์„ ์กฐ์ ˆํ•ฉ๋‹ˆ๋‹ค. ์ฒซ ๋ฒˆ์งธ ๊ฒŒ์ดํŠธ๋Š” "์—…๋ฐ์ดํŠธ ๊ฒŒ์ดํŠธ"๋ผ๊ณ  ๋ถˆ๋ฆฌ๋ฉฐ, ํ˜„์žฌ ์ž…๋ ฅ๊ณผ ์ด์ „ ์ƒํƒœ๋ฅผ ๊ฒฐํ•ฉํ•˜์—ฌ ์ƒˆ๋กœ์šด ์ƒํƒœ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ๋‘ ๋ฒˆ์งธ ๊ฒŒ์ดํŠธ๋Š” "์žฌ์„ค์ • ๊ฒŒ์ดํŠธ"๋ผ๊ณ  ๋ถˆ๋ฆฌ๋ฉฐ, ์ด์ „ ์ƒํƒœ์˜ ์ผ๋ถ€๋ฅผ ๋ฒ„๋ฆฌ๊ณ  ์ƒˆ๋กœ์šด ์ƒํƒœ๋ฅผ ๋งŒ๋“ญ๋‹ˆ๋‹ค. GRU ๋ ˆ์ด์–ด๋Š” ์ด๋Ÿฌํ•œ ๊ฒŒ์ดํŠธ๋“ค์„ ์‚ฌ์šฉํ•˜์—ฌ ์ž…๋ ฅ ์‹œํ€€์Šค์™€ ์ด์ „ ์ƒํƒœ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ ๋‹ค์Œ, ์ƒˆ๋กœ์šด ์ƒํƒœ๋ฅผ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค.

GRU ๋ ˆ์ด์–ด๋Š” ์ฃผ๋กœ ์‹œํ€€์Šค ๋ฐ์ดํ„ฐ๋ฅผ ๋‹ค๋ฃจ๋Š” ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ(NLP) ๋ถ„์•ผ์—์„œ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. GRU ๋ ˆ์ด์–ด๋ฅผ ์ ์šฉํ•œ ๋ชจ๋ธ์€ ํ…์ŠคํŠธ ์ƒ์„ฑ, ๋ฒˆ์—ญ, ๊ฐ์„ฑ ๋ถ„์„ ๋“ฑ ๋‹ค์–‘ํ•œ ํƒœ์Šคํฌ์—์„œ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ž…๋‹ˆ๋‹ค.


increment_layer

static void increment_layer(layer *l, int steps)
{
    int num = l->outputs*l->batch*steps;
    l->output += num;
    l->delta += num;
    l->x += num;
    l->x_norm += num;

#ifdef GPU
    l->output_gpu += num;
    l->delta_gpu += num;
    l->x_gpu += num;
    l->x_norm_gpu += num;
#endif
}

ํ•จ์ˆ˜ ์ด๋ฆ„: increment_layer

์ž…๋ ฅ:

  • layer *l: ์—…๋ฐ์ดํŠธํ•  ๋ ˆ์ด์–ด

  • int steps: ์ด๋™ํ•  ์Šคํ… ์ˆ˜

๋™์ž‘:

  • layer ๊ตฌ์กฐ์ฒด ํฌ์ธํ„ฐ์ธ l์˜ output, delta, x, x_norm์— steps๋งŒํผ ์ด๋™ํ•œ ํฌ์ธํ„ฐ๋ฅผ ํ• ๋‹นํ•œ๋‹ค.

  • GPU ํ™˜๊ฒฝ์—์„œ๋Š” l์˜ output_gpu, delta_gpu, x_gpu, x_norm_gpu์— steps๋งŒํผ ์ด๋™ํ•œ ํฌ์ธํ„ฐ๋ฅผ ํ• ๋‹นํ•œ๋‹ค.

์„ค๋ช…:

  • ํ•ด๋‹น ํ•จ์ˆ˜๋Š” ๋ ˆ์ด์–ด์˜ ํฌ์ธํ„ฐ๋ฅผ steps๋งŒํผ ์ด๋™์‹œ์ผœ ์—…๋ฐ์ดํŠธํ•˜๋Š” ํ•จ์ˆ˜์ด๋‹ค.

  • ํฌ์ธํ„ฐ๋ฅผ ์ด๋™์‹œ์ผœ์„œ ์ด์ „์˜ ๊ฐ’์„ ์ฐธ์กฐํ•˜์ง€ ์•Š๊ณ  ์ƒˆ๋กœ์šด ๊ฐ’์„ ์ฐธ์กฐํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•œ๋‹ค.

  • GPU ํ™˜๊ฒฝ์—์„œ๋Š” GPU ๋ฉ”๋ชจ๋ฆฌ ์ƒ์˜ ํฌ์ธํ„ฐ๋ฅผ ์ด๋™์‹œํ‚จ๋‹ค.
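
A minimal standalone sketch of the same pointer arithmetic (the buffer and sizes are illustrative, not taken from the source):

#include <stdio.h>

int main(void)
{
    /* A pretend buffer holding 2 time steps of outputs*batch == 3 values. */
    float output[6] = {0, 1, 2, 3, 4, 5};
    float *p = output;            /* plays the role of l->output */
    int outputs = 3, batch = 1, steps = 1;

    p += outputs * batch * steps; /* what increment_layer does per pointer */
    printf("%.0f\n", p[0]);       /* prints 3: the first value of step 2 */
    return 0;
}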

forward_gru_layer

void forward_gru_layer(layer l, network net)
{
    network s = net;
    s.train = net.train;
    int i;
    layer uz = *(l.uz);
    layer ur = *(l.ur);
    layer uh = *(l.uh);

    layer wz = *(l.wz);
    layer wr = *(l.wr);
    layer wh = *(l.wh);

    fill_cpu(l.outputs * l.batch * l.steps, 0, uz.delta, 1);
    fill_cpu(l.outputs * l.batch * l.steps, 0, ur.delta, 1);
    fill_cpu(l.outputs * l.batch * l.steps, 0, uh.delta, 1);

    fill_cpu(l.outputs * l.batch * l.steps, 0, wz.delta, 1);
    fill_cpu(l.outputs * l.batch * l.steps, 0, wr.delta, 1);
    fill_cpu(l.outputs * l.batch * l.steps, 0, wh.delta, 1);
    if(net.train) {
        fill_cpu(l.outputs * l.batch * l.steps, 0, l.delta, 1);
        copy_cpu(l.outputs*l.batch, l.state, 1, l.prev_state, 1);
    }

    for (i = 0; i < l.steps; ++i) {
        s.input = l.state;
        forward_connected_layer(wz, s);
        forward_connected_layer(wr, s);

        s.input = net.input;
        forward_connected_layer(uz, s);
        forward_connected_layer(ur, s);
        forward_connected_layer(uh, s);


        copy_cpu(l.outputs*l.batch, uz.output, 1, l.z_cpu, 1);
        axpy_cpu(l.outputs*l.batch, 1, wz.output, 1, l.z_cpu, 1);

        copy_cpu(l.outputs*l.batch, ur.output, 1, l.r_cpu, 1);
        axpy_cpu(l.outputs*l.batch, 1, wr.output, 1, l.r_cpu, 1);

        activate_array(l.z_cpu, l.outputs*l.batch, LOGISTIC);
        activate_array(l.r_cpu, l.outputs*l.batch, LOGISTIC);

        copy_cpu(l.outputs*l.batch, l.state, 1, l.forgot_state, 1);
        mul_cpu(l.outputs*l.batch, l.r_cpu, 1, l.forgot_state, 1);

        s.input = l.forgot_state;
        forward_connected_layer(wh, s);

        copy_cpu(l.outputs*l.batch, uh.output, 1, l.h_cpu, 1);
        axpy_cpu(l.outputs*l.batch, 1, wh.output, 1, l.h_cpu, 1);

        if(l.tanh){
            activate_array(l.h_cpu, l.outputs*l.batch, TANH);
        } else {
            activate_array(l.h_cpu, l.outputs*l.batch, LOGISTIC);
        }

        weighted_sum_cpu(l.state, l.h_cpu, l.z_cpu, l.outputs*l.batch, l.output);

        copy_cpu(l.outputs*l.batch, l.output, 1, l.state, 1);

        net.input += l.inputs*l.batch;
        l.output += l.outputs*l.batch;
        increment_layer(&uz, 1);
        increment_layer(&ur, 1);
        increment_layer(&uh, 1);

        increment_layer(&wz, 1);
        increment_layer(&wr, 1);
        increment_layer(&wh, 1);
    }
}

ํ•จ์ˆ˜ ์ด๋ฆ„: forward_gru_layer

์ž…๋ ฅ:

  • layer l: GRU ๋ ˆ์ด์–ด์˜ ์ •๋ณด์™€ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ๋‹ด๊ณ  ์žˆ๋Š” layer ๊ตฌ์กฐ์ฒด

  • network net: ๋„คํŠธ์›Œํฌ์˜ ์ •๋ณด์™€ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ๋‹ด๊ณ  ์žˆ๋Š” network ๊ตฌ์กฐ์ฒด

๋™์ž‘:

  • ์ž…๋ ฅ ๋ฐ์ดํ„ฐ์˜ GRU ๋ ˆ์ด์–ด๋ฅผ ํ†ตํ•ด ์ˆœ๋ฐฉํ–ฅ ์ „ํŒŒ(forward propagation)๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ํ•จ์ˆ˜๋กœ, ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๋ฅผ GRU ๋ ˆ์ด์–ด๋ฅผ ํ†ตํ•ด ์ฒ˜๋ฆฌํ•˜์—ฌ ์ถœ๋ ฅ ๊ฐ’์„ ๊ณ„์‚ฐํ•˜๊ณ , ๊ทธ ๊ฐ’์„ ๋‹ค์Œ ๋ ˆ์ด์–ด์˜ ์ž…๋ ฅ์œผ๋กœ ๋„˜๊ฒจ์คŒ.

  • ์ด๋•Œ, backward propagation์„ ์œ„ํ•ด ํ•„์š”ํ•œ ์ค‘๊ฐ„๊ฐ’๋“ค์„ ์ €์žฅํ•ด ๋†“์Œ.

์„ค๋ช…:

  • GRU ๋ ˆ์ด์–ด์˜ ๋งค๊ฐœ๋ณ€์ˆ˜๋“ค ์ค‘์—์„œ uz, ur, uh๋Š” ์ด์ „ ์ƒํƒœ(previous state)๋กœ๋ถ€ํ„ฐ์˜ ์ž…๋ ฅ(input)์„ ์ฒ˜๋ฆฌํ•˜๋Š” ๊ฐ€์ค‘์น˜(weight) ๋งค๊ฐœ๋ณ€์ˆ˜์ด๊ณ , wz, wr, wh๋Š” ํ˜„์žฌ ์ž…๋ ฅ(input)์„ ์ฒ˜๋ฆฌํ•˜๋Š” ๊ฐ€์ค‘์น˜ ๋งค๊ฐœ๋ณ€์ˆ˜์ž„.

  • GRU ๋ ˆ์ด์–ด๋Š” ์‹œ๊ณ„์—ด(sequence) ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌํ•˜๊ธฐ ์œ„ํ•œ RNN์˜ ํ•œ ์ข…๋ฅ˜๋กœ, ์ด์ „ ์‹œ์ ์˜ ์ƒํƒœ(previous state)๋ฅผ ์žฌ์‚ฌ์šฉํ•˜๋Š” ๋ ˆ์ด์–ด์ž„.

  • forward_connected_layer ํ•จ์ˆ˜๋ฅผ ํ†ตํ•ด ๊ฐ€์ค‘์น˜์™€ ์ž…๋ ฅ์„ ๊ณฑํ•œ ๊ฐ’๊ณผ bias๋ฅผ ๋”ํ•œ ๊ฐ’์„ ๊ณ„์‚ฐํ•˜์—ฌ ํ™œ์„ฑํ™” ํ•จ์ˆ˜(Logistic ๋˜๋Š” Tanh)๋ฅผ ์ ์šฉํ•จ.

  • uz, ur, uh ๋ ˆ์ด์–ด์—์„œ ๋‚˜์˜จ ์ถœ๋ ฅ๊ฐ’๊ณผ wz, wr, wh ๋ ˆ์ด์–ด์—์„œ ๋‚˜์˜จ ์ถœ๋ ฅ๊ฐ’์„ ์ด์šฉํ•˜์—ฌ z์™€ r ๊ฐ’์„ ๊ณ„์‚ฐํ•จ.

  • z๊ฐ’์€ ์ด์ „ ์ƒํƒœ์™€ ํ˜„์žฌ ์ž…๋ ฅ์„ ์กฐํ•ฉํ•œ ํ›„ ๋กœ์ง€์Šคํ‹ฑ ํ•จ์ˆ˜๋ฅผ ์ ์šฉํ•˜์—ฌ ๊ณ„์‚ฐํ•จ.

  • r๊ฐ’์€ z๊ฐ’๊ณผ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ์ด์ „ ์ƒํƒœ์™€ ํ˜„์žฌ ์ž…๋ ฅ์„ ์กฐํ•ฉํ•œ ํ›„ ๋กœ์ง€์Šคํ‹ฑ ํ•จ์ˆ˜๋ฅผ ์ ์šฉํ•˜์—ฌ ๊ณ„์‚ฐํ•จ.

  • h๊ฐ’์€ z๊ฐ’๊ณผ ์ด์ „ ์ƒํƒœ๋ฅผ ์ด์šฉํ•˜์—ฌ ์ƒˆ๋กœ์šด ์ƒํƒœ๋ฅผ ๊ณ„์‚ฐํ•˜๊ธฐ ์œ„ํ•œ ๊ฒŒ์ดํŠธ(gate)๋ฅผ ๊ณ„์‚ฐํ•จ.

  • ๊ณ„์‚ฐ๋œ h๊ฐ’์— Tanh ๋˜๋Š” Logistic ํ•จ์ˆ˜๋ฅผ ์ ์šฉํ•˜์—ฌ ์ถœ๋ ฅ๊ฐ’(output)์„ ๊ณ„์‚ฐํ•จ.

  • GRU ๋ ˆ์ด์–ด๋Š” ์—ฌ๋Ÿฌ ์‹œ์ (time step)์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์œผ๋ฏ€๋กœ, steps ๋งŒํผ ๋ฐ˜๋ณต์ ์œผ๋กœ forward_connected_layer ํ•จ์ˆ˜๋ฅผ ํ˜ธ์ถœํ•˜์—ฌ ์ค‘๊ฐ„๊ฐ’๋“ค์„ ๊ณ„์‚ฐํ•จ.

backward_gru_layer

void backward_gru_layer(layer l, network net)
{
}

ํ•จ์ˆ˜ ์ด๋ฆ„: backward_gru_layer

์ž…๋ ฅ:

  • layer l

  • network net (๋‘˜ ๋‹ค ๊ตฌ์กฐ์ฒด)

๋™์ž‘:

  • GRU (๊ฒŒ์ดํŠธ ์ˆœํ™˜ ์œ ๋‹›) ๋ ˆ์ด์–ด์˜ ์—ญ์ „ํŒŒ(backpropagation)๋ฅผ ๊ณ„์‚ฐํ•˜๊ณ  ์ด์ „ ๋ ˆ์ด์–ด์—๊ฒŒ ์˜ค์ฐจ ์‹ ํ˜ธ(error signal)๋ฅผ ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค.

  • ์ด๋ฅผ ์œ„ํ•ด ์ž…๋ ฅ ์‹ ํ˜ธ์™€ ๊ฐ€์ค‘์น˜(weight)์— ๋Œ€ํ•œ ๋ฏธ๋ถ„(gradient)์„ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.

์„ค๋ช…:

  • l: GRU ๋ ˆ์ด์–ด์˜ ๊ตฌ์กฐ์ฒด๋กœ, ์ž…๋ ฅ ์‹ ํ˜ธ์™€ ๊ฐ€์ค‘์น˜, ์ถœ๋ ฅ๊ณผ ๊ฐ™์€ ๋‹ค์–‘ํ•œ ์ •๋ณด๋ฅผ ๋‹ด๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

  • net: ์‹ ๊ฒฝ๋ง ๊ตฌ์กฐ์ฒด๋กœ, ์—ญ์ „ํŒŒ ์‹œ์— ์ด์ „ ๋ ˆ์ด์–ด๋กœ ์˜ค์ฐจ ์‹ ํ˜ธ๋ฅผ ์ „๋‹ฌํ•˜๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.

In this version of the source, however, the function body is empty: calling it performs no computation, so the CPU backward pass of the GRU layer is simply not implemented here. (The GPU path, backward_gru_layer_gpu, is implemented separately under #ifdef GPU.)

update_gru_layer

void update_gru_layer(layer l, update_args a)
{
    update_connected_layer(*(l.ur), a);
    update_connected_layer(*(l.uz), a);
    update_connected_layer(*(l.uh), a);
    update_connected_layer(*(l.wr), a);
    update_connected_layer(*(l.wz), a);
    update_connected_layer(*(l.wh), a);
}

ํ•จ์ˆ˜ ์ด๋ฆ„: update_gru_layer

์ž…๋ ฅ:

  • layer l: GRU ๋ ˆ์ด์–ด ๊ตฌ์กฐ์ฒด

  • update_args a: ์—…๋ฐ์ดํŠธ ์ธ์ž ๊ตฌ์กฐ์ฒด

๋™์ž‘:

  • GRU ๋ ˆ์ด์–ด์˜ ๊ฐ๊ฐ์˜ ์—ฐ๊ฒฐ๋œ ๋ ˆ์ด์–ด(ur, uz, uh, wr, wz, wh)๋“ค์˜ ๊ฐ€์ค‘์น˜(weight)์™€ bias๋ฅผ ์—…๋ฐ์ดํŠธํ•˜๋Š” ํ•จ์ˆ˜

์„ค๋ช…:

  • ์ž…๋ ฅ์œผ๋กœ ์ฃผ์–ด์ง„ GRU ๋ ˆ์ด์–ด ๊ตฌ์กฐ์ฒด l์˜ ์—ฐ๊ฒฐ๋œ ๋ ˆ์ด์–ด(ur, uz, uh, wr, wz, wh)๋“ค์˜ ๊ฐ€์ค‘์น˜์™€ bias๋ฅผ ์—…๋ฐ์ดํŠธํ•˜๋Š” ํ•จ์ˆ˜์ด๋‹ค.

  • ์ด๋ฅผ ์œ„ํ•ด update_connected_layer() ํ•จ์ˆ˜๋ฅผ ๊ฐ ๋ ˆ์ด์–ด์— ๋Œ€ํ•ด ํ˜ธ์ถœํ•˜์—ฌ ๊ฐ€์ค‘์น˜๋ฅผ ์—…๋ฐ์ดํŠธํ•œ๋‹ค.
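
For reference, a simplified sketch of the per-buffer update that update_connected_layer performs under plain SGD with momentum and weight decay (the real function also updates biases and batch-norm scales and supports Adam, so treat this as an approximation of its axpy/scal sequence rather than a faithful copy):

/* w: weights, dw: accumulated gradients, n: number of weights. */
static void sgd_momentum_update(float *w, float *dw, int n,
                                float lr, float momentum,
                                float decay, int batch)
{
    int i;
    for (i = 0; i < n; ++i) {
        dw[i] -= decay * batch * w[i];   /* weight decay            */
        w[i]  += (lr / batch) * dw[i];   /* gradient step           */
        dw[i] *= momentum;               /* retain momentum residue */
    }
}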

make_gru_layer

layer make_gru_layer(int batch, int inputs, int outputs, int steps, int batch_normalize, int adam)
{
    fprintf(stderr, "GRU Layer: %d inputs, %d outputs\n", inputs, outputs);
    batch = batch / steps;
    layer l = {0};
    l.batch = batch;
    l.type = GRU;
    l.steps = steps;
    l.inputs = inputs;

    l.uz = malloc(sizeof(layer));
    fprintf(stderr, "\t\t");
    *(l.uz) = make_connected_layer(batch*steps, inputs, outputs, LINEAR, batch_normalize, adam);
    l.uz->batch = batch;

    l.wz = malloc(sizeof(layer));
    fprintf(stderr, "\t\t");
    *(l.wz) = make_connected_layer(batch*steps, outputs, outputs, LINEAR, batch_normalize, adam);
    l.wz->batch = batch;

    l.ur = malloc(sizeof(layer));
    fprintf(stderr, "\t\t");
    *(l.ur) = make_connected_layer(batch*steps, inputs, outputs, LINEAR, batch_normalize, adam);
    l.ur->batch = batch;

    l.wr = malloc(sizeof(layer));
    fprintf(stderr, "\t\t");
    *(l.wr) = make_connected_layer(batch*steps, outputs, outputs, LINEAR, batch_normalize, adam);
    l.wr->batch = batch;

    l.uh = malloc(sizeof(layer));
    fprintf(stderr, "\t\t");
    *(l.uh) = make_connected_layer(batch*steps, inputs, outputs, LINEAR, batch_normalize, adam);
    l.uh->batch = batch;

    l.wh = malloc(sizeof(layer));
    fprintf(stderr, "\t\t");
    *(l.wh) = make_connected_layer(batch*steps, outputs, outputs, LINEAR, batch_normalize, adam);
    l.wh->batch = batch;

    l.batch_normalize = batch_normalize;


    l.outputs = outputs;
    l.output = calloc(outputs*batch*steps, sizeof(float));
    l.delta = calloc(outputs*batch*steps, sizeof(float));
    l.state = calloc(outputs*batch, sizeof(float));
    l.prev_state = calloc(outputs*batch, sizeof(float));
    l.forgot_state = calloc(outputs*batch, sizeof(float));
    l.forgot_delta = calloc(outputs*batch, sizeof(float));

    l.r_cpu = calloc(outputs*batch, sizeof(float));
    l.z_cpu = calloc(outputs*batch, sizeof(float));
    l.h_cpu = calloc(outputs*batch, sizeof(float));

    l.forward = forward_gru_layer;
    l.backward = backward_gru_layer;
    l.update = update_gru_layer;

    return l;
}

ํ•จ์ˆ˜ ์ด๋ฆ„: make_gru_layer

์ž…๋ ฅ:

  • int batch: ๋ฐฐ์น˜ ํฌ๊ธฐ

  • int inputs: ์ž…๋ ฅ์˜ ํฌ๊ธฐ

  • int outputs: ์ถœ๋ ฅ์˜ ํฌ๊ธฐ

  • int steps: ์‹œ๊ฐ„ ์Šคํ…์˜ ์ˆ˜

  • int batch_normalize: ๋ฐฐ์น˜ ์ •๊ทœํ™” ์‚ฌ์šฉ ์—ฌ๋ถ€

  • int adam: Adam ์˜ตํ‹ฐ๋งˆ์ด์ € ์‚ฌ์šฉ ์—ฌ๋ถ€

๋™์ž‘:

  • GRU ๋ ˆ์ด์–ด๋ฅผ ์ƒ์„ฑํ•˜๊ณ  ์ดˆ๊ธฐํ™”ํ•˜๋Š” ํ•จ์ˆ˜์ด๋‹ค. GRU ๋ ˆ์ด์–ด๋Š” uz, wr, uh, wh ๋“ฑ์˜ ์—ฐ๊ฒฐ ๋ ˆ์ด์–ด๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋‹ค.

์„ค๋ช…:

  • ์ž…๋ ฅ๊ฐ’์œผ๋กœ ๋ฐ›์€ batch ๊ฐ’์€ steps๋กœ ๋‚˜๋ˆ„์–ด์ ธ์„œ ์‚ฌ์šฉ๋œ๋‹ค.

  • ๋ ˆ์ด์–ด์˜ ํƒ€์ž…์€ GRU๋กœ ์„ค์ •๋œ๋‹ค.

  • uz, wz, ur, wr, uh, wh ๋“ฑ์˜ ์—ฐ๊ฒฐ ๋ ˆ์ด์–ด๊ฐ€ ์ƒ์„ฑ๋˜๊ณ  ์ดˆ๊ธฐํ™”๋œ๋‹ค.

  • ์ถœ๋ ฅ๊ฐ’, delta, state, prev_state, forgot_state, forgot_delta, r_cpu, z_cpu, h_cpu ๋“ฑ์˜ ๊ฐ’๋“ค์ด ์ดˆ๊ธฐํ™”๋œ๋‹ค.

  • forward, backward, update ํ•จ์ˆ˜๊ฐ€ ์„ค์ •๋œ๋‹ค.

  • ์ดˆ๊ธฐํ™”๋œ GRU ๋ ˆ์ด์–ด๊ฐ€ ๋ฐ˜ํ™˜๋œ๋‹ค.
