connected_layer

What is a Fully Connected Layer?

A fully connected layer is a layer in which every node of the previous layer is connected to every node of the next layer.

  • It is the most basic type of layer.

  • It operates only on one-dimensional arrays.

To make this easier to follow, let's look at it as a figure.

The layer is not very complicated; it consists only of simple operations.
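The "simple operations" can be sketched for a single input vector as follows. This is a minimal illustration, not darknet's code: the helper name `fc_forward_single` and the row-major `outputs x inputs` weight layout are assumptions made for the example, and a bias term is included because the forward function on this page adds one.

```c
#include <assert.h>

/* Hypothetical helper (not part of darknet): computes y = W x + b
 * for one sample. w is assumed row-major with shape outputs x inputs,
 * so w[j * inputs + i] connects input node i to output node j. */
void fc_forward_single(const float *x, const float *w, const float *b,
                       int inputs, int outputs, float *y)
{
    assert(x && w && b && y && inputs > 0 && outputs > 0);
    for (int j = 0; j < outputs; ++j) {
        float sum = b[j];                      /* start from the bias */
        for (int i = 0; i < inputs; ++i) {
            sum += w[j * inputs + i] * x[i];   /* every input feeds every output */
        }
        y[j] = sum;
    }
}
```

Each output is one dot product plus a bias, which is why the whole layer reduces to a matrix multiplication once a batch of inputs is processed together.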

Backpropagation through a fully connected layer, expressed in simplified form, looks like the figure below.

It is easiest to understand by looking at where each weight, identified by its own index, is used when computing the output. For example, $w_{11}$ is used only to compute $h_{11}$, so only that value is used when computing its gradient.
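As a hedged worked example of this rule, assume (the text does not define it, since the figure is missing) that $h_{11}$ denotes the single product term $w_{11} x_1$ and that $L$ is the loss. Under that assumption the chain rule gives:

```latex
h_{11} = w_{11}\, x_{1}
\quad\Longrightarrow\quad
\frac{\partial L}{\partial w_{11}}
  = \frac{\partial L}{\partial h_{11}} \cdot \frac{\partial h_{11}}{\partial w_{11}}
  = \frac{\partial L}{\partial h_{11}}\, x_{1}
```

No other term of the output depends on $w_{11}$, so no other term contributes to its gradient.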


forward_connected_layer

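void forward_connected_layer(layer l, network net)
{
    /* Clear the output buffer; gemm below accumulates into it (BETA = 1). */
    fill_cpu(l.outputs*l.batch, 0, l.output, 1);
    int m = l.batch;    /* rows of the input and output matrices */
    int k = l.inputs;   /* shared (inner) dimension */
    int n = l.outputs;  /* columns of the output matrix */
    float *a = net.input;
    float *b = l.weights;
    float *c = l.output;
    /* output (m x n) += input (m x k) * weights^T, weights stored as n x k */
    gemm(0,1,m,n,k,1,a,k,b,k,1,c,n);
    if(l.batch_normalize){
        forward_batchnorm_layer(l, net);
    } else {
        add_bias(l.output, l.biases, l.batch, l.outputs, 1);
    }
    /* Apply the activation function in place to every output element. */
    activate_array(l.output, l.outputs*l.batch, l.activation);
}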

ํ•จ์ˆ˜ ์ด๋ฆ„: forward_connected_layer

์ž…๋ ฅ:

  • layer l: ์—ฐ๊ฒฐ ์ธต(layer) ๊ตฌ์กฐ์ฒด

  • network net: ๋„คํŠธ์›Œํฌ(network) ๊ตฌ์กฐ์ฒด

๋™์ž‘:

  • l.output ๋ฐฐ์—ด์„ 0์œผ๋กœ ์ฑ„์›€

  • ํ–‰๋ ฌ ๊ณฑ ์—ฐ์‚ฐ(GEMM)์„ ์ˆ˜ํ–‰ํ•˜์—ฌ l.output ๋ฐฐ์—ด์„ ์ƒˆ๋กœ์šด ๊ฐ’์œผ๋กœ ์—…๋ฐ์ดํŠธ ํ•จ

  • ๋ฐฐ์น˜ ์ •๊ทœํ™”(batch normalization)๊ฐ€ ํ™œ์„ฑํ™”๋˜์–ด ์žˆ์œผ๋ฉด, forward_batchnorm_layer ํ•จ์ˆ˜๋ฅผ ํ˜ธ์ถœํ•˜์—ฌ l.output ๋ฐฐ์—ด์„ ์—…๋ฐ์ดํŠธ ํ•จ

  • ๋ฐฐ์น˜ ์ •๊ทœํ™”๊ฐ€ ๋น„ํ™œ์„ฑํ™”๋˜์–ด ์žˆ์œผ๋ฉด, l.output ๋ฐฐ์—ด์— l.biases ๊ฐ’์„ ๋”ํ•จ

  • l.activation ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ l.output ๋ฐฐ์—ด์˜ ๋ชจ๋“  ์›์†Œ์— ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋ฅผ ์ ์šฉํ•จ

์„ค๋ช…:

  • forward_connected_layer ํ•จ์ˆ˜๋Š” ์™„์ „ ์—ฐ๊ฒฐ(fully connected) ์ธต์˜ ์ˆœ์ „ํŒŒ(forward propagation) ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•˜๋Š” ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค.

  • fill_cpu ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ l.output ๋ฐฐ์—ด์„ 0์œผ๋กœ ์ดˆ๊ธฐํ™”ํ•ฉ๋‹ˆ๋‹ค.

  • GEMM ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ž…๋ ฅ(input) ๋ฐ์ดํ„ฐ์™€ ๊ฐ€์ค‘์น˜(weights)๋ฅผ ๊ณฑํ•˜์—ฌ l.output ๋ฐฐ์—ด์„ ์ƒˆ๋กœ์šด ๊ฐ’์œผ๋กœ ์—…๋ฐ์ดํŠธํ•ฉ๋‹ˆ๋‹ค.

  • ๋ฐฐ์น˜ ์ •๊ทœํ™”๊ฐ€ ํ™œ์„ฑํ™”๋˜์–ด ์žˆ์œผ๋ฉด, forward_batchnorm_layer ํ•จ์ˆ˜๋ฅผ ํ˜ธ์ถœํ•˜์—ฌ l.output ๋ฐฐ์—ด์„ ์—…๋ฐ์ดํŠธํ•ฉ๋‹ˆ๋‹ค. ๋ฐฐ์น˜ ์ •๊ทœํ™”๊ฐ€ ๋น„ํ™œ์„ฑํ™”๋˜์–ด ์žˆ์œผ๋ฉด, l.output ๋ฐฐ์—ด์— l.biases ๊ฐ’์„ ๋”ํ•ฉ๋‹ˆ๋‹ค.

  • ๋งˆ์ง€๋ง‰์œผ๋กœ, activate_array ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ l.activation ํ•จ์ˆ˜๋ฅผ ์ ์šฉํ•˜์—ฌ l.output ๋ฐฐ์—ด์˜ ๋ชจ๋“  ์›์†Œ์— ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋ฅผ ์ ์šฉํ•ฉ๋‹ˆ๋‹ค.
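The steps above can be sketched with plain loops for the case without batch normalization. This is an illustration under stated assumptions, not darknet's implementation: `gemm_nt_sketch` and `fc_forward_batch` are hypothetical names, all matrices are assumed row-major, and ReLU is picked arbitrarily as the activation.

```c
#include <assert.h>

/* Hypothetical stand-in for the gemm(0,1,...) call: C += A * B^T.
 * A is m x k (batch x inputs), B is n x k (outputs x inputs),
 * C is m x n (batch x outputs), all row-major. */
static void gemm_nt_sketch(int m, int n, int k,
                           const float *a, const float *b, float *c)
{
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int p = 0; p < k; ++p) {
                sum += a[i * k + p] * b[j * k + p];
            }
            c[i * n + j] += sum;   /* accumulate, as with BETA = 1 */
        }
    }
}

/* Forward pass without batch normalization; ReLU is an assumed activation. */
void fc_forward_batch(const float *input, const float *weights,
                      const float *biases, int batch, int inputs,
                      int outputs, float *output)
{
    assert(input && weights && biases && output);

    /* Step 1: zero the output buffer (the fill_cpu step). */
    for (int i = 0; i < batch * outputs; ++i) output[i] = 0.0f;

    /* Step 2: output += input * weights^T (the GEMM step). */
    gemm_nt_sketch(batch, outputs, inputs, input, weights, output);

    /* Step 3: add one bias per output node to every sample. */
    for (int s = 0; s < batch; ++s) {
        for (int j = 0; j < outputs; ++j) {
            output[s * outputs + j] += biases[j];
        }
    }

    /* Step 4: apply the activation element-wise (ReLU here). */
    for (int i = 0; i < batch * outputs; ++i) {
        if (output[i] < 0.0f) output[i] = 0.0f;
    }
}
```

Zeroing the buffer first matters because the multiplication accumulates into the output rather than overwriting it.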

backward_connected_layer

ํ•จ์ˆ˜ ์ด๋ฆ„: backward_connected_layer

์ž…๋ ฅ:

  • layer l: backpropagation์ด ์ˆ˜ํ–‰๋  fully connected layer

  • network net: ์—ฐ๊ฒฐ๋œ neural network

๋™์ž‘:

  • l์—์„œ ์ถœ๋ ฅ(l.output)๊ณผ ํ™œ์„ฑํ™” ํ•จ์ˆ˜(l.activation)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ delta(l.delta)๋ฅผ ๊ณ„์‚ฐ

  • l์ด batch normalization์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ, backward_batchnorm_layer ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ backpropagation์„ ์ˆ˜ํ–‰ํ•˜๊ณ  ๊ทธ๋ ‡์ง€ ์•Š์œผ๋ฉด backward_bias ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํŽธํ–ฅ(l.bias_updates)์˜ delta๋ฅผ ๊ณ„์‚ฐ

  • l.delta์™€ ์ž…๋ ฅ(net.input)์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ€์ค‘์น˜(l.weights)์˜ ์—…๋ฐ์ดํŠธ(l.weight_updates)๋ฅผ ๊ณ„์‚ฐํ•˜๊ธฐ ์œ„ํ•ด GEMM ํ•จ์ˆ˜๋ฅผ ํ˜ธ์ถœ

  • l.delta์™€ l.weights๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ž…๋ ฅ(net.delta)์˜ delta๋ฅผ ๊ณ„์‚ฐํ•˜๊ธฐ ์œ„ํ•ด GEMM ํ•จ์ˆ˜๋ฅผ ํ˜ธ์ถœ
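The bias, weight, and input gradient steps above can be written out as plain loops. The sketch below is an assumption-laden illustration rather than darknet's code: `fc_backward_sketch` is a hypothetical name, matrices are assumed row-major, `delta` is assumed to already include the activation gradient, and batch normalization is left out.

```c
#include <assert.h>

/* Hypothetical plain-loop version of the gradient steps.
 *   delta:   batch x outputs, gradient at this layer's output
 *   input:   batch x inputs
 *   weights: outputs x inputs
 * All result buffers are accumulated into, not overwritten. */
void fc_backward_sketch(const float *delta, const float *input,
                        const float *weights,
                        int batch, int inputs, int outputs,
                        float *bias_updates, float *weight_updates,
                        float *input_delta)
{
    assert(delta && input && weights && bias_updates && weight_updates);
    for (int s = 0; s < batch; ++s) {
        for (int j = 0; j < outputs; ++j) {
            float d = delta[s * outputs + j];

            /* Bias gradient: sum of delta over the batch. */
            bias_updates[j] += d;

            for (int i = 0; i < inputs; ++i) {
                /* Weight gradient: delta^T * input. */
                weight_updates[j * inputs + i] += d * input[s * inputs + i];

                /* Input gradient: delta * weights. Skipped when the caller
                 * passes a null pointer because no earlier layer needs it. */
                if (input_delta) {
                    input_delta[s * inputs + i] += d * weights[j * inputs + i];
                }
            }
        }
    }
}
```

Each weight only receives gradient from the one output it feeds, which is the same observation made for $w_{11}$ earlier on this page.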

update_connected_layer

ํ•จ์ˆ˜ ์ด๋ฆ„: update_connected_layer

์ž…๋ ฅ:

  • layer l: ์—ฐ๊ฒฐ ๊ณ„์ธต(layer)์„ ๋‚˜ํƒ€๋‚ด๋Š” ๊ตฌ์กฐ์ฒด

  • update_args a: ๋ชจ๋ธ ์—…๋ฐ์ดํŠธ๋ฅผ ์œ„ํ•œ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ๋‹ด์€ ๊ตฌ์กฐ์ฒด

๋™์ž‘:

  • ์—ฐ๊ฒฐ ๊ณ„์ธต์˜ ๊ฐ€์ค‘์น˜(weights)์™€ ํŽธํ–ฅ(biases)์„ ์—…๋ฐ์ดํŠธํ•˜๋Š” ํ•จ์ˆ˜

  • ๋งค๊ฐœ๋ณ€์ˆ˜๋กœ ์ฃผ์–ด์ง„ a์— ๋”ฐ๋ผ learning_rate, momentum, decay, batch ํฌ๊ธฐ๋ฅผ ์„ค์ •ํ•˜๊ณ , ์ด๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ€์ค‘์น˜์™€ ํŽธํ–ฅ์„ ์—…๋ฐ์ดํŠธํ•œ๋‹ค.

์„ค๋ช…:

  • l.outputs: ํ˜„์žฌ ๊ณ„์ธต์˜ ์ถœ๋ ฅ ๊ฐœ์ˆ˜

  • l.bias_updates: ํŽธํ–ฅ์˜ ์—…๋ฐ์ดํŠธ์— ์‚ฌ์šฉ๋  ๊ฐ’๋“ค์ด ์ €์žฅ๋œ ๋ฐฐ์—ด

  • l.biases: ํ˜„์žฌ ๊ณ„์ธต์˜ ํŽธํ–ฅ๊ฐ’์ด ์ €์žฅ๋œ ๋ฐฐ์—ด

  • l.scale_updates: ๋ฐฐ์น˜ ์ •๊ทœํ™”(batch normalization)๊ฐ€ ์‚ฌ์šฉ๋˜๋Š” ๊ฒฝ์šฐ, ์Šค์ผ€์ผ์˜ ์—…๋ฐ์ดํŠธ์— ์‚ฌ์šฉ๋  ๊ฐ’๋“ค์ด ์ €์žฅ๋œ ๋ฐฐ์—ด

  • l.scales: ๋ฐฐ์น˜ ์ •๊ทœํ™”๊ฐ€ ์‚ฌ์šฉ๋˜๋Š” ๊ฒฝ์šฐ, ์Šค์ผ€์ผ๊ฐ’์ด ์ €์žฅ๋œ ๋ฐฐ์—ด

  • l.inputs: ์ด์ „ ๊ณ„์ธต์˜ ์ถœ๋ ฅ ๊ฐœ์ˆ˜ ํ˜น์€ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ์˜ ์ฐจ์› ์ˆ˜

  • l.weights: ๊ฐ€์ค‘์น˜๊ฐ’์ด ์ €์žฅ๋œ ๋ฐฐ์—ด

  • l.weight_updates: ๊ฐ€์ค‘์น˜์˜ ์—…๋ฐ์ดํŠธ์— ์‚ฌ์šฉ๋  ๊ฐ’๋“ค์ด ์ €์žฅ๋œ ๋ฐฐ์—ด

  • learning_rate: ํ•™์Šต๋ฅ (learning rate) ๊ฐ’

  • momentum: ๋ชจ๋ฉ˜ํ…€(momentum) ๊ฐ’

  • decay: ๊ฐ€์ค‘์น˜ ๊ฐ์†Œ(weight decay) ๊ฐ’

  • batch: ํ˜„์žฌ ๋ฐฐ์น˜(batch)์˜ ํฌ๊ธฐ

  • axpy_cpu(): BLAS ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ํ•จ์ˆ˜ ์ค‘ ํ•˜๋‚˜๋กœ, ๋ฒกํ„ฐ ๊ฐ„์˜ ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•˜๋Š” ํ•จ์ˆ˜

  • scal_cpu(): ๋ฒกํ„ฐ์— ์Šค์นผ๋ผ ๊ฐ’์„ ๊ณฑํ•˜๋Š” ํ•จ์ˆ˜

  • ๋จผ์ €, ํŽธํ–ฅ ์—…๋ฐ์ดํŠธ๋ฅผ ์ˆ˜ํ–‰ํ•œ๋‹ค. ์ด ๋•Œ, axpy_cpu() ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํŽธํ–ฅ์˜ ์—…๋ฐ์ดํŠธ๊ฐ’์„ ํŽธํ–ฅ๊ฐ’์— ๋”ํ•˜๊ณ , scal_cpu() ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ฉ˜ํ…€ ๊ฐ’์œผ๋กœ ๊ณฑํ•ด์ค€๋‹ค.

  • ๋งŒ์•ฝ ๋ฐฐ์น˜ ์ •๊ทœํ™”๊ฐ€ ์‚ฌ์šฉ๋˜๋Š” ๊ฒฝ์šฐ, ์Šค์ผ€์ผ ์—…๋ฐ์ดํŠธ๋„ ์ˆ˜ํ–‰ํ•œ๋‹ค. ์ด ๋•Œ, axpy_cpu() ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์Šค์ผ€์ผ์˜ ์—…๋ฐ์ดํŠธ๊ฐ’์„ ์Šค์ผ€์ผ๊ฐ’์— ๋”ํ•˜๊ณ , scal_cpu() ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ฉ˜ํ…€ ๊ฐ’์œผ๋กœ ๊ณฑํ•ด์ค€๋‹ค.

  • ๊ฐ€์ค‘์น˜ ์—…๋ฐ์ดํŠธ๋ฅผ ์ˆ˜ํ–‰ํ•œ๋‹ค. ์ด ๋•Œ, axpy_cpu() ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ€์ค‘์น˜์˜ ์—…๋ฐ์ดํŠธ๊ฐ’์— ๋Œ€ํ•œ ๊ฐ’์„ ๋จผ์ € weight_updates์— ๋”ํ•œ ๋‹ค์Œ, ๊ฐ€์ค‘์น˜์— ์ด ๊ฐ’์„ ๋”ํ•œ๋‹ค. ์ดํ›„, scal_cpu() ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ฉ˜ํ…€ ๊ฐ’์œผ๋กœ ๊ณฑํ•ด์ค€๋‹ค.
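The update sequence described above can be sketched as follows. This is a hedged illustration, not the library source: `axpy_sketch`, `scal_sketch`, and `fc_update_sketch` are hypothetical names with unit strides, batch normalization is omitted, and the exact form of the decay term (`-decay * batch` applied to the weights) is an assumption about how the decay enters the update.

```c
#include <assert.h>

/* y += alpha * x, the kind of operation axpy_cpu() performs
 * (unit strides assumed in this sketch). */
static void axpy_sketch(int n, float alpha, const float *x, float *y)
{
    for (int i = 0; i < n; ++i) y[i] += alpha * x[i];
}

/* x *= alpha, the kind of operation scal_cpu() performs. */
static void scal_sketch(int n, float alpha, float *x)
{
    for (int i = 0; i < n; ++i) x[i] *= alpha;
}

/* Hypothetical update step for biases and weights (no batch normalization). */
void fc_update_sketch(float *weights, float *weight_updates,
                      float *biases, float *bias_updates,
                      int inputs, int outputs,
                      float learning_rate, float momentum,
                      float decay, int batch)
{
    assert(weights && weight_updates && biases && bias_updates && batch > 0);
    int nw = inputs * outputs;
    float step = learning_rate / batch;

    /* Biases: apply the accumulated update, then keep a momentum-scaled part. */
    axpy_sketch(outputs, step, bias_updates, biases);
    scal_sketch(outputs, momentum, bias_updates);

    /* Weights: fold in the (assumed) decay term, apply, then scale by momentum. */
    axpy_sketch(nw, -decay * batch, weights, weight_updates);
    axpy_sketch(nw, step, weight_updates, weights);
    scal_sketch(nw, momentum, weight_updates);
}
```

Because the update buffers are scaled by the momentum instead of being cleared, part of the previous step carries over into the next one.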

make_connected_layer

ํ•จ์ˆ˜ ์ด๋ฆ„: make_connected_layer

์ž…๋ ฅ:

  • batch: intํ˜•, ๋ฐฐ์น˜ ํฌ๊ธฐ(batch size)

  • inputs: intํ˜•, ์ž…๋ ฅ ํฌ๊ธฐ(input size)

  • outputs: intํ˜•, ์ถœ๋ ฅ ํฌ๊ธฐ(output size)

  • activation: ACTIVATION ์—ด๊ฑฐํ˜•, ํ™œ์„ฑํ™” ํ•จ์ˆ˜(activation function)

  • batch_normalize: intํ˜•, ๋ฐฐ์น˜ ์ •๊ทœํ™” ์—ฌ๋ถ€(batch normalization flag)

  • adam: intํ˜•, Adam ์ตœ์ ํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์‚ฌ์šฉ ์—ฌ๋ถ€(Adam optimization flag)

๋™์ž‘:

  • ์ž…๋ ฅ๊ฐ’๊ณผ ์ถœ๋ ฅ๊ฐ’ ์‚ฌ์ด์˜ fully connected layer๋ฅผ ์ƒ์„ฑํ•œ๋‹ค.

  • ๋ฐฐ์น˜ ์ •๊ทœํ™”๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ, ๋ฐฐ์น˜ ์ •๊ทœํ™” ๊ณ„์ธต(batch normalization layer)์„ ์ƒ์„ฑํ•œ๋‹ค.

  • Adam ์ตœ์ ํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ, Adam์— ํ•„์š”ํ•œ ๋ณ€์ˆ˜๋“ค์„ ์ดˆ๊ธฐํ™”ํ•œ๋‹ค.

  • ๊ฐ€์ค‘์น˜(weight), ํŽธํ–ฅ(bias) ๋“ฑ์˜ ๋ณ€์ˆ˜๋“ค์„ ์ดˆ๊ธฐํ™”ํ•œ๋‹ค.
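A reduced version of this construction step might look like the sketch below. Everything in it is illustrative: the `fc_layer` struct is a hypothetical subset of the real layer struct, `make_fc_layer_sketch` and `free_fc_layer_sketch` are made-up names, the `weight_scale` parameter and the uniform random initialization are assumptions, and batch normalization and Adam buffers are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced layer struct for illustration only. */
typedef struct {
    int batch, inputs, outputs;
    float *output, *delta;            /* batch * outputs elements each */
    float *weights, *weight_updates;  /* outputs * inputs elements each */
    float *biases, *bias_updates;     /* outputs elements each */
} fc_layer;

/* Allocate n zero-initialized floats, aborting if allocation fails. */
static float *alloc_floats(size_t n)
{
    float *p = calloc(n, sizeof(float));
    if (!p) abort();
    return p;
}

/* Build the layer: zeroed buffers, weights drawn uniformly from
 * [-weight_scale, weight_scale] (an assumed initialization scheme). */
fc_layer make_fc_layer_sketch(int batch, int inputs, int outputs,
                              float weight_scale)
{
    assert(batch > 0 && inputs > 0 && outputs > 0);
    fc_layer l = {0};
    l.batch = batch;
    l.inputs = inputs;
    l.outputs = outputs;

    l.output = alloc_floats((size_t)batch * outputs);
    l.delta = alloc_floats((size_t)batch * outputs);
    l.weights = alloc_floats((size_t)outputs * inputs);
    l.weight_updates = alloc_floats((size_t)outputs * inputs);
    l.biases = alloc_floats((size_t)outputs);
    l.bias_updates = alloc_floats((size_t)outputs);

    for (int i = 0; i < inputs * outputs; ++i) {
        float u = 2.0f * ((float)rand() / (float)RAND_MAX) - 1.0f; /* in [-1, 1] */
        l.weights[i] = weight_scale * u;
    }
    return l;
}

/* Release every buffer owned by the layer. */
void free_fc_layer_sketch(fc_layer l)
{
    free(l.output);
    free(l.delta);
    free(l.weights);
    free(l.weight_updates);
    free(l.biases);
    free(l.bias_updates);
}
```

The buffer sizes follow directly from the arguments: per-sample buffers scale with batch * outputs, while the weights scale with inputs * outputs.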

Description:

  • This function creates a fully connected layer between the inputs and the outputs.

  • It declares a layer struct, initializes the required variables, and returns the struct.

  • Fields of the layer struct:

    • type: enum variable indicating the type of the layer

    • inputs: the input size

    • outputs: the output size

    • batch: the batch size

    • batch_normalize: whether batch normalization is used

    • h, w, c: height, width, and number of channels of the layer

    • out_h, out_w, out_c: height, width, and number of channels of the layer's output

    • output: the output values of the layer

    • delta: the gradient values of the layer during backpropagation

    • weights: the weights

    • biases: the biases

    • weight_updates: the weight update values

    • bias_updates: the bias update values

    • forward: pointer to the layer's forward propagation function

    • backward: pointer to the layer's backpropagation function

    • update: pointer to the function that updates the layer's weights and biases

    • scales: the scale values of the batch normalization layer

    • scale_updates: the scale update values of the batch normalization layer

    • mean: the mean of the batch normalization layer

    • mean_delta: the mean update values of the batch normalization layer

    • variance: the variance of the batch normalization layer

    • variance_delta: the variance update values of the batch normalization layer

    • rolling_mean: the moving average of the batch normalization layer

    • rolling_variance: the moving variance of the batch normalization layer
