connected_layer

What is a Fully Connected Layer?

A fully connected layer is a layer in which every node of the previous layer is connected to every node of the next layer.

  • The most basic type of layer

  • It operates only on 1-dimensional arrays.

To aid understanding, let's look at it with a diagram.

It is not very complex and consists only of simple operations.

The backward pass of a fully connected layer, expressed simply, looks like the figure below.

It is easy to understand if you look at where each weight (identified by its index) was used when computing the output. For example, since w_11 was used only to compute h_11, only that value is used.


forward_connected_layer

void forward_connected_layer(layer l, network net)
{
    fill_cpu(l.outputs*l.batch, 0, l.output, 1);
    int m = l.batch;
    int k = l.inputs;
    int n = l.outputs;
    float *a = net.input;
    float *b = l.weights;
    float *c = l.output;
    gemm(0,1,m,n,k,1,a,k,b,k,1,c,n);
    if(l.batch_normalize){
        forward_batchnorm_layer(l, net);
    } else {
        add_bias(l.output, l.biases, l.batch, l.outputs, 1);
    }
    activate_array(l.output, l.outputs*l.batch, l.activation);
}

ķ•Øģˆ˜ ģ“ė¦„: forward_connected_layer

ģž…ė „:

  • layer l: ģ—°ź²° ģøµ(layer) 구씰첓

  • network net: ė„¤ķŠøģ›Œķ¬(network) 구씰첓

ė™ģž‘:

  • l.output ė°°ģ—“ģ„ 0으딜 채움

  • 행렬 ź³± ģ—°ģ‚°(GEMM)ģ„ ģˆ˜ķ–‰ķ•˜ģ—¬ l.output ė°°ģ—“ģ„ 새딜욓 ź°’ģœ¼ė”œ ģ—…ė°ģ“ķŠø 함

  • 배치 ģ •ź·œķ™”(batch normalization)ź°€ ķ™œģ„±ķ™”ė˜ģ–“ ģžˆģœ¼ė©“, forward_batchnorm_layer ķ•Øģˆ˜ė„¼ ķ˜øģ¶œķ•˜ģ—¬ l.output ė°°ģ—“ģ„ ģ—…ė°ģ“ķŠø 함

  • 배치 ģ •ź·œķ™”ź°€ ė¹„ķ™œģ„±ķ™”ė˜ģ–“ ģžˆģœ¼ė©“, l.output 배엓에 l.biases ź°’ģ„ ė”ķ•Ø

  • l.activation ķ•Øģˆ˜ė„¼ ģ‚¬ģš©ķ•˜ģ—¬ l.output ė°°ģ—“ģ˜ ėŖØė“  ģ›ģ†Œģ— ķ™œģ„±ķ™” ķ•Øģˆ˜ė„¼ ģ ģš©ķ•Ø

설명:

  • forward_connected_layer ķ•Øģˆ˜ėŠ” 완전 ģ—°ź²°(fully connected) ģøµģ˜ ģˆœģ „ķŒŒ(forward propagation) ģ—°ģ‚°ģ„ ģˆ˜ķ–‰ķ•˜ėŠ” ķ•Øģˆ˜ģž…ė‹ˆė‹¤.

  • fill_cpu ķ•Øģˆ˜ė„¼ ģ‚¬ģš©ķ•˜ģ—¬ l.output ė°°ģ—“ģ„ 0으딜 ģ“ˆźø°ķ™”ķ•©ė‹ˆė‹¤.

  • GEMM ķ•Øģˆ˜ė„¼ ģ‚¬ģš©ķ•˜ģ—¬ ģž…ė „(input) ė°ģ“ķ„°ģ™€ ź°€ģ¤‘ģ¹˜(weights)넼 ź³±ķ•˜ģ—¬ l.output ė°°ģ—“ģ„ 새딜욓 ź°’ģœ¼ė”œ ģ—…ė°ģ“ķŠøķ•©ė‹ˆė‹¤.

  • 배치 ģ •ź·œķ™”ź°€ ķ™œģ„±ķ™”ė˜ģ–“ ģžˆģœ¼ė©“, forward_batchnorm_layer ķ•Øģˆ˜ė„¼ ķ˜øģ¶œķ•˜ģ—¬ l.output ė°°ģ—“ģ„ ģ—…ė°ģ“ķŠøķ•©ė‹ˆė‹¤. 배치 ģ •ź·œķ™”ź°€ ė¹„ķ™œģ„±ķ™”ė˜ģ–“ ģžˆģœ¼ė©“, l.output 배엓에 l.biases ź°’ģ„ ė”ķ•©ė‹ˆė‹¤.

  • ė§ˆģ§€ė§‰ģœ¼ė”œ, activate_array ķ•Øģˆ˜ė„¼ ģ‚¬ģš©ķ•˜ģ—¬ l.activation ķ•Øģˆ˜ė„¼ ģ ģš©ķ•˜ģ—¬ l.output ė°°ģ—“ģ˜ ėŖØė“  ģ›ģ†Œģ— ķ™œģ„±ķ™” ķ•Øģˆ˜ė„¼ ģ ģš©ķ•©ė‹ˆė‹¤.

backward_connected_layer

void backward_connected_layer(layer l, network net)
{
    gradient_array(l.output, l.outputs*l.batch, l.activation, l.delta);

    if(l.batch_normalize){
        backward_batchnorm_layer(l, net);
    } else {
        backward_bias(l.bias_updates, l.delta, l.batch, l.outputs, 1);
    }

    int m = l.outputs;
    int k = l.batch;
    int n = l.inputs;
    float *a = l.delta;
    float *b = net.input;
    float *c = l.weight_updates;
    gemm(1,0,m,n,k,1,a,m,b,n,1,c,n);

    m = l.batch;
    k = l.outputs;
    n = l.inputs;

    a = l.delta;
    b = l.weights;
    c = net.delta;

    if(c) gemm(0,0,m,n,k,1,a,k,b,n,1,c,n);
}

ķ•Øģˆ˜ ģ“ė¦„: backward_connected_layer

ģž…ė „:

  • layer l: backpropagationģ“ ģˆ˜ķ–‰ė  fully connected layer

  • network net: ģ—°ź²°ėœ neural network

ė™ģž‘:

  • lģ—ģ„œ 출렄(l.output)ź³¼ ķ™œģ„±ķ™” ķ•Øģˆ˜(l.activation)넼 ģ‚¬ģš©ķ•˜ģ—¬ delta(l.delta)넼 계산

  • lģ“ batch normalizationģ„ ģ‚¬ģš©ķ•˜ėŠ” 경우, backward_batchnorm_layer ķ•Øģˆ˜ė„¼ ģ‚¬ģš©ķ•˜ģ—¬ backpropagationģ„ ģˆ˜ķ–‰ķ•˜ź³  그렇지 ģ•Šģœ¼ė©“ backward_bias ķ•Øģˆ˜ė„¼ ģ‚¬ģš©ķ•˜ģ—¬ ķŽøķ–„(l.bias_updates)ģ˜ delta넼 계산

  • l.delta와 ģž…ė „(net.input)ģ„ ģ‚¬ģš©ķ•˜ģ—¬ ź°€ģ¤‘ģ¹˜(l.weights)ģ˜ ģ—…ė°ģ“ķŠø(l.weight_updates)넼 ź³„ģ‚°ķ•˜źø° ģœ„ķ•“ GEMM ķ•Øģˆ˜ė„¼ 호출

  • l.delta와 l.weights넼 ģ‚¬ģš©ķ•˜ģ—¬ ģž…ė „(net.delta)ģ˜ delta넼 ź³„ģ‚°ķ•˜źø° ģœ„ķ•“ GEMM ķ•Øģˆ˜ė„¼ 호출

update_connected_layer

void update_connected_layer(layer l, update_args a)
{
    float learning_rate = a.learning_rate*l.learning_rate_scale;
    float momentum = a.momentum;
    float decay = a.decay;
    int batch = a.batch;
    axpy_cpu(l.outputs, learning_rate/batch, l.bias_updates, 1, l.biases, 1);
    scal_cpu(l.outputs, momentum, l.bias_updates, 1);

    if(l.batch_normalize){
        axpy_cpu(l.outputs, learning_rate/batch, l.scale_updates, 1, l.scales, 1);
        scal_cpu(l.outputs, momentum, l.scale_updates, 1);
    }

    axpy_cpu(l.inputs*l.outputs, -decay*batch, l.weights, 1, l.weight_updates, 1);
    axpy_cpu(l.inputs*l.outputs, learning_rate/batch, l.weight_updates, 1, l.weights, 1);
    scal_cpu(l.inputs*l.outputs, momentum, l.weight_updates, 1);
}

ķ•Øģˆ˜ ģ“ė¦„: update_connected_layer

ģž…ė „:

  • layer l: ģ—°ź²° 계층(layer)ģ„ ė‚˜ķƒ€ė‚“ėŠ” 구씰첓

  • update_args a: ėŖØėø ģ—…ė°ģ“ķŠøė„¼ ģœ„ķ•œ ė§¤ź°œė³€ģˆ˜ė„¼ ė‹“ģ€ 구씰첓

ė™ģž‘:

  • ģ—°ź²° ź³„ģøµģ˜ ź°€ģ¤‘ģ¹˜(weights)와 ķŽøķ–„(biases)ģ„ ģ—…ė°ģ“ķŠøķ•˜ėŠ” ķ•Øģˆ˜

  • ė§¤ź°œė³€ģˆ˜ė”œ 주얓진 a에 ė”°ė¼ learning_rate, momentum, decay, batch 크기넼 ģ„¤ģ •ķ•˜ź³ , ģ“ė„¼ ģ‚¬ģš©ķ•˜ģ—¬ ź°€ģ¤‘ģ¹˜ģ™€ ķŽøķ–„ģ„ ģ—…ė°ģ“ķŠøķ•œė‹¤.

설명:

  • l.outputs: ķ˜„ģž¬ ź³„ģøµģ˜ 출렄 개수

  • l.bias_updates: ķŽøķ–„ģ˜ ģ—…ė°ģ“ķŠøģ— ģ‚¬ģš©ė  ź°’ė“¤ģ“ ģ €ģž„ėœ ė°°ģ—“

  • l.biases: ķ˜„ģž¬ ź³„ģøµģ˜ ķŽøķ–„ź°’ģ“ ģ €ģž„ėœ ė°°ģ—“

  • l.scale_updates: 배치 ģ •ź·œķ™”(batch normalization)ź°€ ģ‚¬ģš©ė˜ėŠ” 경우, ģŠ¤ģ¼€ģ¼ģ˜ ģ—…ė°ģ“ķŠøģ— ģ‚¬ģš©ė  ź°’ė“¤ģ“ ģ €ģž„ėœ ė°°ģ—“

  • l.scales: 배치 ģ •ź·œķ™”ź°€ ģ‚¬ģš©ė˜ėŠ” 경우, ģŠ¤ģ¼€ģ¼ź°’ģ“ ģ €ģž„ėœ ė°°ģ—“

  • l.inputs: ģ“ģ „ ź³„ģøµģ˜ 출렄 개수 ķ˜¹ģ€ ģž…ė „ ė°ģ“ķ„°ģ˜ 차원 수

  • l.weights: ź°€ģ¤‘ģ¹˜ź°’ģ“ ģ €ģž„ėœ ė°°ģ—“

  • l.weight_updates: ź°€ģ¤‘ģ¹˜ģ˜ ģ—…ė°ģ“ķŠøģ— ģ‚¬ģš©ė  ź°’ė“¤ģ“ ģ €ģž„ėœ ė°°ģ—“

  • learning_rate: ķ•™ģŠµė„ (learning rate) ź°’

  • momentum: ėŖØė©˜ķ…€(momentum) ź°’

  • decay: ź°€ģ¤‘ģ¹˜ ź°ģ†Œ(weight decay) ź°’

  • batch: ķ˜„ģž¬ 배치(batch)ģ˜ 크기

  • axpy_cpu(): BLAS ė¼ģ“ėøŒėŸ¬ė¦¬ ķ•Øģˆ˜ 중 ķ•˜ė‚˜ė”œ, 범터 ź°„ģ˜ ģ—°ģ‚°ģ„ ģˆ˜ķ–‰ķ•˜ėŠ” ķ•Øģˆ˜

  • scal_cpu(): 범터에 ģŠ¤ģ¹¼ė¼ ź°’ģ„ ź³±ķ•˜ėŠ” ķ•Øģˆ˜

  • 먼저, ķŽøķ–„ ģ—…ė°ģ“ķŠøė„¼ ģˆ˜ķ–‰ķ•œė‹¤. ģ“ ė•Œ, axpy_cpu() ķ•Øģˆ˜ė„¼ ģ‚¬ģš©ķ•˜ģ—¬ ķŽøķ–„ģ˜ ģ—…ė°ģ“ķŠøź°’ģ„ ķŽøķ–„ź°’ģ— ė”ķ•˜ź³ , scal_cpu() ķ•Øģˆ˜ė„¼ ģ‚¬ģš©ķ•˜ģ—¬ ėŖØė©˜ķ…€ ź°’ģœ¼ė”œ 곱핓준다.

  • ė§Œģ•½ 배치 ģ •ź·œķ™”ź°€ ģ‚¬ģš©ė˜ėŠ” 경우, ģŠ¤ģ¼€ģ¼ ģ—…ė°ģ“ķŠøė„ ģˆ˜ķ–‰ķ•œė‹¤. ģ“ ė•Œ, axpy_cpu() ķ•Øģˆ˜ė„¼ ģ‚¬ģš©ķ•˜ģ—¬ ģŠ¤ģ¼€ģ¼ģ˜ ģ—…ė°ģ“ķŠøź°’ģ„ ģŠ¤ģ¼€ģ¼ź°’ģ— ė”ķ•˜ź³ , scal_cpu() ķ•Øģˆ˜ė„¼ ģ‚¬ģš©ķ•˜ģ—¬ ėŖØė©˜ķ…€ ź°’ģœ¼ė”œ 곱핓준다.

  • ź°€ģ¤‘ģ¹˜ ģ—…ė°ģ“ķŠøė„¼ ģˆ˜ķ–‰ķ•œė‹¤. ģ“ ė•Œ, axpy_cpu() ķ•Øģˆ˜ė„¼ ģ‚¬ģš©ķ•˜ģ—¬ ź°€ģ¤‘ģ¹˜ģ˜ ģ—…ė°ģ“ķŠøź°’ģ— ėŒ€ķ•œ ź°’ģ„ 먼저 weight_updates에 ė”ķ•œ ė‹¤ģŒ, ź°€ģ¤‘ģ¹˜ģ— ģ“ ź°’ģ„ ė”ķ•œė‹¤. ģ“ķ›„, scal_cpu() ķ•Øģˆ˜ė„¼ ģ‚¬ģš©ķ•˜ģ—¬ ėŖØė©˜ķ…€ ź°’ģœ¼ė”œ 곱핓준다.

make_connected_layer

layer make_connected_layer(int batch, int inputs, int outputs, ACTIVATION activation, int batch_normalize, int adam)
{
    int i;
    layer l = {0};
    l.learning_rate_scale = 1;
    l.type = CONNECTED;

    l.inputs = inputs;
    l.outputs = outputs;
    l.batch=batch;
    l.batch_normalize = batch_normalize;
    l.h = 1;
    l.w = 1;
    l.c = inputs;
    l.out_h = 1;
    l.out_w = 1;
    l.out_c = outputs;

    l.output = calloc(batch*outputs, sizeof(float));
    l.delta = calloc(batch*outputs, sizeof(float));

    l.weight_updates = calloc(inputs*outputs, sizeof(float));
    l.bias_updates = calloc(outputs, sizeof(float));

    l.weights = calloc(outputs*inputs, sizeof(float));
    l.biases = calloc(outputs, sizeof(float));

    l.forward = forward_connected_layer;
    l.backward = backward_connected_layer;
    l.update = update_connected_layer;

    //float scale = 1./sqrt(inputs);
    float scale = sqrt(2./inputs);
    for(i = 0; i < outputs*inputs; ++i){
        l.weights[i] = scale*rand_uniform(-1, 1);
    }

    for(i = 0; i < outputs; ++i){
        l.biases[i] = 0;
    }

    if(adam){
        l.m = calloc(l.inputs*l.outputs, sizeof(float));
        l.v = calloc(l.inputs*l.outputs, sizeof(float));
        l.bias_m = calloc(l.outputs, sizeof(float));
        l.scale_m = calloc(l.outputs, sizeof(float));
        l.bias_v = calloc(l.outputs, sizeof(float));
        l.scale_v = calloc(l.outputs, sizeof(float));
    }
    if(batch_normalize){
        l.scales = calloc(outputs, sizeof(float));
        l.scale_updates = calloc(outputs, sizeof(float));
        for(i = 0; i < outputs; ++i){
            l.scales[i] = 1;
        }

        l.mean = calloc(outputs, sizeof(float));
        l.mean_delta = calloc(outputs, sizeof(float));
        l.variance = calloc(outputs, sizeof(float));
        l.variance_delta = calloc(outputs, sizeof(float));

        l.rolling_mean = calloc(outputs, sizeof(float));
        l.rolling_variance = calloc(outputs, sizeof(float));

        l.x = calloc(batch*outputs, sizeof(float));
        l.x_norm = calloc(batch*outputs, sizeof(float));
    }

    l.activation = activation;
    fprintf(stderr, "connected                            %4d  ->  %4d\n", inputs, outputs);
    return l;
}

ķ•Øģˆ˜ ģ“ė¦„: make_connected_layer

ģž…ė „:

  • batch: intķ˜•, 배치 크기(batch size)

  • inputs: intķ˜•, ģž…ė „ 크기(input size)

  • outputs: intķ˜•, 출렄 크기(output size)

  • activation: ACTIVATION ģ—“ź±°ķ˜•, ķ™œģ„±ķ™” ķ•Øģˆ˜(activation function)

  • batch_normalize: intķ˜•, 배치 ģ •ź·œķ™” 여부(batch normalization flag)

  • adam: intķ˜•, Adam ģµœģ ķ™” ģ•Œź³ ė¦¬ģ¦˜ ģ‚¬ģš© 여부(Adam optimization flag)

ė™ģž‘:

  • ģž…ė „ź°’ź³¼ ģ¶œė „ź°’ ģ‚¬ģ“ģ˜ fully connected layer넼 ģƒģ„±ķ•œė‹¤.

  • 배치 ģ •ź·œķ™”ė„¼ ģ‚¬ģš©ķ•˜ėŠ” 경우, 배치 ģ •ź·œķ™” 계층(batch normalization layer)ģ„ ģƒģ„±ķ•œė‹¤.

  • Adam ģµœģ ķ™” ģ•Œź³ ė¦¬ģ¦˜ģ„ ģ‚¬ģš©ķ•˜ėŠ” 경우, Adam에 ķ•„ģš”ķ•œ ė³€ģˆ˜ė“¤ģ„ ģ“ˆźø°ķ™”ķ•œė‹¤.

  • ź°€ģ¤‘ģ¹˜(weight), ķŽøķ–„(bias) ė“±ģ˜ ė³€ģˆ˜ė“¤ģ„ ģ“ˆźø°ķ™”ķ•œė‹¤.

설명:

  • ģž…ė „ź°’ź³¼ ģ¶œė „ź°’ ģ‚¬ģ“ģ˜ fully connected layer넼 ģƒģ„±ķ•˜ėŠ” ķ•Øģˆ˜ģ“ė‹¤.

  • layer 구씰첓넼 ģ„ ģ–øķ•˜ź³ , ķ•„ģš”ķ•œ ė³€ģˆ˜ė“¤ģ„ ģ“ˆźø°ķ™”ķ•œ 후 ė°˜ķ™˜ķ•œė‹¤.

  • layer źµ¬ģ”°ģ²“ģ˜ fields:

    • type: ė ˆģ“ģ–“ģ˜ ķƒ€ģž…ģ„ ė‚˜ķƒ€ė‚“ėŠ” ģ—“ź±°ķ˜•(enum) ė³€ģˆ˜

    • inputs: ģž…ė „ 크기

    • outputs: 출렄 크기

    • batch: 배치 크기

    • batch_normalize: 배치 ģ •ź·œķ™” ģ‚¬ģš© 여부

    • h, w, c: ė ˆģ“ģ–“ģ˜ ė†’ģ“, ė„ˆė¹„, 채널 수

    • out_h, out_w, out_c: 출렄 ė ˆģ“ģ–“ģ˜ ė†’ģ“, ė„ˆė¹„, 채널 수

    • output: ė ˆģ“ģ–“ģ˜ ģ¶œė „ź°’

    • delta: ė ˆģ“ģ–“ģ˜ ģ—­ģ „ķŒŒ ģ‹œ ź·øė ˆģ“ė””ģ–øķŠø ź°’

    • weights: ź°€ģ¤‘ģ¹˜

    • biases: ķŽøķ–„

    • weight_updates: ź°€ģ¤‘ģ¹˜ 갱신 ź°’

    • bias_updates: ķŽøķ–„ 갱신 ź°’

    • forward: ė ˆģ“ģ–“ģ˜ ģˆœģ „ķŒŒ ķ•Øģˆ˜ ķ¬ģøķ„°

    • backward: ė ˆģ“ģ–“ģ˜ ģ—­ģ „ķŒŒ ķ•Øģˆ˜ ķ¬ģøķ„°

    • update: ė ˆģ“ģ–“ģ˜ ź°€ģ¤‘ģ¹˜ģ™€ ķŽøķ–„ģ„ ź°±ģ‹ ķ•˜ėŠ” ķ•Øģˆ˜ ķ¬ģøķ„°

    • scales: 배치 ģ •ź·œķ™” ź³„ģøµģ˜ ģŠ¤ģ¼€ģ¼(scale) ź°’

    • scale_updates: 배치 ģ •ź·œķ™” ź³„ģøµģ˜ ģŠ¤ģ¼€ģ¼ 갱신 ź°’

    • mean: 배치 ģ •ź·œķ™” ź³„ģøµģ˜ ķ‰ź· (mean) ź°’

    • mean_delta: 배치 ģ •ź·œķ™” ź³„ģøµģ˜ ķ‰ź·  갱신 ź°’

    • variance: 배치 ģ •ź·œķ™” ź³„ģøµģ˜ ė¶„ģ‚°(variance) ź°’

    • variance_delta: 배치 ģ •ź·œķ™” ź³„ģøµģ˜ ė¶„ģ‚° 갱신 ź°’

    • rolling_mean: 배치 ģ •ź·œķ™” ź³„ģøµģ˜ ģ“ė™ ķ‰ź·  ź°’

    • rolling_variance: 배치 ģ •ź·œķ™” ź³„ģøµģ˜ ģ“ė™ ė¶„
