connected_layer
What is a Fully Connected Layer?
A fully connected layer is a layer in which every node of the previous layer is connected, one by one, to every node of the next layer.
It is the most basic kind of layer.
It operates on data laid out as a one-dimensional array.
To aid understanding, let's look at it as a diagram.
It is not very complex and consists only of simple operations.
Expressed simply, the forward pass of a fully connected layer looks like the figure below.
It is easiest to understand by tracing where each weight, identified by its id, is used when computing the outputs: each weight participates only in the computation of its own output node, so only that weight's value is used at that connection.
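The connection pattern above can be sketched as a naive loop: each output node is the dot product of the input vector with one row of the weight matrix, plus a bias. This is a minimal illustration for a single sample (fc_forward_naive is a hypothetical helper name, not darknet's actual code, which uses gemm):

```c
/* Minimal sketch of a fully connected forward pass for one sample.
 * Weights are stored row-major, outputs x inputs, matching darknet's
 * connected layer layout. Hypothetical helper, for illustration only. */
void fc_forward_naive(const float *input, const float *weights,
                      const float *biases, float *output,
                      int inputs, int outputs)
{
    for (int j = 0; j < outputs; ++j) {
        float sum = biases[j];                      /* start from the bias */
        for (int i = 0; i < inputs; ++i)
            sum += weights[j * inputs + i] * input[i];
        output[j] = sum;                            /* one node of the next layer */
    }
}
```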
forward_connected_layer
void forward_connected_layer(layer l, network net)
{
    fill_cpu(l.outputs*l.batch, 0, l.output, 1);
    int m = l.batch;
    int k = l.inputs;
    int n = l.outputs;
    float *a = net.input;
    float *b = l.weights;
    float *c = l.output;
    gemm(0,1,m,n,k,1,a,k,b,k,1,c,n);
    if(l.batch_normalize){
        forward_batchnorm_layer(l, net);
    } else {
        add_bias(l.output, l.biases, l.batch, l.outputs, 1);
    }
    activate_array(l.output, l.outputs*l.batch, l.activation);
}
Function name: forward_connected_layer
Input:
layer l: the connected layer structure
network net: the network structure
Action:
Fills the l.output array with 0
Performs a matrix multiplication (GEMM) to accumulate new values into the l.output array
If batch normalization is enabled, calls forward_batchnorm_layer to update the l.output array
If batch normalization is disabled, adds the l.biases values to the l.output array
Applies the activation function l.activation to every element of the l.output array
Description:
forward_connected_layer performs the forward propagation of a fully connected layer.
It uses fill_cpu to initialize the l.output array to 0.
It calls gemm to multiply the input data by the weights and accumulate the result into l.output.
If batch normalization is enabled, it calls forward_batchnorm_layer to update l.output; otherwise it adds the l.biases values to l.output with add_bias.
Finally, it calls activate_array to apply the activation function l.activation to every element of l.output.
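The gemm(0,1,m,n,k,...) call above multiplies the input matrix (batch x inputs) by the transposed weight matrix (stored outputs x inputs) and accumulates into l.output. A naive sketch of that TA=0, TB=1 case (illustration only, not darknet's actual gemm):

```c
/* Naive sketch of gemm(0, 1, m, n, k, 1, a, k, b, k, 1, c, n):
 * c (m x n) += a (m x k) * b^T, where b is stored as n x k.
 * In the forward pass m = batch, k = inputs, n = outputs. */
void gemm_nt_naive(int m, int n, int k,
                   const float *a, const float *b, float *c)
{
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0;
            for (int p = 0; p < k; ++p)
                sum += a[i * k + p] * b[j * k + p]; /* row of a times row of b */
            c[i * n + j] += sum;
        }
    }
}
```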
backward_connected_layer
void backward_connected_layer(layer l, network net)
{
    gradient_array(l.output, l.outputs*l.batch, l.activation, l.delta);
    if(l.batch_normalize){
        backward_batchnorm_layer(l, net);
    } else {
        backward_bias(l.bias_updates, l.delta, l.batch, l.outputs, 1);
    }
    int m = l.outputs;
    int k = l.batch;
    int n = l.inputs;
    float *a = l.delta;
    float *b = net.input;
    float *c = l.weight_updates;
    gemm(1,0,m,n,k,1,a,m,b,n,1,c,n);
    m = l.batch;
    k = l.outputs;
    n = l.inputs;
    a = l.delta;
    b = l.weights;
    c = net.delta;
    if(c) gemm(0,0,m,n,k,1,a,k,b,n,1,c,n);
}
Function name: backward_connected_layer
Input:
layer l: the fully connected layer being backpropagated
network net: the network it belongs to
Action:
Multiplies l.delta by the gradient of the activation function (l.activation), evaluated at the layer's output (l.output)
If the layer uses batch normalization, calls backward_batchnorm_layer to backpropagate through it; otherwise calls backward_bias to accumulate the bias gradient (l.bias_updates) from l.delta
Calls gemm with l.delta and the input (net.input) to accumulate the weight gradient (l.weight_updates)
Calls gemm with l.delta and l.weights to compute the gradient with respect to the input (net.delta), if net.delta is not NULL
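The two gemm calls above compute the weight gradient (delta transposed times input) and the input gradient (delta times weights). Both products can be sketched with naive loops (fc_backward_naive is a hypothetical helper, for illustration only):

```c
/* Sketch of the two backward products:
 * weight_updates (outputs x inputs) += delta^T (outputs x batch) * input (batch x inputs)
 * net_delta      (batch x inputs)   += delta (batch x outputs) * weights (outputs x inputs) */
void fc_backward_naive(const float *delta, const float *input,
                       const float *weights, float *weight_updates,
                       float *net_delta, int batch, int inputs, int outputs)
{
    /* accumulate the weight gradient over the batch */
    for (int o = 0; o < outputs; ++o)
        for (int i = 0; i < inputs; ++i)
            for (int b = 0; b < batch; ++b)
                weight_updates[o * inputs + i] +=
                    delta[b * outputs + o] * input[b * inputs + i];

    /* propagate the gradient to the previous layer, if it wants one */
    if (net_delta)
        for (int b = 0; b < batch; ++b)
            for (int i = 0; i < inputs; ++i)
                for (int o = 0; o < outputs; ++o)
                    net_delta[b * inputs + i] +=
                        delta[b * outputs + o] * weights[o * inputs + i];
}
```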
update_connected_layer
void update_connected_layer(layer l, update_args a)
{
    float learning_rate = a.learning_rate*l.learning_rate_scale;
    float momentum = a.momentum;
    float decay = a.decay;
    int batch = a.batch;
    axpy_cpu(l.outputs, learning_rate/batch, l.bias_updates, 1, l.biases, 1);
    scal_cpu(l.outputs, momentum, l.bias_updates, 1);
    if(l.batch_normalize){
        axpy_cpu(l.outputs, learning_rate/batch, l.scale_updates, 1, l.scales, 1);
        scal_cpu(l.outputs, momentum, l.scale_updates, 1);
    }
    axpy_cpu(l.inputs*l.outputs, -decay*batch, l.weights, 1, l.weight_updates, 1);
    axpy_cpu(l.inputs*l.outputs, learning_rate/batch, l.weight_updates, 1, l.weights, 1);
    scal_cpu(l.inputs*l.outputs, momentum, l.weight_updates, 1);
}
Function name: update_connected_layer
Input:
layer l: structure representing the connected layer
update_args a: structure holding the update hyperparameters
Action:
Updates the weights and biases of the connected layer.
Reads learning_rate, momentum, decay, and the batch size from the argument a and uses them to update the weights and biases.
Description:
l.outputs: number of outputs of the current layer
l.bias_updates: array of accumulated bias gradients
l.biases: array of the layer's biases
l.scale_updates: array of accumulated scale gradients (used when batch normalization is enabled)
l.scales: array of scale values (used when batch normalization is enabled)
l.inputs: number of outputs of the previous layer, i.e. the dimensionality of the input data
l.weights: array of the layer's weights
l.weight_updates: array of accumulated weight gradients
learning_rate: learning rate (scaled by l.learning_rate_scale)
momentum: momentum value
decay: weight decay value
batch: size of the current batch
axpy_cpu(): a BLAS-style routine that computes y += a*x over two vectors
scal_cpu(): a routine that multiplies a vector by a scalar
First, the bias update is performed: axpy_cpu() adds the bias gradients, scaled by learning_rate/batch, to the biases, and scal_cpu() then multiplies the bias gradients by the momentum value.
If batch normalization is enabled, the scale update is performed the same way: axpy_cpu() adds the scale gradients, scaled by learning_rate/batch, to the scales, and scal_cpu() multiplies the scale gradients by the momentum value.
Then the weight update is performed: axpy_cpu() first adds the weight decay term (-decay*batch times the weights) to weight_updates, then adds weight_updates, scaled by learning_rate/batch, to the weights. Finally, scal_cpu() multiplies weight_updates by the momentum value.
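The three-step weight sequence above is SGD with momentum and L2 weight decay. It can be sketched with plain loops (sgd_momentum_step is a hypothetical helper name, for illustration only):

```c
/* Sketch of the weight update sequence in update_connected_layer:
 * 1) weight_updates -= decay * batch * weights          (L2 weight decay)
 * 2) weights        += (learning_rate / batch) * weight_updates
 * 3) weight_updates *= momentum                          (kept for the next step) */
void sgd_momentum_step(float *weights, float *weight_updates, int n,
                       float learning_rate, float momentum,
                       float decay, int batch)
{
    for (int i = 0; i < n; ++i)
        weight_updates[i] -= decay * batch * weights[i];
    for (int i = 0; i < n; ++i)
        weights[i] += (learning_rate / batch) * weight_updates[i];
    for (int i = 0; i < n; ++i)
        weight_updates[i] *= momentum;
}
```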
make_connected_layer
layer make_connected_layer(int batch, int inputs, int outputs, ACTIVATION activation, int batch_normalize, int adam)
{
    int i;
    layer l = {0};
    l.learning_rate_scale = 1;
    l.type = CONNECTED;

    l.inputs = inputs;
    l.outputs = outputs;
    l.batch=batch;
    l.batch_normalize = batch_normalize;
    l.h = 1;
    l.w = 1;
    l.c = inputs;
    l.out_h = 1;
    l.out_w = 1;
    l.out_c = outputs;

    l.output = calloc(batch*outputs, sizeof(float));
    l.delta = calloc(batch*outputs, sizeof(float));

    l.weight_updates = calloc(inputs*outputs, sizeof(float));
    l.bias_updates = calloc(outputs, sizeof(float));

    l.weights = calloc(outputs*inputs, sizeof(float));
    l.biases = calloc(outputs, sizeof(float));

    l.forward = forward_connected_layer;
    l.backward = backward_connected_layer;
    l.update = update_connected_layer;

    //float scale = 1./sqrt(inputs);
    float scale = sqrt(2./inputs);
    for(i = 0; i < outputs*inputs; ++i){
        l.weights[i] = scale*rand_uniform(-1, 1);
    }

    for(i = 0; i < outputs; ++i){
        l.biases[i] = 0;
    }

    if(adam){
        l.m = calloc(l.inputs*l.outputs, sizeof(float));
        l.v = calloc(l.inputs*l.outputs, sizeof(float));
        l.bias_m = calloc(l.outputs, sizeof(float));
        l.scale_m = calloc(l.outputs, sizeof(float));
        l.bias_v = calloc(l.outputs, sizeof(float));
        l.scale_v = calloc(l.outputs, sizeof(float));
    }
    if(batch_normalize){
        l.scales = calloc(outputs, sizeof(float));
        l.scale_updates = calloc(outputs, sizeof(float));
        for(i = 0; i < outputs; ++i){
            l.scales[i] = 1;
        }

        l.mean = calloc(outputs, sizeof(float));
        l.mean_delta = calloc(outputs, sizeof(float));
        l.variance = calloc(outputs, sizeof(float));
        l.variance_delta = calloc(outputs, sizeof(float));
        l.rolling_mean = calloc(outputs, sizeof(float));
        l.rolling_variance = calloc(outputs, sizeof(float));
        l.x = calloc(batch*outputs, sizeof(float));
        l.x_norm = calloc(batch*outputs, sizeof(float));
    }
    l.activation = activation;
    fprintf(stderr, "connected %4d -> %4d\n", inputs, outputs);
    return l;
}
Function name: make_connected_layer
Input:
batch: int, batch size
inputs: int, input size
outputs: int, output size
activation: ACTIVATION enum, activation function
batch_normalize: int, batch normalization flag
adam: int, Adam optimizer flag
Action:
Creates a fully connected layer between the given inputs and outputs.
If batch normalization is used, allocates the batch normalization buffers.
If the Adam optimizer is used, allocates the variables Adam needs.
Initializes the weights, biases, and other fields.
Description:
This function creates a fully connected layer between the given inputs and outputs.
It declares a layer structure, initializes the needed fields, and returns it.
Fields of the layer structure:
type: enum indicating the type of the layer
inputs: input size
outputs: output size
batch: batch size
batch_normalize: whether batch normalization is used
h, w, c: height, width, and number of channels of the layer
out_h, out_w, out_c: height, width, and number of channels of the layer's output
output: the layer's output values
delta: the layer's gradient values from backpropagation
weights: the weights
biases: the biases
weight_updates: accumulated weight gradients
bias_updates: accumulated bias gradients
forward: pointer to the layer's forward pass function
backward: pointer to the layer's backward pass function
update: pointer to the function that updates the layer's weights and biases
scales: scale values for batch normalization
scale_updates: accumulated scale gradients for batch normalization
mean: mean values for batch normalization
mean_delta: mean gradients for batch normalization
variance: variance values for batch normalization
variance_delta: variance gradients for batch normalization
rolling_mean: moving average of the mean for batch normalization
rolling_variance: moving average of the variance for batch normalization