YOLOv1


This is the paper that first proposed YOLO (You Only Look Once), a new one-stage approach to object detection.

  • It is fast because a single unified network predicts bounding boxes and class probabilities together.

  • Speed: 45 FPS for the base model, 155 FPS for Fast YOLO.

Unified Detection

  • S x S grid (S = 7)

  • B (number of bounding boxes per grid cell) = 2

Confidence score: $\Pr(\text{Object}) \times \text{IOU}^{truth}_{pred}$

  • C (number of classes) = 20

Conditional class probability: $\Pr(\text{Class}_i \mid \text{Object})$

  • Output tensor: S x S x (B x 5 + C) = 7 x 7 x 30, where 5 stands for (x, y, w, h, confidence)

  • Test time: the two quantities are multiplied to give a class-specific confidence score for each box.

$\Pr(\text{Class}_i \mid \text{Object}) \times \Pr(\text{Object}) \times \text{IOU}^{truth}_{pred} = \Pr(\text{Class}_i) \times \text{IOU}^{truth}_{pred}$
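The shapes involved can be checked with a short NumPy sketch. It assumes the 30 channels of each cell are laid out as B boxes of (x, y, w, h, confidence) followed by C class probabilities; that ordering and the function name `class_specific_scores` are illustrative assumptions, not something the notes specify.

```python
import numpy as np

S, B, C = 7, 2, 20  # grid size, boxes per cell, classes (values from the notes above)

def class_specific_scores(output):
    """Split an (S, S, B*5 + C) tensor and return per-box class scores, shape (S, S, B, C)."""
    # Assumed layout: B boxes of (x, y, w, h, confidence), then C class probabilities.
    boxes = output[..., :B * 5].reshape(S, S, B, 5)
    class_probs = output[..., B * 5:]      # Pr(Class_i | Object), shared by the cell's boxes
    confidence = boxes[..., 4]             # Pr(Object) * IOU, one value per box
    # Broadcast (S, S, B, 1) * (S, S, 1, C) -> (S, S, B, C)
    return confidence[..., None] * class_probs[:, :, None, :]

output = np.random.default_rng(0).random((S, S, B * 5 + C))  # stand-in for a network output
scores = class_specific_scores(output)
print(output.shape, scores.shape)  # (7, 7, 30) (7, 7, 2, 20)
```

Each cell shares one set of class probabilities across its B boxes, which is why the class term is broadcast over the box axis.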

Network Design

  • The model is based on GoogLeNet.

  • Convolution layers: 24 (9 for Fast YOLO)

  • Fully connected layers: 2

Loss Function

  • $i$ : index of a grid cell

  • $j$ : index of a bounding box predictor within a grid cell

  • $1^{obj}_{i, j}$ : 1 when the $j$-th bounding box predictor of grid cell $i$ is responsible for an object, 0 otherwise

  • $1^{noobj}_{i, j}$ : 1 when the $j$-th bounding box predictor of grid cell $i$ is not responsible for any object, 0 otherwise

  • $1^{obj}_{i}$ : 1 when an object is present in grid cell $i$, 0 otherwise

Most of an image contains no object, so training pushes the confidence of most cells toward 0. The gradient from these empty cells can overpower the gradient from the cells that do contain objects, so two extra weighting parameters are used to keep the loss balanced.

  • $\lambda_{coord}$ : weight that balances the x, y, w, h loss (default: 5)

  • $\lambda_{noobj}$ : weight that balances the confidence loss of boxes that contain no object (default: 0.5)
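For reference, the full loss can be written out with the symbols defined above. Hatted symbols denote network predictions and unhatted ones denote targets (a notational assumption made here), and $p_i(c)$ is the conditional class probability of class $c$ in cell $i$. The five terms correspond, in order, to the five steps listed next.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} 1^{obj}_{i, j}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 &+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} 1^{obj}_{i, j}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} 1^{obj}_{i, j} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} 1^{noobj}_{i, j} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \sum_{i=0}^{S^2} 1^{obj}_{i} \sum_{c \in classes} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```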

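  1. Compute the loss for x and y (the box center).

  2. Compute the loss for w and h. (The square roots of the width and height are compared, so an error of the same size counts more for a small box than for a large one.)

  3. Compute the confidence score loss for predictors that are responsible for an object ($C_i = 1$).

  4. Compute the confidence score loss for predictors that have no object ($C_i = 0$).

  5. Compute the conditional class probability loss.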
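The five steps above can be sketched in NumPy for a single image. This is a minimal illustration, not the reference implementation. The function name, the argument layout, and the clipping of negative predicted widths and heights before the square root are all assumptions. The caller is also assumed to have already decided which predictor is responsible for each object and to pass that choice as a boolean mask.

```python
import numpy as np

LAMBDA_COORD, LAMBDA_NOOBJ = 5.0, 0.5  # default weights listed above

def yolo_v1_loss(pred_box, pred_cls, true_box, true_cls, responsible):
    """Sum-squared YOLOv1-style loss for one image (illustrative sketch).

    pred_box, true_box: (S, S, B, 5) arrays holding x, y, w, h, confidence.
    pred_cls, true_cls: (S, S, C) conditional class probabilities.
    responsible:        (S, S, B) boolean mask, True for the predictor assigned to an object.
    """
    obj = responsible.astype(float)                     # 1^{obj}_{i,j}
    noobj = 1.0 - obj                                   # 1^{noobj}_{i,j}
    cell_obj = responsible.any(axis=-1).astype(float)   # 1^{obj}_{i}

    sq = (pred_box - true_box) ** 2
    xy = (obj * (sq[..., 0] + sq[..., 1])).sum()                      # step 1
    # Step 2: compare square roots; clipping negatives is an assumption for numerical safety.
    root_diff = np.sqrt(np.clip(pred_box[..., 2:4], 0, None)) - np.sqrt(true_box[..., 2:4])
    wh = (obj * (root_diff ** 2).sum(axis=-1)).sum()
    conf_obj = (obj * sq[..., 4]).sum()                               # step 3
    conf_noobj = (noobj * sq[..., 4]).sum()                           # step 4
    cls = (cell_obj * ((pred_cls - true_cls) ** 2).sum(axis=-1)).sum()  # step 5

    return LAMBDA_COORD * (xy + wh) + conf_obj + LAMBDA_NOOBJ * conf_noobj + cls
```

With predictions equal to the targets the function returns 0. A confidence error of 1 in an empty cell contributes only 0.5, whereas the same error on a responsible predictor contributes 1.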

Training

  • A model consisting of 20 convolution layers, an average pooling layer, and a fully connected layer is pretrained on the ImageNet 1000-class competition dataset.

  • Four convolution layers and two fully connected layers with randomly initialized weights are then added.

  • The input resolution is increased from 224 x 224 to 448 x 448 because detection needs fine-grained visual information.

  • The bounding box width and height are normalized to the range 0 to 1.

  • The final layer uses a linear activation function; all other layers use leaky ReLU.
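The normalization of box targets can be illustrated with a small sketch. The notes only state that width and height are normalized to 0 to 1. Expressing the box center as an offset within its grid cell follows the paper's parameterization as understood here, and should be read as an assumption. The function `encode_box` and its arguments are hypothetical names.

```python
S = 7  # grid size used in the notes above

def encode_box(cx, cy, w, h, img_w, img_h, grid=S):
    """Map a pixel-space box (center cx, cy; size w, h) to a grid cell and normalized targets."""
    gx, gy = cx / img_w * grid, cy / img_h * grid      # box center in grid units
    col, row = min(int(gx), grid - 1), min(int(gy), grid - 1)
    x, y = gx - col, gy - row                          # offset inside the cell, in [0, 1]
    return (row, col), (x, y, w / img_w, h / img_h)    # width and height relative to the image

cell, target = encode_box(cx=224, cy=112, w=112, h=224, img_w=448, img_h=448)
print(cell, target)  # (1, 3) (0.5, 0.75, 0.25, 0.5)
```

Keeping all four targets in the same 0 to 1 range is convenient when they are regressed with a single sum-squared loss.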

Parameters

  • epochs : 135

  • batch size : 64

  • momentum : 0.9

  • weight decay : 0.0005

  • learning rate : 0.001 -> 0.01 -> 0.001 -> 0.0001 (raised from 0.001 to 0.01 at the start, then decayed)

    • 75 epochs : 0.01

    • 30 epochs : 0.001

    • 30 epochs : 0.0001

  • dropout rate : 0.5

  • data augmentation

    • random scaling

    • exposure and saturation are randomly adjusted by up to a factor of 1.5 in the HSV color space
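The learning rate schedule above can be written as a small function. The notes do not say how long the initial rise from 0.001 to 0.01 takes or what shape it has. The sketch assumes a linear ramp over a hypothetical `warmup_epochs=5`, counted inside the first 75 epochs so that the total stays at 135.

```python
def learning_rate(epoch, warmup_epochs=5):
    """Learning rate for a 0-indexed epoch under the 135-epoch schedule in the notes.

    warmup_epochs is a hypothetical value; the notes only say the rate
    rises from 0.001 to 0.01 before the long 0.01 phase.
    """
    if epoch < warmup_epochs:
        # Assumed linear ramp from 0.001 up to 0.01
        return 0.001 + (0.01 - 0.001) * epoch / warmup_epochs
    if epoch < 75:
        return 0.01    # 75 epochs at 0.01 (warm-up included, by assumption)
    if epoch < 105:
        return 0.001   # next 30 epochs
    return 0.0001      # final 30 epochs

for e in (0, 10, 75, 105):
    print(e, learning_rate(e))
```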

Inference

  • Inference is very fast because the detector is one-stage.

  • The network predicts 98 bounding boxes per image (7 x 7 x 2) and class probabilities for each box.

  • Each object is usually predicted by a single bounding box.

  • Large objects, or objects near the border of several cells, can be detected by more than one cell, which produces duplicate boxes. Non-maximum suppression (NMS) removes these duplicates, although it matters less for performance than it does for R-CNN.
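A minimal sketch of greedy non-maximum suppression shows how the duplicate boxes are removed. It assumes boxes in corner format (x1, y1, x2, y2) and an IoU threshold of 0.5. Both are illustrative choices rather than values taken from the notes.

```python
import numpy as np

def iou(box, others):
    """IoU between one box and an array of boxes, all given as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_others = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area_box + area_others - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it too much, repeat."""
    order = np.argsort(scores)[::-1]   # indices sorted by descending score
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

boxes = np.array([[10, 10, 110, 110], [20, 20, 120, 120], [200, 200, 300, 300]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```

In the example the first two boxes overlap with an IoU of about 0.68, so only the higher-scoring one survives. In a multi-class detector this would typically be run once per class.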

Limitation

  • Small objects that appear in groups are detected poorly.

  • The localization error is high.

Benchmark

  • YOLO is fast and performs strongly.

  • YOLO outperforms the real-time object detectors that came before it.

  • YOLO makes more localization errors than Fast R-CNN.

  • YOLO makes fewer background errors than Fast R-CNN.

  • Combining YOLO with Fast R-CNN gives better results than Fast R-CNN alone.

  • A table compares accuracy by class.
