YOLOv3
YOLOv3 experiments with ideas from papers published after YOLOv2 to address the remaining weaknesses of its object detection.
Accuracy is higher, and it is still fast!
It is 3 times faster than SSD at the same accuracy.
Its accuracy is similar to RetinaNet, but it is faster.
Bounding Box Prediction
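Like YOLOv2, YOLOv3 predicts bounding boxes using dimension clusters as anchor boxes.
The network predicts four coordinates per box, $t_x, t_y, t_w, t_h$. If the cell is offset from the top-left corner of the image by $(c_x, c_y)$ and the bounding box prior has width and height $p_w, p_h$, the final bounding box is $b_x = \sigma(t_x) + c_x$, $b_y = \sigma(t_y) + c_y$, $b_w = p_w e^{t_w}$, $b_h = p_h e^{t_h}$.
Training uses an L2 (sum of squared error) loss. YOLOv3 inverts the equations above to compute the regression target $\hat{t}_*$ directly from the ground truth box. In other words, the ground truth is converted into $\hat{t}_*$, and the gradient is $\hat{t}_* - t_*$.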
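As a minimal sketch of this parameterization and its inverse: the box center is the sigmoid of the raw output plus the cell offset, and the box size is the prior size scaled by the exponential of the raw output. The function names `decode_box` and `encode_box`, and the choice to express everything in grid-cell units, are assumptions made for illustration, not taken from the paper's code.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(t, cell, prior):
    """Map raw outputs (tx, ty, tw, th) to a box (bx, by, bw, bh)."""
    tx, ty, tw, th = t
    cx, cy = cell      # offset of the grid cell from the top-left corner
    pw, ph = prior     # width and height of the bounding box prior
    return (sigmoid(tx) + cx, sigmoid(ty) + cy,
            pw * math.exp(tw), ph * math.exp(th))

def encode_box(b, cell, prior, eps=1e-9):
    """Invert decode_box: turn a ground truth box into regression targets."""
    bx, by, bw, bh = b
    cx, cy = cell
    pw, ph = prior
    # Clamp the in-cell offset to (0, 1) so the logit stays finite.
    ox = min(max(bx - cx, eps), 1.0 - eps)
    oy = min(max(by - cy, eps), 1.0 - eps)
    logit = lambda p: math.log(p / (1.0 - p))
    return (logit(ox), logit(oy), math.log(bw / pw), math.log(bh / ph))
```

Encoding a ground truth box and decoding the result should return the original box, which is what makes a plain squared-error loss on the raw outputs possible.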
If a bounding box overlaps a ground truth object more than any other box does, its objectness score should be 1. If a box is not the best match but its IOU with a ground truth is above the threshold, the prediction is ignored. Only one bounding box is assigned to each ground truth.
The IOU threshold is 0.5.
A bounding box that is not assigned to any ground truth incurs no coordinate or classification loss, only objectness loss.
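The assignment rule above can be sketched as follows. This is an illustrative reading of the rule, not the authors' implementation: boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the helper names `iou` and `objectness_targets` are hypothetical.

```python
def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def objectness_targets(boxes, truths, ignore_thresh=0.5):
    """Return one label per box: 1 (assigned), None (ignored), 0 (background)."""
    labels = [0] * len(boxes)
    if not boxes:
        return labels
    # Boxes that overlap some ground truth above the threshold are ignored...
    for i, box in enumerate(boxes):
        if any(iou(box, t) > ignore_thresh for t in truths):
            labels[i] = None
    # ...unless they are the single best match for a ground truth.
    for t in truths:
        best = max(range(len(boxes)), key=lambda i: iou(boxes[i], t))
        labels[best] = 1
    return labels
```

Only boxes labeled 1 would contribute coordinate and class losses; boxes labeled 0 contribute objectness loss only, and boxes labeled `None` contribute nothing.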
Class Prediction
Each bounding box predicts its classes with multi-label classification.
Softmax is a poor fit for multi-label classification because it assumes each box has exactly one class, so binary cross-entropy loss is used instead.
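To make the difference concrete, here is a small sketch comparing independent sigmoids with a softmax on a box that carries two overlapping labels. The class names and logit values are hypothetical, chosen only for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def bce_loss(logits, targets, eps=1e-12):
    """Mean binary cross-entropy over independent per-class sigmoids."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(logits)

logits = [4.0, 4.0, -4.0]   # hypothetical scores for "woman", "person", "car"
targets = [1, 1, 0]         # both of the first two labels are correct
probs_sigmoid = [sigmoid(z) for z in logits]
probs_softmax = softmax(logits)
```

With sigmoids, both correct labels can be close to 1 at the same time. With softmax they have to share the probability mass, so neither can exceed roughly 0.5 here.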
Predictions Across Scales
YOLOv3 predicts boxes at 3 different scales.
Features are extracted in a way similar to feature pyramid networks.
Several convolutional layers are added, and the output is a 3-d tensor:
N x N x [3 * (4(bounding box offsets) + 1(objectness) + 80(class))]
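A quick arithmetic check of the tensor depth, as a sketch. The 416 × 416 input size and the strides 32, 16, and 8 are assumptions used only to produce concrete grid sizes; the helper name `head_shape` is hypothetical.

```python
def head_shape(grid, boxes_per_cell=3, num_classes=80):
    """Shape of one prediction tensor: N x N x [3 * (4 + 1 + num_classes)]."""
    depth = boxes_per_cell * (4 + 1 + num_classes)
    return (grid, grid, depth)

# Assumed input size 416 and strides 32, 16, 8 for the three scales.
shapes = [head_shape(416 // stride) for stride in (32, 16, 8)]
```

With 80 classes each cell carries 3 × 85 = 255 values, regardless of the grid size N.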
Next, the feature map from 2 layers earlier is upsampled by 2×.
A feature map from earlier in the network is then concatenated with the upsampled feature map. This gives access to both more meaningful semantic information (from the later layers) and finer-grained information (from the earlier layers).
A few more convolutional layers are added to process the combined feature map.
The same design is applied once more to predict boxes at the final scale. The predictions at the 3rd scale therefore benefit from all the preceding computation as well as the fine-grained features from early in the network.
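The upsample-and-concatenate step can be sketched on tiny nested lists in height × width × channel layout. Nearest-neighbour upsampling is an assumption here (the notes only say the map is upsampled by 2×), and the helper names and values are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of an H x W x C nested list."""
    out = []
    for row in fmap:
        # Repeat each pixel twice horizontally.
        wide = [list(px) for px in row for _ in range(2)]
        out.append(wide)
        # Repeat the widened row once more vertically.
        out.append([list(px) for px in wide])
    return out

def concat_channels(a, b):
    """Concatenate two maps of equal height and width along the channel axis."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

deep = [[[1.0, 2.0]]]                       # 1 x 1 x 2 map from a later layer
early = [[[0.1], [0.2]], [[0.3], [0.4]]]    # 2 x 2 x 1 map from an earlier layer
merged = concat_channels(upsample2x(deep), early)   # 2 x 2 x 3
```

Every position of the merged map holds the coarse channels from the later layer next to the fine channels from the earlier layer, which is what the following convolutional layers then process.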
Anchor boxes are obtained with k-means clustering. 9 clusters and 3 scales were chosen somewhat arbitrarily, and the clusters are divided evenly across the scales.
On COCO, the 9 clusters are:
(10 × 13), (16 × 30), (33 × 23), (30 × 61), (62 × 45), (59 × 119), (116 × 90), (156 × 198), (373 × 326)
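Dividing the 9 COCO clusters evenly across 3 scales might look like the sketch below. Sorting by area before splitting is an assumption, as is the helper name `split_anchors`; the notes only state that the clusters are divided evenly.

```python
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]

def split_anchors(anchors, num_scales=3):
    """Sort anchors by area and split them into equally sized groups."""
    ordered = sorted(anchors, key=lambda wh: wh[0] * wh[1])
    per_scale = len(ordered) // num_scales
    return [ordered[i * per_scale:(i + 1) * per_scale]
            for i in range(num_scales)]

groups = split_anchors(ANCHORS)
```

Each group holds 3 anchors, matching the 3 boxes predicted per cell at each scale.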
Feature Extractor
A new network, DarkNet53, is proposed for feature extraction.
DarkNet53 is compared against other models, using ImageNet as the dataset.
Training
No hard negative mining or similar techniques are used.
Many techniques such as multi-scale training, data augmentation, and batch normalization are used.
How We Do
On COCO's odd average mAP metric, YOLOv3 is on par with the SSD variants but 3 times faster. On this metric it is still slightly behind models such as RetinaNet.
At IOU = 0.5 (the AP50 metric), YOLOv3 is very strong. As the IOU threshold increases, performance drops sharply, which indicates that YOLOv3 struggles to align boxes perfectly with objects.
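A small sketch of why a stricter IOU threshold hurts a detector whose boxes are slightly misaligned. The boxes below are hypothetical values chosen for illustration, not results from the paper.

```python
def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

truth = (0, 0, 100, 100)
pred = (10, 10, 110, 110)   # hypothetical prediction, shifted by 10 pixels
score = iou(pred, truth)    # 8100 / 11900, roughly 0.68
hits = {thr: score >= thr for thr in (0.5, 0.75)}
```

The same prediction counts as a correct detection at the 0.5 threshold and as a miss at 0.75, even though the object was clearly found.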
Detection of small objects, a weakness of earlier YOLO versions, is much improved.
Things We Tried That Didn't Work
Anchor box x, y offset predictions : predicting the x, y offset as a multiple of the box width or height using a linear activation was tried, but it did not work well.
Linear x, y predictions instead of logistic : using a linear activation instead of the logistic activation to predict the x, y offset lowered mAP by a couple of points.
Focal Loss : mAP dropped by about 2 points. The authors suggest this may be because objectness and class predictions are already separate, but say they are not sure.
Dual IOU thresholds and truth assignment : Faster R-CNN uses two IOU thresholds during training. A prediction with an IOU of 0.7 or more is a positive sample, one with an IOU below 0.3 is a negative sample, and anything in between is ignored. A similar strategy did not give good results.
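The dual-threshold rule from the last item can be written as a short sketch. The function name `dual_threshold_label` and the string labels are hypothetical; the 0.7 and 0.3 thresholds are the ones quoted above.

```python
def dual_threshold_label(best_iou, pos_thresh=0.7, neg_thresh=0.3):
    """Label a prediction by its best IOU against all ground truth boxes."""
    if best_iou >= pos_thresh:
        return "positive"
    if best_iou < neg_thresh:
        return "negative"
    return "ignore"   # the band between the two thresholds is not trained on

labels = [dual_threshold_label(v) for v in (0.8, 0.5, 0.1)]
```

This differs from the single 0.5 ignore threshold that YOLOv3 ends up using, where only the best-matching box per ground truth is treated as positive.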
What This All Means
YOLOv3 is accurate and fast. It does not score well on the COCO metric (AP averaged while raising the IOU threshold step by step from 0.5 to 0.95), but it does well on the AP50 metric. Russakovsky et al. asked people to tell apart bounding boxes with an IOU of 0.3 and 0.5, and report that they had difficulty doing so. The point is to question whether a fine-grained evaluation such as the COCO metric is really worthwhile.
The Rebuttal section of the paper goes into more detail on where YOLO sits in the benchmarks and why the COCO metric is weak, but it is not covered here.