
Table 7 Mean average precision (mAP), mean F1-score (mF1), and mean false negative rate (mFNR) of the models (trained on the fifth fold) and of the individual annotators on the consensus test set, at an IoU threshold of 0.3. All metrics are reported as the score over the whole test set together with a \(95\%\) confidence interval. The best results among the models and among the annotators are highlighted in bold. *Annotators who did not complete the individual annotation task (see "Validation protocol")

From: AI-Dentify: deep learning for proximal caries detection on bitewing x-ray - HUNT4 Oral Health Study

| Model / Annotator | mAP | mF1 | mFNR |
| --- | --- | --- | --- |
| YOLOv5, 5th fold | **0.647 [0.566, 0.707]** | **0.548 [0.506, 0.598]** | **0.149 [0.110, 0.203]** |
| RetinaNet, 5th fold | 0.407 [0.355, 0.458] | 0.177 [0.154, 0.202] | 0.210 [0.167, 0.262] |
| EfficientDet D0, 5th fold | 0.360 [0.290, 0.431] | 0.522 [0.461, 0.588] | 0.484 [0.422, 0.552] |
| EfficientDet D1, 5th fold | 0.503 [0.421, 0.569] | 0.503 [0.421, 0.569] | 0.359 [0.306, 0.431] |
| Annotator 1* | 0.284 [0.231, 0.347] | **0.495 [0.447, 0.552]** | 0.480 [0.413, 0.552] |
| Annotator 2 | 0.250 [0.247, 0.285] | 0.385 [0.346, 0.420] | 0.309 [0.251, 0.374] |
| Annotator 3 | 0.242 [0.199, 0.320] | 0.403 [0.343, 0.470] | 0.631 [0.564, 0.686] |
| Annotator 4 | **0.299 [0.270, 0.353]** | 0.450 [0.411, 0.492] | 0.237 [0.180, 0.292] |
| Annotator 5 | 0.288 [0.244, 0.356] | 0.479 [0.423, 0.528] | 0.444 [0.376, 0.515] |
| Annotator 6 | 0.261 [0.248, 0.301] | 0.376 [0.346, 0.410] | **0.164 [0.124, 0.217]** |
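
The evaluation code behind these metrics is not shown on this page, but F1 and FNR at a fixed IoU threshold can be reproduced from per-image detections. The sketch below is a minimal, illustrative Python implementation, not the study's actual pipeline: it assumes axis-aligned boxes in (x1, y1, x2, y2) format, confidence-sorted predictions, and greedy one-to-one matching at the IoU threshold of 0.3 used in Table 7. All function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_detections(preds, gts, iou_thr=0.3):
    """Greedily match predictions (assumed sorted by descending confidence)
    to ground-truth boxes; return (TP, FP, FN) counts at the IoU threshold."""
    matched_gt = set()
    tp = fp = 0
    for p in preds:
        best_iou, best_j = 0.0, None
        for j, g in enumerate(gts):
            if j in matched_gt:
                continue
            v = iou(p, g)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_j is not None and best_iou >= iou_thr:
            matched_gt.add(best_j)
            tp += 1
        else:
            fp += 1
    fn = len(gts) - len(matched_gt)
    return tp, fp, fn

def f1_and_fnr(tp, fp, fn):
    """F1 = harmonic mean of precision and recall; FNR = FN / (FN + TP)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return f1, fnr

# Toy usage with hypothetical boxes:
preds = [(10, 10, 50, 50), (60, 60, 90, 90)]  # confidence-sorted predictions
gts = [(12, 11, 48, 52)]                      # ground-truth boxes
tp, fp, fn = match_detections(preds, gts, iou_thr=0.3)
f1, fnr = f1_and_fnr(tp, fp, fn)              # -> f1 ~ 0.667, fnr = 0.0
```

The mF1 and mFNR columns would then be aggregates of such per-image scores over the whole test set; the exact aggregation used in the study is described in the article's methods, not on this page.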
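The caption reports \(95\%\) confidence intervals but this page does not state how they were computed. A common choice for detection metrics is a percentile bootstrap over test images; the snippet below is a sketch under that assumption, not the study's documented procedure.

```python
import numpy as np

def bootstrap_ci(per_image_scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of
    per-image scores: resample images with replacement, recompute the
    mean, and take the alpha/2 and 1 - alpha/2 percentiles."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(per_image_scores, dtype=float)
    boot_means = [
        rng.choice(scores, size=len(scores), replace=True).mean()
        for _ in range(n_boot)
    ]
    lo, hi = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```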