
Table 7 Mean average precision (mAP), mean F1-score (mF1), and mean false negative rate (mFNR) of the models (trained on the fifth fold) and of the individual annotators on the consensus test set, at an IoU threshold of 0.3. All metrics are reported as the score over the whole test set together with a \(95\%\) confidence interval. The best results among the models and among the annotators are highlighted in bold. *Annotators who did not complete the individual annotation task (see "Validation protocol")

From: AI-Dentify: deep learning for proximal caries detection on bitewing x-ray - HUNT4 Oral Health Study

| Model / Annotator | mAP | mF1 | mFNR |
| --- | --- | --- | --- |
| YOLOv5, 5th fold | **0.647 [0.566, 0.707]** | **0.548 [0.506, 0.598]** | **0.149 [0.110, 0.203]** |
| RetinaNet, 5th fold | 0.407 [0.355, 0.458] | 0.177 [0.154, 0.202] | 0.210 [0.167, 0.262] |
| EfficientDet D0, 5th fold | 0.360 [0.290, 0.431] | 0.522 [0.461, 0.588] | 0.484 [0.422, 0.552] |
| EfficientDet D1, 5th fold | 0.503 [0.421, 0.569] | 0.503 [0.421, 0.569] | 0.359 [0.306, 0.431] |
| Annotator 1* | 0.284 [0.231, 0.347] | **0.495 [0.447, 0.552]** | 0.480 [0.413, 0.552] |
| Annotator 2 | 0.250 [0.247, 0.285] | 0.385 [0.346, 0.420] | 0.309 [0.251, 0.374] |
| Annotator 3 | 0.242 [0.199, 0.320] | 0.403 [0.343, 0.470] | 0.631 [0.564, 0.686] |
| Annotator 4 | **0.299 [0.270, 0.353]** | 0.450 [0.411, 0.492] | 0.237 [0.180, 0.292] |
| Annotator 5 | 0.288 [0.244, 0.356] | 0.479 [0.423, 0.528] | 0.444 [0.376, 0.515] |
| Annotator 6 | 0.261 [0.248, 0.301] | 0.376 [0.346, 0.410] | **0.164 [0.124, 0.217]** |
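
The evaluation code behind these metrics is not shown on this page, but F1 and FNR at a fixed IoU threshold can be reproduced from per-image detections. The sketch below is a minimal, illustrative Python implementation, not the study's actual pipeline: it assumes axis-aligned boxes in (x1, y1, x2, y2) format, confidence-sorted predictions, and greedy one-to-one matching at the IoU threshold of 0.3 used in Table 7. All function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_detections(preds, gts, iou_thr=0.3):
    """Greedily match predictions (assumed sorted by descending confidence)
    to ground-truth boxes; return (TP, FP, FN) counts at the IoU threshold."""
    matched_gt = set()
    tp = fp = 0
    for p in preds:
        best_iou, best_j = 0.0, None
        for j, g in enumerate(gts):
            if j in matched_gt:
                continue
            v = iou(p, g)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_j is not None and best_iou >= iou_thr:
            matched_gt.add(best_j)
            tp += 1
        else:
            fp += 1
    fn = len(gts) - len(matched_gt)
    return tp, fp, fn

def f1_and_fnr(tp, fp, fn):
    """F1 = harmonic mean of precision and recall; FNR = FN / (FN + TP)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return f1, fnr

# Toy usage with hypothetical boxes:
preds = [(10, 10, 50, 50), (60, 60, 90, 90)]  # confidence-sorted predictions
gts = [(12, 11, 48, 52)]                      # ground-truth boxes
tp, fp, fn = match_detections(preds, gts, iou_thr=0.3)
f1, fnr = f1_and_fnr(tp, fp, fn)              # -> f1 ~ 0.667, fnr = 0.0
```

The mF1 and mFNR columns would then be aggregates of such per-image scores over the whole test set; the exact aggregation used in the study is described in the article's methods, not on this page.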
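The caption reports \(95\%\) confidence intervals but this page does not state how they were computed. A common choice for detection metrics is a percentile bootstrap over test images; the snippet below is a sketch under that assumption, not the study's documented procedure.

```python
import numpy as np

def bootstrap_ci(per_image_scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of
    per-image scores: resample images with replacement, recompute the
    mean, and take the alpha/2 and 1 - alpha/2 percentiles."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(per_image_scores, dtype=float)
    boot_means = [
        rng.choice(scores, size=len(scores), replace=True).mean()
        for _ in range(n_boot)
    ]
    lo, hi = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```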