Deep learning for classifying the stages of periodontitis on dental images: a systematic review and meta-analysis

Abstract

Background

The development of deep learning (DL) algorithms for use in dentistry is an emerging trend. Periodontitis is one of the most prevalent oral diseases and has a notable impact on patients’ quality of life. Therefore, it is crucial to classify periodontitis accurately and efficiently. This systematic review aimed to identify the applications of DL for the classification of periodontitis and to assess the accuracy of this approach.

Methods

A literature search up to November 2023 was conducted in the EMBASE, PubMed, Web of Science, Scopus, and Google Scholar databases. Inclusion and exclusion criteria were used to screen eligible studies, and the quality of the studies was evaluated by the Grading of Recommendations Assessment, Development and Evaluation (GRADE) methodology with the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies) tool. A random-effects inverse-variance model was used to perform the diagnostic test meta-analysis, in which pooled sensitivity, specificity, positive likelihood ratio (LR), negative LR, and diagnostic odds ratio (DOR) were calculated, and a summary receiver operating characteristic (SROC) plot was constructed.

Results

Thirteen studies were included in the meta-analysis. After excluding an outlier, the pooled sensitivity, specificity, positive LR, negative LR and DOR were 0.88 (95%CI 0.82–0.92), 0.82 (95%CI 0.72–0.89), 4.9 (95%CI 3.2–7.5), 0.15 (95%CI 0.10–0.22) and 33 (95%CI 19–59), respectively. The area under the SROC was 0.92 (95%CI 0.89–0.94).

Conclusions

The accuracy of DL-based classification of periodontitis is high, and this approach could be employed in the future to reduce the workload of dental professionals and enhance the consistency of classification.

Background

Since the 1990s, periodontitis has been a global public health burden. According to the 2019 Global Burden of Diseases (GBD) study, severe periodontitis, with a prevalence of 10.59%, ranks 6th among 369 assessed diseases and is responsible for 7.09 million disability-adjusted life years (DALYs) [1,2,3]. Periodontitis affects both local health and systemic conditions, and proper treatment of periodontitis can reduce systemic inflammation [4,5,6,7,8]. However, manual classification based on dental images requires a lot of manpower and time. Furthermore, image quality and radiographic interpretation could compromise the accuracy of classification. All these issues could be alleviated by deep learning (DL) methods [9,10,11].

Both DL and machine learning (ML) are subfields of artificial intelligence (AI). ML aims to train algorithms on existing data so that they can make predictions for new data [12]. DL is a subgroup of ML that mimics the way the human brain works and is based on neural network structures [13]. Recently, DL, especially convolutional neural networks (CNNs), has been widely used in various fields of medical image analysis, such as segmentation, detection, classification of abnormality, and computer-aided diagnosis [14]. CNNs identify visual patterns directly from the raw pixels of an image, which is similar to the way humans observe objects, to learn the intrinsic features or patterns of the image [14]. They are multi-layered, feed-forward neural networks that are trained with backpropagation algorithms and consist of convolutional, activation, and pooling layers. Currently, CNNs are still considered the most successful method for processing medical images [15].
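The layer types named above can be illustrated with a minimal sketch of a single convolution, activation and pooling pass. This is not code from any included study: the 5 x 5 image, the edge-detecting kernel and the function names are assumptions chosen for illustration, and a real CNN would learn its kernel values by backpropagation rather than fix them by hand.

```python
# Minimal pure-Python sketch of one convolution -> ReLU -> max-pooling pass.
# Image and kernel values are illustrative, not taken from any study.

def conv2d(image, kernel):
    """Valid (no padding) 2-D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Activation layer: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = len(feature_map) // size
    w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(w)
        ]
        for i in range(h)
    ]

# A toy image with a dark left half and a bright right half.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-written vertical edge detector (a trained CNN would learn such weights).
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

features = max_pool(relu(conv2d(image, kernel)))
```

The kernel responds strongly where intensity rises from left to right, which is the kind of low-level pattern that early CNN layers typically pick up before deeper layers combine them.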

In dentistry, there are four main applications of CNNs: (1) segmentation; (2) detection; (3) classification; and (4) image quality enhancement. All are based on dental images, including intraoral (periapical and bitewing radiographs) and extraoral (panoramic radiographs and cone-beam computed tomography [CBCT]) images [9, 16]. For instance, Park et al. applied CNNs to segment tooth surfaces for caries diagnosis [17], and Lee et al. proposed a computer-assisted detection system to identify impacted mandibular third molar teeth [18]. The use of CNNs in the field of periodontitis is also growing. Jaiswal et al. developed a novel Intelligent Ant Lion-based Convolution Neural Model (IALCNM) to segment affected parts and classify wear and periodontitis using panoramic photographs [19]. Moreover, Chen et al. developed an ensemble CNN model to predict tooth position and recognize radiographic bone loss (RBL) using periapical and bitewing radiographs [20]. Furthermore, Moran et al. evaluated whether different pre-processing methods affect the result of periodontal bone loss (PBL) classification based on periapical images [21].

Although numerous studies have been conducted at the intersection of periodontitis and DL, the types of DL architecture employed in periodontitis classification, the most effective model, and the comparison of model performance against oral physicians have not been systematically reported. Therefore, this study aimed to review studies on the classification of periodontitis from various dental images using DL methods, to summarise the types of models employed, and to compare the performance of these models. This could help identify the most appropriate model for the classification of periodontitis based on dental images in clinical practice. Moreover, we compared the performance of the DL models with that of dental professionals to assess their reliability.

Methods

This systematic review and meta-analysis was conducted in accordance with the guidelines for Preferred Reporting Items for Systematic Reviews and Meta-analyses for Diagnostic Test Accuracy Studies (PRISMA-DTA). The study was registered at the National Institute for Health Research, International Prospective Register of Systematic Reviews (PROSPERO, registration number CRD 42022338627). Additionally, the study protocol was based on the following PIRD elements [22]:

Population: patients’ diagnostic images that illustrate the status of radiographic bone loss (RBL).

Index test: deep learning models for classification of periodontitis based on RBL.

Reference test: expert opinions according to the classification of periodontitis.

Diagnosis of interest: classification of periodontitis.

Data sources

A reviewer (XL) searched publications through EMBASE, PubMed, Web of Science, Scopus and Google Scholar databases up to November 2023 according to strategies set by two reviewers (DZ and XL). Search strategies combined terms including (1) periodontitis or periodontal disease or periodontal status; (2) image or image processing or computer-aided diagnosis or computer-based diagnosis or smart diagnosis; and (3) artificial intelligence or machine learning or deep learning or convolutional neural networks. The detailed search queries for all databases were provided in Supplementary Table 1.

Criteria for considering studies for this review

Studies that met the following criteria were considered for inclusion: (1) study population with a dental image; (2) diagnosis with DL technology; and (3) English-language publications of any status, including in-press and unpublished studies. The exclusion criteria were: (1) animal experiments; (2) no full article available; (3) no statistical data; and (4) conference proceedings, reviews, books or patents (Table 1).

Table 1 Inclusion and exclusion criteria for this review

Study selection and data collection

After screening the titles and abstracts of all identified publications, two reviewers (XL and JXX) independently read the full text of all eligible articles and excluded inappropriate articles according to the inclusion/exclusion criteria. Disagreements between the reviewers were resolved by discussion until a consensus was reached or by consulting a third reviewer (DZ). The following data were extracted from each publication: study characteristics (first author, publication year, country), study design (data sets, modality of medical images, machine learning algorithms, study factor and its definition, algorithm application, comparison), primary outcomes, and conclusions.

Quality assessment

The quality of evidence was evaluated by the Grading of Recommendations Assessment, Development and Evaluation (GRADE) on the following domains: study design, limitations (risk of bias), indirectness, inconsistency, imprecision, and publication bias (https://gdt.gradepro.org/) [23]. The quality of evidence was categorized into four levels: high, moderate, low and very low.

Based on the recommendation of the Cochrane Collaboration, the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies) tool was used to evaluate the quality of all eligible articles in terms of the risk of bias and applicability [24]. The assessment was conducted by three reviewers (XL, JXX and YJL). Disagreements were resolved by discussion or by consulting an additional reviewer (DZ), who made the final decision. The risk of bias section comprised four domains: patient selection, index test, reference standard, and flow and timing; the first three of these domains formed the applicability section [25].

Statistical analysis

Summarising quality scores to define high-quality studies is not a recommended method [26]. Moreover, the overall estimate may be similar regardless of the quality of the studies, but if only high-quality studies are analyzed, incomplete reporting may arise [27]. Therefore, all articles containing true positive (TP), false positive (FP), true negative (TN) and false negative (FN) data that were either supplied in the articles or could be calculated from the information provided were included in a meta-analysis conducted with Stata 16.0 software (StataCorp LLC, College Station, TX, USA). Spearman correlation analysis was conducted to assess the threshold effect; in the absence of a threshold effect, the combined sensitivity, specificity, positive likelihood ratio (LR), negative LR and diagnostic odds ratio (DOR) were calculated directly using the random-effects inverse-variance model. A forest plot of sensitivity and specificity was generated to visually show the differences among the included studies. Statistical heterogeneity was assessed using the Chi-squared-based Q statistic and I², with significant heterogeneity indicated by P < 0.05 and I² > 50%, respectively. Influence analysis and subgroup analysis based on study factors, including article quality (high/unclear risk of bias, low risk of bias), dental image modality (periapical radiograph images, panoramic dental radiographs) and model type (single model, two-stage model), were performed to detect the source of heterogeneity. Two meta-regression models, one for sensitivity and one for specificity, were carried out to investigate whether sample size has an impact on classification outcomes. A summary receiver operating characteristic (SROC) plot, which plots the sensitivity-specificity point of each eligible study, was constructed, and the area under the SROC (AUSROC) was computed [24]. In addition, a Fagan nomogram was drawn to describe how DL methods may help clinicians increase the probability of an effective classification of periodontitis. Publication bias was investigated by Deeks’ funnel plot asymmetry test.
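As a rough illustration of the quantities named in this section, the sketch below derives per-study sensitivity, specificity, likelihood ratios and DOR from a 2 x 2 table, and then pools ln(DOR) with a DerSimonian-Laird random-effects inverse-variance estimator while reporting Q and I². It is a simplified univariate sketch, not the Stata routine used in the review, whose exact commands are not stated here. The 2 x 2 counts, function names and the 0.5 continuity correction are assumptions made for illustration.

```python
import math

def diagnostic_metrics(tp, fp, fn, tn):
    """Per-study accuracy measures from a 2x2 table (all cells assumed non-zero)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        "dor": (tp * tn) / (fp * fn),
    }

def log_dor_with_variance(tp, fp, fn, tn):
    """ln(DOR) and its variance; 0.5 is added to every cell if any cell is zero."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn)), 1 / tp + 1 / fp + 1 / fn + 1 / tn

def pool_random_effects(effects, variances):
    """DerSimonian-Laird random-effects inverse-variance pooling with Q and I^2."""
    weights = [1 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "se": math.sqrt(1 / sum(re_weights)),
            "q": q, "i2": i2, "tau2": tau2}

# Hypothetical 2x2 tables (TP, FP, FN, TN); not data from the included studies.
studies = [(90, 20, 10, 80), (45, 5, 15, 35), (120, 30, 20, 130)]
effects, variances = zip(*(log_dor_with_variance(*s) for s in studies))
result = pool_random_effects(effects, variances)
pooled_dor = math.exp(result["pooled"])
ci_low = math.exp(result["pooled"] - 1.96 * result["se"])
ci_high = math.exp(result["pooled"] + 1.96 * result["se"])
```

Pooling on the log scale and back-transforming keeps the confidence interval for the DOR positive. Dedicated diagnostic meta-analysis software usually models sensitivity and specificity jointly, which this sketch does not attempt.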

Results

Study selection

Figure 1 shows the study selection process and describes the reasons for full-text article exclusion. The search of the five databases (EMBASE, PubMed, Web of Science, Scopus and Google Scholar) identified 1546 potentially relevant publications, of which 279 were duplicates. After screening the titles and abstracts of the 1267 remaining studies, 49 articles were selected for full-text reading. Based on the inclusion and exclusion criteria, 27 studies were included in this systematic review [20, 21, 28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52].

Fig. 1 PRISMA flow chart of the study selection process

Methodological quality

The risk of bias and applicability were assessed using QUADAS-2 for all included articles, and the results are shown in Supplementary Fig. 1 and Supplementary Fig. 2, respectively. Nearly half of the included studies did not state clearly whether patients were consecutively or randomly enrolled, resulting in 44.4% of the articles (12/27) showing an unclear risk of bias in the patient selection domain [20, 30, 32, 34,35,36,37,38, 42, 45, 48, 52]. Two studies were rated as having a high risk of bias, with one [29] designed as a case-control study with convenience sampling and the other [31] using inappropriate exclusion criteria. Approximately one-fourth of the studies did not mention a prespecified threshold before testing; consequently, 22.2% of the articles (6/27) were ranked as having an unclear risk of bias in the index test domain [21, 35, 39, 49, 51, 52]. Four studies were unable to accurately diagnose periodontitis based on their reference tests, as these studies attempted to distinguish healthy cases from periodontitis using radiographs alone [21, 28, 42, 49]. The other studies (85.2%, 23/27) were ranked as having a low risk of bias in the reference standard domain [20, 29,30,31,32,33,34,35,36,37,38,39,40,41, 43,44,45,46,47,48, 50,51,52]. As the diagnostic tests were conducted by DL algorithms, which do not affect the flow and timing, all articles in the present analysis were ranked as low risk in this domain. For the applicability section, all studies were ranked as low risk in patient selection, and 74.1% of the included studies (20/27) were ranked as low risk in the index test and reference standard [20, 29, 30, 32,33,34, 36,37,38,39,40,41,42,43,44,45,46,47,48, 52]. The study quality assessment results are presented in Supplementary Table 2.

The quality of evidence based on the GRADE analysis can be found in Supplementary Table 3. Results are shown for subgroups defined by model type and dental image modality. When any study in a subgroup was ranked as having a high or unclear risk of bias based on QUADAS-2, the subgroup’s limitation was assessed as a high risk of bias. As a result, all subgroups were considered to be at high risk of bias, leading to a one-level downgrade in evidence quality. The evidence was downgraded by a further two levels in the single-model subgroup using periapical radiograph images and in the two-stage model subgroup because of inconsistency and imprecision, and by a further one level in the single-model subgroup using panoramic dental radiographs. Consequently, the quality of evidence was scored as very low in the single model with periapical radiograph images and in the two-stage model, and as low in the single model with panoramic dental radiographs.

Study characteristics

The characteristics of all included studies are summarised in Table 2. All articles were published within the last five years, and there was a surge in 2021, with twice as many articles published as in 2020, while in 2022 the number of articles published was 1.5 times that of 2021 (Supplementary Fig. 3). Studies originated from 11 countries, most of which were in Asia. Except for one study that never mentioned data splitting [20], all included studies (26/27) split the datasets or used cross-validation, an approach that helps to detect model overfitting and to evaluate the generalization ability of the model. Three studies used an external dataset to evaluate the performance of the algorithms [29, 43, 48]. In addition, three studies used public databases [35,36,37]. In terms of dental image modality, the studies employed periapical radiograph images, panoramic dental radiographs, and CBCT images to classify periodontitis, among which panoramic radiographs were used the most (15/27) [28,29,30, 32, 33, 35, 36, 38, 39, 42, 47,48,49,50,51] and only one study used CBCT [44]. More than two-thirds of articles (19/27) processed images before applying DL techniques using common approaches such as augmentation, normalisation and resizing [21, 28, 29, 31,32,33,34, 36, 38,39,40, 43,44,45, 47, 48, 50,51,52]. Furthermore, the DL-aided task has changed over time. In 2019 and 2020, the diagnosis of periodontitis was the predominant task, whereas the classification of periodontitis stages predominated in 2021 and 2022. In 2023, half of the studies addressed the diagnosis task and half the staging task. Regarding the algorithms, the studies mainly utilised deep CNNs (DCNN), with one article involving lightweight CNNs (LCNN) [35]. Eleven studies (11/27) used a two-stage design containing a tooth-identification or segmentation stage and a periodontitis-staging step [20, 30,31,32, 35, 36, 38, 42, 44, 47, 51]. Eight studies (8/27) utilised transfer learning [20, 21, 33, 39, 41, 45, 49, 51].
Reference tests were either experts’ direct opinions of periodontitis or their annotation of regions of interest (ROIs) based on different definitions. Sixteen studies (16/27) employed the new criteria proposed in the 2017 World Workshop on the Classification of Periodontal and Peri-Implant Diseases and Conditions [20, 29,30,31,32,33,34, 36,37,38,39,40, 42, 43, 45, 48], while one study (1/27) [41] used the International Workshop for Classification of Periodontal Diseases and Conditions (1999). Three studies (3/27) [28, 47, 52] were carried out according to the World Health Organization’s standardized Community Periodontal Index (CPI), and four studies (4/27) [21, 44, 46, 49] roughly defined periodontitis based on the depth of bone resorption; the remaining two studies (2/27) [50, 51] did not mention the classification criteria. All studies compared the diagnostic performance of DL algorithms either with specialists or among different algorithms. More than two-thirds of articles (19/27) reported accuracy, while sensitivity, specificity, recall, precision, F1-score, ROC and AUROC were also reported among the included studies.
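The dataset splitting and cross-validation mentioned above can be sketched as follows. This is a generic k-fold index generator under assumed settings (10 images, 5 folds, a fixed seed); the function name is hypothetical, and the included studies may have used different splitting schemes or library routines.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Illustrative use: 10 images, 5 folds (numbers are assumptions).
splits = list(k_fold_indices(n_samples=10, k=5))
```

Each image appears in exactly one test fold, so every sample contributes once to the held-out performance estimate.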

Table 2 Characteristics of all included studies

Meta-analysis

From the 27 articles selected for the systematic review, 14 were excluded from the subsequent meta-analysis because TP, FN, FP and TN were not reported and could not be calculated. Consequently, 13 studies were included in the meta-analysis [21, 29, 33,34,35, 40, 41, 43, 47, 49,50,51,52]. The correlation analysis showed heterogeneity due to the threshold effect (r = 0.13; P = 0.02). Therefore, instead of directly combining the sensitivity and specificity to demonstrate the overall accuracy, an SROC curve was generated (Supplementary Fig. 4). The AUSROC was 0.94 (95% confidence interval [95%CI] 0.91–0.96). To investigate the source of heterogeneity, we conducted an influence analysis (Supplementary Fig. 5). Supplementary Fig. 5(c) and Supplementary Fig. 5(d) both indicated that the seventh article was an outlier [43], which can affect the stability of the results. When this article was removed, the threshold effect disappeared (r = − 0.45; P = 0.20), and the combined sensitivity, specificity, positive LR, negative LR and DOR were 0.88 (95%CI 0.82–0.92), 0.82 (95%CI 0.72–0.89), 4.9 (95%CI 3.2–7.5), 0.15 (95%CI 0.10–0.22) and 33 (95%CI 19–59), respectively.
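A threshold effect is conventionally examined through the Spearman correlation between the logit of sensitivity and the logit of (1 − specificity) across studies. The sketch below shows that calculation in plain Python. The sensitivity and specificity values are hypothetical and deliberately constructed to give a perfect positive correlation. They are not the data of the 13 included studies, and the function names are assumptions.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-study values: higher sensitivity goes with lower specificity.
sensitivities = [0.85, 0.90, 0.78, 0.92, 0.88]
specificities = [0.80, 0.70, 0.88, 0.65, 0.75]
rho = spearman([logit(s) for s in sensitivities],
               [logit(1 - s) for s in specificities])
```

A strong positive correlation of this kind suggests that studies trade sensitivity against specificity by operating at different thresholds, in which case an SROC curve is preferred over directly pooled sensitivity and specificity.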

Figure 2 illustrates the forest plot of sensitivity and specificity of the DL algorithms for periodontitis classification. The AUSROC (Fig. 3) was 0.92 (95%CI 0.89–0.94), which implied that the diagnostic test had high accuracy. According to the Fagan nomogram (Supplementary Fig. 6), with a prior probability of 50%, the positive LR was 6 and the posterior probability after a positive test was 85%, while the negative LR was 0.10 and the posterior probability after a negative test was 9%. The subgroup analysis showed that heterogeneity of sensitivity was statistically significant for model type and dental image modality, and heterogeneity of specificity was statistically significant for article quality (Fig. 4). In detail, single models achieved significantly higher sensitivity than two-stage models (P < 0.01). Moreover, the modality of dental images was associated with heterogeneity of sensitivity (P < 0.01): diagnostic sensitivity based on periapical images was higher than that based on panoramic images. Furthermore, articles rated as having a high or unclear risk of bias had significantly lower specificity than articles with a low risk of bias (P = 0.03). The two meta-regressions indicated no statistically significant association between sample size and sensitivity (P = 0.069) or between sample size and specificity (P = 0.252) (Supplementary Fig. 7, Supplementary Fig. 8). The influence analysis demonstrated that the results were stable when one study was removed at a time (Fig. 5). Deeks’ funnel plot asymmetry test indicated no evidence of publication bias (t = 0.74, P = 0.48) (Fig. 6).
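The Fagan nomogram is a graphical form of Bayes’ theorem on the odds scale: post-test odds = pre-test odds × LR. A minimal sketch using the values reported above (prior probability 50%, positive LR 6, negative LR 0.10) follows; the function name is an assumption.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

after_positive = post_test_probability(0.50, 6)     # 6/7, about 0.86
after_negative = post_test_probability(0.50, 0.10)  # 0.1/1.1, about 0.09
```

These values are close to the 85% and 9% read from the nomogram; the small difference for the positive result is presumably due to rounding of the reported LR.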

Fig. 2 The forest plot for sensitivity and specificity of deep learning for periodontitis diagnosis

Fig. 3 The summary receiver operating characteristic curve of the diagnostic accuracy of deep learning for periodontitis, excluding the seventh article. SENS, sensitivity; SPEC, specificity; SROC, summary receiver operating characteristic; AUC, area under curve

Fig. 4 Subgroup analysis based on article quality, dental image modality and model type

Fig. 5 Influence analysis excluding the seventh article

Fig. 6 Publication bias of periodontitis diagnosis by deep learning

Discussion

In this systematic review, we compiled and evaluated studies that utilised DL methods to classify periodontitis based on dental images. With the rise of DL technology, an increasing number of articles have been published on the intersection of periodontitis classification and DL, especially in 2022. The overall quality of the included studies was limited, and more high-quality studies are urgently needed. In addition, more than half of the included articles reported that the accuracy, sensitivity, and specificity of their algorithms for classifying periodontitis were > 0.8. The SROC curve also showed the high accuracy of the DL methods for classification. The study by Lee et al. [43], which reported a specificity of 1 for distinguishing non-periodontitis individuals, was an outlier in our meta-analysis. Moreover, the Fagan nomogram indicated that when a DL method returns a positive result, there is a high probability of periodontitis, and when the result is negative, the probability of periodontitis is low. These findings are further discussed in the following sections.

Characteristics of dental images

There are very few large, high-quality public databases of dental radiographs. Consequently, dental radiographs must be labeled manually, which is time-consuming; this problem needs to be addressed urgently. Class imbalance, which degrades DL classification performance, was addressed in the included studies by random shift augmentation, oversampling, adjusting weights in the loss function, and transfer learning [30, 39, 41, 42, 50, 51, 53].
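One of the remedies listed above, adjusting weights in the loss function, can be sketched as follows. The example uses inverse-frequency class weights and a weighted cross-entropy. The weighting formula, class names and label counts are assumptions for illustration, since the cited studies may have chosen their weights differently.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count), so rare classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}

def weighted_cross_entropy(probabilities, labels, weights):
    """Weighted mean of -log p(true class).

    probabilities: one dict per sample mapping class -> predicted probability.
    """
    total, weight_sum = 0.0, 0.0
    for probs, label in zip(probabilities, labels):
        w = weights[label]
        total += -w * math.log(probs[label])
        weight_sum += w
    return total / weight_sum

# Hypothetical imbalanced label set: 8 healthy images, 2 periodontitis images.
labels = ["healthy"] * 8 + ["periodontitis"] * 2
weights = inverse_frequency_weights(labels)

# A model that always predicts 90% "healthy" looks good on the majority class
# but is penalised heavily on the minority class once weights are applied.
predictions = [{"healthy": 0.9, "periodontitis": 0.1}] * len(labels)
unweighted = weighted_cross_entropy(predictions, labels, dict.fromkeys(weights, 1.0))
weighted = weighted_cross_entropy(predictions, labels, weights)
```

With these numbers the minority class receives four times the weight of the majority class, so the weighted loss rises for a model that ignores periodontitis cases.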

In terms of dental image modality, the studies included in our analysis predominantly used periapical images, panoramic images and CBCT images for periodontitis classification. Nine studies detected RBL in periapical radiograph images. Periapical radiograph images capture the teeth and the surrounding alveolar bone, and can therefore provide full information on RBL. However, the field of view of this modality is small, with only three to four teeth on a single image [54]. Over half of the studies in our analysis detected RBL in panoramic X-ray images, which show the whole mouth. However, as two-dimensional modalities, both periapical radiograph images and panoramic X-ray images cannot provide three-dimensional information and have problems with geometric distortion and anatomic noise [55]. All these limitations may affect the performance of periodontitis classification. Only one study in our analysis used CBCT to detect RBL [44]. Although CBCT can provide three-dimensional information, there are still some limitations caused by artifacts, noise and poor soft tissue contrast [56]. Consequently, dental image processing plays a vital role in periodontitis classification.

Processing of dental images

Two aspects should be considered for an accurate periodontitis classification. One is the quality of dental images, and the other is model performance. To deal with image quality problems, the included articles employed super-resolution and noise reduction methods. One study conducted in Brazil reconstructed high-resolution images from low-resolution images by using four conventional interpolation methods (nearest, bilinear, bicubic, Lanczos) and two DL methods (super-resolution CNN and a variation of the super-resolution generative adversarial network) [21]. Two studies used the contrast-limited adaptive histogram equalization technique for image denoising [39, 40]. Besides noise reduction, one study conducted in the USA also introduced a series of processes to precisely draw the contour of bone, tooth, and cemento-enamel junction after model prediction to improve model performance [43]. In addition, a quarter of the studies resized and normalised the images to improve model performance. Furthermore, because obtaining dental images is difficult, almost half of the included articles used data augmentation techniques to increase the number of images [48, 50, 52].
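Two of the simpler pre-processing steps mentioned above, resizing and normalisation, can be sketched in plain Python. Nearest-neighbour interpolation is used because it is the simplest of the conventional interpolation methods named in the text; the 2 x 2 toy image and the function names are assumptions, and real pipelines would normally rely on an image library.

```python
def resize_nearest(image, new_h, new_w):
    """Nearest-neighbour resize of a 2-D list of pixel intensities."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[i * old_h // new_h][j * old_w // new_w] for j in range(new_w)]
        for i in range(new_h)
    ]

def min_max_normalise(image):
    """Rescale intensities linearly to the range [0, 1]."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: avoid division by zero
        return [[0.0 for _ in row] for row in image]
    return [[(v - lo) / (hi - lo) for v in row] for row in image]

# Toy 2 x 2 greyscale image with 8-bit intensities (illustrative values).
image = [[0, 255], [128, 64]]
upscaled = resize_nearest(image, 4, 4)
normalised = min_max_normalise(image)
```

Nearest-neighbour upscaling only repeats existing pixels, which is why interpolation and super-resolution methods that estimate new pixel values can matter for fine structures such as the alveolar bone margin.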

Classification using dental images

Regarding the task of classification using DL models, classical models such as U-Net and YOLO were often utilised in the included studies [57, 58], regardless of the specific diagnosis task chosen. For tasks involving a two-stage design, U-Net was typically used for segmenting ROIs, while YOLO was employed for object detection. U-Net has been proven to quickly and accurately identify targets in medical images and generate high-quality segmentation results [59]. Additionally, the structure of U-Net can be flexibly adjusted according to the specific needs of the task [59]. Various versions of YOLO, from YOLOv3 to YOLOv5, have been utilised for different study purposes. Feature Pyramid Network (FPN) was also employed for the ROI segmentation stage [60]. FPN fuses multi-layered features and makes predictions at each fused feature layer; thus, it shows significant improvement in small-object detection without considerably increasing computation. Faster region-based CNN (Faster R-CNN) combines a Region Proposal Network (RPN) and a Fast R-CNN that share full-image convolutional features to reduce the computational cost, which is why Faster R-CNN is popular in periodontitis diagnosis [61]. Mask R-CNN, which is an extension of Faster R-CNN, has also been employed [62]. Danks et al. employed a symmetric hourglass network that captures information at every scale and combines it to make the final predictions [45].
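Detection and segmentation stages such as those described above are commonly evaluated with intersection over union (IoU), which is listed among this review’s abbreviations. The sketch below computes IoU for two axis-aligned bounding boxes; the box format, coordinates and function name are assumptions for illustration and are not taken from any included study.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

# Hypothetical predicted and annotated tooth boxes, in pixel coordinates.
predicted = (0, 0, 2, 2)
annotated = (1, 1, 3, 3)
overlap = box_iou(predicted, annotated)  # 1 / (4 + 4 - 1)
```

A detection is typically counted as correct when its IoU with the expert annotation exceeds a chosen cut-off, so the cut-off used affects the reported detection performance.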

Based on the included publications, transfer learning is an efficient method for training datasets with limited samples, and it can enhance the model training efficiency. In addition, using appropriate regularisation methods can improve model performance.

Strengths and limitations

Strengths

  1) The strength of this review is that we systematically summarised and evaluated the studies on DL for periodontitis classification based on dental images. Moreover, we have described the development trend of DL technology in the field of periodontitis.

  2) In addition, we used meta-analysis to quantitatively evaluate the threshold effect and heterogeneity of the included articles and analysed the possible sources of heterogeneity in detail.

Limitations

  1) DL-based periodontitis classification is an emerging field, and most studies conducted thus far have predominantly focused on Asian populations. This limited regional focus has resulted in a constrained sample representation, thereby impacting the external validity of the findings.

  2) Except for three articles that utilised publicly available databases, the samples in the other studies were solely derived from hospital settings, thereby lacking representation from community-based data.

  3) No study described the demographic information of the included subjects. Considering that demographic factors could influence the severity of periodontitis and consequently contribute to the heterogeneity observed, it is essential to address this aspect in future research.

  4) Only three studies incorporated an external dataset to assess the performance of DL-based models. In contrast, all the other studies relied on training and testing datasets derived from the same source, potentially limiting the generalisability of their results.

  5) Since the gold standard for periodontitis diagnosis and classification is clinical attachment loss (CAL), classification based only on RBL may underestimate periodontal status. However, RBL-based classification remains important in clinical practice when direct evidence (CAL) is not available.

Conclusions

In summary, the accuracy of DL is high for classifying periodontitis based on dental images. DL is an efficient approach to reducing the workload of dentists and the time consumed during clinical practice. Furthermore, the various DL models have their own advantages and disadvantages, and the choice of model should be based on the specific task objectives and requirements. Future research should be designed rigorously to reflect the true performance of DL. Optimising the DL architecture can improve the performance of periodontitis classification with dental images. Moreover, improving dental image quality and applying regularisation can yield higher diagnostic accuracy for periodontitis. In addition, data imbalance is an issue that needs to be considered to enhance diagnostic performance.

Data Availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

DALYs: Disability-adjusted life years

GBD: Global Burden of Diseases

ML: Machine learning

DL: Deep learning

CNNs: Convolutional neural networks

CBCT: Cone-beam computed tomography

PIRD: P = population, I = index test, R = reference test, D = diagnosis of interest

PRISMA-DTA: Preferred Reporting Items for Systematic Reviews and Meta-analyses for Diagnostic Test Accuracy Studies

PPV: Positive predictive value

NPV: Negative predictive value

ROC: Receiver operating characteristic curve

AUC: Area under the curve

AUROC: Area under the receiver operating characteristic curve

IoU: Intersection over union

PA: Pixel accuracy

AP: Average precision

ARR: Average recall rate

AI: Artificial intelligence

QUADAS-2: Quality Assessment of Diagnostic Accuracy Studies

TP: True positive

FP: False positive

TN: True negative

FN: False negative

LR: Likelihood ratio

DOR: Diagnostic odds ratio

SROC: Summary receiver operating characteristic

AUSROC: Area under summary receiver operating characteristic

DCNN: Deep convolutional neural networks

LCNN: Lightweight convolutional neural networks

RBL: Radiographic bone loss

ROIs: Regions of interest

FPN: Feature Pyramid Network

Faster R-CNN: Faster region-based CNN

RPN: Region Proposal Network

References

  1. GBD 2019 Diseases and Injuries Collaborators. Global burden of 369 Diseases and injuries in 204 countries and territories, 1990–2019: a systematic analysis for the global burden of Disease Study 2019. Lancet (London England). 2020;396(10258):1204–22.


  2. Global Burden of Disease Study 2019. (GBD 2019) results [http://ghdx.healthdata.org/gbd-results-tool].

  3. Luo LS, Luan HH, Jiang JF, Wu L, Li C, Leng WD, Zeng XT. The spatial and temporal trends of severe periodontitis burden in Asia, 1990–2019: a population-based epidemiological study. J Periodontol 2022.

  4. Wu L, Zhang SQ, Zhao L, Ren ZH, Hu CY. Global, regional, and national burden of periodontitis from 1990 to 2019: results from the Global Burden of Disease study 2019. J Periodontology 2021 Sep, 48(9):1165–88.

  5. Hajishengallis G, Chavakis T. Local and systemic mechanisms linking periodontal Disease and inflammatory comorbidities. Nat Rev Immunol. 2021;21(7):426–40.

  6. Mainas G, Ide M, Rizzo M, Magan-Fernandez A, Mesa F, Nibali L. Managing the systemic impact of Periodontitis. Med (Kaunas Lithuania) 2022, 58(5).

  7. Zhao D, Homayounfar M, Zhen Z, Wu MZ, Yu SY, Yiu KH, Vardhanabhuti V, Pelekos G, Jin L, Koohi-Moghadam M. A Multimodal Deep Learning Approach to Predicting systemic Diseases from oral conditions. Diagnostics (Basel Switzerland) 2022, 12(12).

  8. Tonetti MS, Kornman KS, editors. Special Issue: Periodontitis and Systemic Diseases - Proceedings of a workshop jointly held by the European Federation of Periodontology and American Academy of Periodontology. J Clin Periodontol 2013;40(s14):S1-S209.

  9. Putra RH, Doi C, Yoda N, Astuti ER, Sasaki K. Current applications and development of artificial intelligence for digital dental radiography. Dento Maxillo Fac Radiol. 2022;51(1):20210197.

  10. Bruno MA, Walker EA, Abujudeh HH. Understanding and confronting our mistakes: the epidemiology of Error in Radiology and Strategies for Error Reduction. Radiographics: A Review Publication of the Radiological Society of North America Inc. 2015;35(6):1668–76.

  11. Akesson L, Håkansson J, Rohlin M. Comparison of panoramic and intraoral radiography and pocket probing for the measurement of the marginal bone level. J Clin Periodontol. 1992;19(5):326–32.

  12. Schwendicke F, Samek W, Krois J. Artificial Intelligence in Dentistry: chances and challenges. J Dent Res. 2020;99(7):769–74.

  13. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts H. Artificial intelligence in radiology. Nat Rev Cancer. 2018;18(8):500–10.

  14. Anwar SM, Majid M, Qayyum A, Awais M, Alnowami M, Khan MK. Medical Image Analysis using Convolutional neural networks: a review. J Med Syst. 2018;42(11):226.

  15. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, van der Laak J, van Ginneken B, Sánchez CI. A survey on deep learning in medical image analysis. Med Image Anal. 2017;42:60–88.

  16. Schwendicke F, Golla T, Dreher M, Krois J. Convolutional neural networks for dental image diagnostics: a scoping review. J Dent. 2019;91:103226.

  17. Park EY, Cho H, Kang S, Jeong S, Kim EK. Caries detection with tooth surface segmentation on intraoral photographic images using deep learning. BMC Oral Health. 2022;22(1):573.

  18. Celik ME. Deep Learning Based Detection Tool for Impacted Mandibular Third Molar Teeth. Diagnostics (Basel Switzerland) 2022, 12(4).

  19. Jaiswal P, Bhirud DS. An intelligent deep network for dental medical image processing system. Biomed Signal Process Control. 2023;84:104708.

  20. Chen C-C, Wu Y-F, Aung LM, Lin JCY, Ngo ST, Su J-N, Lin Y-M, Chang W-J. Automatic recognition of teeth and periodontal bone loss measurement in digital radiographs using deep-learning artificial intelligence. J Dent Sci. 2023;18(3):1301–9.

  21. Moran M, Faria M, Giraldi G, Bastos L, Conci A. Do radiographic assessments of Periodontal Bone loss improve with deep learning methods for enhanced image resolution? Sensors 2021;21(6).

  22. Campbell JM, Klugar M, Ding S, Carmody DP, Hakonsen SJ, Jadotte YT, White S, Munn Z. Diagnostic test accuracy: methods for systematic review and meta-analysis. 2015, 13(3):154–62.

  23. Schünemann HJ, Oxman AD, Brozek J, Glasziou P, Jaeschke R, Vist GE, Williams JW Jr., Kunz R, Craig J, Montori VM, et al. Grading quality of evidence and strength of recommendations for diagnostic tests and strategies. BMJ (Clinical Research ed). 2008;336(7653):1106–10.

  24. Deeks J, Bossuyt P, Gatsonis C. Cochrane Handbook for systematic reviews of Diagnostic Test Accuracy Version 1.0. The Cochrane Collaboration; 2010.

  25. Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, Leeflang MM, Sterne JA, Bossuyt PM. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36.

  26. Leeflang MM. Systematic reviews and meta-analyses of diagnostic test accuracy. Clin Microbiol Infection: Official Publication Eur Soc Clin Microbiol Infect Dis. 2014;20(2):105–13.

  27. Leeflang M, Reitsma J, Scholten R, Rutjes A, Di Nisio M, Deeks J, Bossuyt P. Impact of adjustment for quality on results of metaanalyses of diagnostic accuracy. Clin Chem. 2007;53(2):164–72.

  28. Ryu J, Lee DM, Jung YH, Kwon O, Park S, Hwang J, Lee JY. Automated Detection of Periodontal Bone Loss Using Deep Learning and Panoramic Radiographs: A Convolutional Neural Network Approach. Applied Sciences (Switzerland) 2023;13(9).

  29. Liu Q, Dai F, Zhu H, Yang H, Huang Y, Jiang L, Tang X, Deng L, Song L. Deep learning for the early identification of periodontitis: a retrospective, multicentre study. Clin Radiol 2023.

  30. Kong Z, Ouyang H, Cao Y, Huang T, Ahn E, Zhang M, Liu H. Automated periodontitis bone loss diagnosis in panoramic radiographs using a bespoke two-stage detector. Comput Biol Med 2023;152.

  31. Chen IH, Lin CH, Lee MK, Chen TE, Lan TH, Chang CM, Tseng TY, Wang T, Du JK. Convolutional-neural-network-based radiographs evaluation assisting in early diagnosis of the periodontal bone loss via periapical radiograph. J Dent Sci 2023.

  32. Amasya H, Jaju PP, Ezhov M, Gusarev M, Atakan C, Sanders A, Manulius D, Golitskya M, Shrivastava K, Singh A et al. Development and validation of an artificial intelligence software for periodontal bone loss in panoramic imaging. Int J Imaging Syst Technol 2023.

  33. Widyaningrum R, Candradewi I, Aji NRAS, Aulianisa R. Comparison of Multi-label U-Net and Mask R-CNN for panoramic radiograph segmentation to detect periodontitis. Imaging Sci Dentistry. 2022;52(4):383–91.

  34. Tsoromokos N, Parinussa S, Claessen F, Moin DA, Loos BG. Estimation of alveolar bone loss in Periodontitis using machine learning. Int Dent J. 2022;72(5):621–7.

  35. Sunnetci KM, Ulukaya S, Alkan A. Periodontal bone loss detection based on hybrid deep learning and machine learning models with a user-friendly application. Biomed Signal Process Control 2022;77.

  36. Shon HS, Kong V, Park JS, Jang W, Cha EJ, Kim SY, Lee EY, Kang TG, Kim KA. Deep learning model for classifying Periodontitis stages on Dental panoramic radiography. Appl Sci (Switzerland) 2022;12(17).

  37. Kabir T, Lee CT, Chen L, Jiang X, Shams S. A comprehensive artificial intelligence framework for dental diagnosis and charting. BMC Oral Health. 2022;22(1):480.

  38. Jiang L, Chen D, Cao Z, Wu F, Zhu H, Zhu F. A two-stage deep learning architecture for radiographic staging of periodontal bone loss. BMC Oral Health 2022;22(1).

  39. Ertaş K, Pence I, Cesmeli MS, Ay ZY. Determination of the stage and grade of periodontitis according to the current classification of periodontal and peri-implant Diseases and conditions (2018) using machine learning algorithms. J Periodontal Implant Sci 2022;52.

  40. Chang J, Chang MF, Angelov N, Hsu CY, Meng HW, Sheng S, Glick A, Chang K, He YR, Lin YB, et al. Application of deep machine learning for the radiographic diagnosis of periodontitis. Clin Oral Invest. 2022;26(11):6629–37.

  41. Alotaibi G, Awawdeh M, Farook FF, Aljohani M, Aldhafiri RM, Aldhoayan M. Artificial intelligence (AI) diagnostic tools: utilizing a convolutional neural network (CNN) to assess periodontal bone level radiographically-a retrospective study. BMC Oral Health. 2022;22(1):399.

  42. Li H, Zhou J, Zhou Y, Chen Q, She Y, Gao F, Xu Y, Chen J, Gao X. An interpretable computer-aided diagnosis method for Periodontitis from panoramic radiographs. Front Physiol 2021;12.

  43. Lee CT, Kabir T, Nelson J, Sheng S, Meng HW, Van Dyke TE, Walji MF, Jiang X, Shams S. Use of the deep learning approach to measure alveolar bone level. J Clin Periodontol. 2021;49(3):260–9.

  44. Ezhov M, Gusarev M, Golitsyna M, Yates JM, Kushnerev E, Tamimi D, Aksoy S, Shumilov E, Sanders A, Orhan K. Clinically applicable artificial intelligence system for dental diagnosis with CBCT. Sci Rep 2021;11(1).

  45. Danks RP, Bano S, Orishko A, Tan HJ, Sancho FM, D’Aiuto F, Stoyanov D. Automating Periodontal bone loss measurement via dental landmark localisation. Int J Comput Assist Radiol Surg. 2021;16(7):1189–99.

  46. Chen H, Li H, Zhao Y, Zhao J, Wang Y. Dental Disease detection on periapical radiographs based on deep convolutional neural networks. Int J Comput Assist Radiol Surg. 2021;16(4):649–61.

  47. Thanathornwong B, Suebnukarn S. Automatic detection of periodontal compromised teeth in digital panoramic radiographs using faster regional convolutional neural networks. Imaging Sci Dentistry. 2020;50(2):169–74.

  48. Chang H-J, Lee S-J, Yong T-H, Shin N-Y, Jang B-G, Kim J-E, Huh K-H, Lee S-S, Heo M-S, Choi S-C et al. Deep learning hybrid method to automatically diagnose Periodontal Bone loss and stage periodontitis. Sci Rep 2020;10(1).

  49. Bayrakdar SK, Çelik Ö, Bayrakdar IS, Orhan K, Bilgir E, Odabaş A, Aslan AF. Success of Artificial Intelligence System in determining alveolar bone loss from Dental panoramic radiography images. Cumhuriyet Dent J. 2020;23(4):318–24.

  50. Krois J, Ekert T, Meinhold L, Golla T, Kharbot B, Wittemeier A, Doerfer C, Schwendicke F. Deep learning for the Radiographic detection of Periodontal Bone loss. Sci Rep 2019;9.

  51. Kim J, Lee H-S, Song I-S, Jung K-H. DeNTNet: deep neural transfer network for the detection of periodontal bone loss using panoramic dental radiographs. Sci Rep 2019;9.

  52. Lee J-H, Kim DOh, Jeong S-N, Choi S-H. Diagnosis and prediction of periodontally compromised teeth using a deep learning-based convolutional neural network algorithm. J Periodontal Implant Sci. 2018;48(2):114–23.

  53. Buda M, Maki A, Mazurowski MA. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 2018;106:249–59.

  54. Kumar A, Bhadauria HS, Singh A. Descriptive analysis of dental X-ray images using various practical methods: a review. PeerJ Comput Sci. 2021;7:e620.

  55. Patel S, Dawood A, Whaites E, Pitt Ford T. New dimensions in endodontic imaging: part 1. Conventional and alternative radiographic systems. Int Endod J. 2009;42(6):447–62.

  56. Venkatesh E, Elluru SV. Cone beam computed tomography: basics and applications in dentistry. J Istanbul Univ Fac Dentistry 2017;51(3 Suppl 1):S102–s121.

  57. Redmon J, Divvala S, Girshick R, Farhadi A. You Only Look Once: Unified, Real-Time Object Detection. arXiv:1506.02640 [cs.CV] 2015.

  58. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv:1505.04597 [cs.CV] 2015.

  59. Yin XX, Sun L, Fu Y, Lu R, Zhang Y. U-Net-Based Medical Image Segmentation. Journal of healthcare engineering 2022;2022:4189781.

  60. Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S. Feature Pyramid Networks for Object Detection. arXiv:1612.03144 [cs.CV] 2016.

  61. Ren S, He K, Girshick R, Sun J. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell. 2017;39(6):1137–49.

  62. He K, Gkioxari G, Dollár P, Girshick R. Mask R-CNN. arXiv:1703.06870 [cs.CV] 2017.

Acknowledgements

The authors are very grateful to all the participants in this study.

Funding

This work was supported by the Beijing Stomatological Hospital of Capital Medical University Young Scientist Program (No. YSP202011), the Beijing Municipal Government grant (Beijing Laboratory of Oral Health, PXM2021-014226-000041), the Beijing Municipal Science and Technology Commission (Z181100001718208), the Beijing Municipal Education Commission (119207020201), the Innovation Research Team Project of Beijing Stomatological Hospital, Capital Medical University (CXTD202201), the Chinese Research Unit of Tooth Development and Regeneration, Academy of Medical Sciences (2019-12 M-5-031), the National Natural Science Foundation of China (92049201, 82030031, 81991504, and 92149301), the Beijing Advanced Innovation Center for Big Data-based Precision Medicine (PXM2021_014226_000026), the Beijing Municipal Government (Beijing Scholar program, PXM2020_014226_000005 and PXM2021_014226_000020), the Beijing Municipal Colleges and Universities High Level Talents Introduction and Cultivate Project-Beijing Great Wall Scholar Program (CIT&TCD 20180332), and the National Key Research and Development Program (2022YFA1104401).

Author information

Contributions

Conceptualisation, Wenbin Li, Songlin Wang; methodology, Xin Li, Dan Zhao; protocol, Xin Li, Dan Zhao; validation, Xin Li; resources, Xin Li, Dan Zhao; data acquisition, Xin Li, Jinxuan Xie; software, Xin Li; data analysis, Xin Li; quality assessment, Xin Li, Jinxuan Xie, Yajie Li; writing—original draft preparation, Xin Li, Dan Zhao; writing—review and editing, Hao Wen, Chunhua Liu, Wenbin Li, Songlin Wang; visualisation, Xin Li; supervision, Wenbin Li, Songlin Wang; funding acquisition, Dan Zhao, Songlin Wang. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Songlin Wang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Li, X., Zhao, D., Xie, J. et al. Deep learning for classifying the stages of periodontitis on dental images: a systematic review and meta-analysis. BMC Oral Health 23, 1017 (2023). https://doi.org/10.1186/s12903-023-03751-z

Keywords