Bayesian model with application to a study of dental caries

Background Dental caries is a significant public health problem and a disease with multifactorial causes. In sub-Saharan Africa, Ethiopia is one of the countries with a high recorded prevalence of dental caries. This study aimed to determine the risk factors affecting dental caries using both Bayesian and classical approaches. Methods The study was a retrospective cohort study of dental caries patients who attended Hawassa Haik Poly Higher Clinic between March 2009 and March 2013. Bayesian logistic regression was used to make inferences about the parameters of a logistic regression model. The purpose of this method was to generate the posterior distribution of the unknown parameters given both the data and a prior density for the unknown parameters. Results The prevalence of natural dental caries was 87% and that of non-natural dental caries was 13%. The 18–25 age group had a higher prevalence of dental caries than the other age groups. In the Bayesian logistic regression, rural residence, not cleaning teeth, coming from SNNPR and belonging to the 18–25 age group were statistically significant. The Bayesian approach is becoming more popular in data analysis than classical statistics because the technique is considered more robust and precise. Conclusions The Bayesian approach was found to perform better than the classical method, as the standard errors under the Bayesian approach were smaller than those of classical logistic regression. The Bayesian credible intervals were shorter than the confidence intervals for all significant risk factors. Age, sex, place of residence, region and habit of cleaning teeth were found to have a significant effect on dental caries.


Background
Dental caries is a microbial, multifactorial disease that destroys the hardest substance of the human body, the enamel [1]. The World Health Organization (WHO) identifies it as one of the most important public health issues [2]. Dental caries is currently on the rise as a major public health problem worldwide: an estimated 60-90% of children and nearly 100% of adults have dental cavities, often leading to pain and discomfort [3].
The problems related to dental caries decrease the quality of life of affected individuals and of society, with disparities related to well-known issues of socioeconomic status, lack of preventive efforts, and dietary changes [4]. The burden of dental caries can affect school attendance, eating and speaking, which leads to impaired growth and development [5,6].
Dental caries is a public health problem in both developed and developing countries [7]. Deteriorating oral health is an emerging public health concern in developing countries, yet little attention has been given to oral health in most sub-Saharan countries. The extent of caries, periodontal diseases and the associated risk factors has not been widely studied at the community level [8]. Dental caries is increasing gradually due to the growing consumption of sugary substances, poor oral care practices and inadequate health service utilization [9]. Previous studies in Ethiopia showed differences between localities in the prevalence of dental caries: 48.5% in Finote Selam [10], 21.8% in Bahir Dar city [11] and 78.2% in Debre Tabor General Hospital dental clinic [11,12].
Dental caries causes tooth pain, discomfort, eating impairment, tooth loss and delayed language development. Furthermore, dental caries affects children's concentration in school and places a financial burden on families [13,14]. Risk factors such as sex, age, dietary habits, socioeconomic status and oral hygiene status are associated with an increased prevalence and incidence of dental caries in a population [15]. In one study, persons suffering from dental caries were examined for the type of dental caries in relation to different factors, and the occurrence of dental caries was found to be slightly higher in females (51.45%) [16].
Age is directly and strongly associated with the prevalence of dental caries: with increasing age, the number of surfaces affected by caries increases, plateauing at around 50 years of age [17]. Teeth should be cleaned thoroughly at least twice a day using a fluoride toothpaste. Brushing helps remove plaque and food particles from the tooth surface, and flossing helps remove them from the areas between the teeth. In Ethiopia, existing dental health services are limited. Even though the prevalence of dental caries in the country is high, not much is known about the factors affecting it in the study area. Therefore, this study aimed to determine the statistical association between dental caries and some risk factors among patients attending the dental clinic at Hawassa Haik Poly Higher Clinic.

Methodology
The study was a retrospective cohort study covering the period from March 2009 to March 2013. Data were collected by reviewing the dental caries patient cards and information sheets at Hawassa Haik Poly Higher Clinic. The study included only dental caries patients who had been treated and followed up in the clinic. A total of 6007 dental caries patients in the clinic were considered for this study. The dependent variable was dental caries status, dichotomized as natural dental caries ($y_i = 1$) and non-natural dental caries ($y_i = 0$). The independent variables were sex, age, region, place of residence and habit of cleaning teeth. Two statistical methods were used: the classical approach and the Bayesian approach. In the classical approach, logistic regression analysis seeks the best-fitting model to describe the relationship between an outcome and risk factors where the outcome is dichotomous. It is used to investigate the effect of risk factors on the probability of having natural dental caries [18]. Logistic regression models use a logit link function, expressed as:
$$\operatorname{logit}(P_i) = \log\left(\frac{P_i}{1 - P_i}\right) = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki}$$
where $P_i$ is the probability of experiencing the outcome of interest for subject $i$, $X_{1i}, \ldots, X_{ki}$ are the risk factors and $\beta_j$ denotes the $j$th regression coefficient [19]. Based on this model, the effect of each risk factor on the outcome can be expressed as an odds ratio. Binary outcomes are common in retrospective studies such as cohort studies. Logistic regression yields an odds ratio that approximates the risk ratio when the risk of the outcome is low (< 10%). Much of the literature argues that the risk ratio is preferred over the odds ratio in such studies when the outcome is not rare, that is, when its risk is 10% or more. To obtain a model-based estimate of risk ratios, log-binomial regression has been recommended.
However, this model may fail to converge, and alternatives such as robust Poisson regression have been proposed for these situations [20]. The log-binomial regression model is similar to the logistic regression model, except that it assumes a log link instead of a logit link, hence providing risk ratios instead of odds ratios. It can be presented as:
$$\log(P_i) = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki}$$
Based on this model, the effect of each risk factor on the outcome can be expressed as a risk ratio (RR). Fitting the log-binomial model to estimate the RR can be challenging: especially with continuous variables, non-convergence may occur when the maximum likelihood estimate (MLE) is close to or on the boundary of the parameter space [21]. The log-binomial model is commonly used to estimate the RR, and the OR estimated using logistic regression is often used to approximate the RR when the outcome is rare. However, regardless of the prevalence of the outcome, the exposed and unexposed risks predicted by logistic regression may be used to estimate the RR. When maximum likelihood estimation is used to fit the logistic model, estimating the standard error (SE) of the RR is difficult. To overcome this difficulty and to provide a flexible framework for modeling, we developed a Bayesian logistic regression (BLR) model to estimate the OR, with an associated credible interval.
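The relationship between the OR and the RR described above can be illustrated with a short calculation. The sketch below is not part of the study's analysis; the function names and the example risks are hypothetical and chosen only to contrast a rare outcome with a common one (the 87% prevalence reported in this study falls in the common range).

```python
def risk_ratio(p_exposed, p_unexposed):
    """Ratio of the two outcome probabilities (risks)."""
    return p_exposed / p_unexposed


def odds_ratio(p_exposed, p_unexposed):
    """Ratio of the two odds, where odds = p / (1 - p)."""
    odds_exposed = p_exposed / (1.0 - p_exposed)
    odds_unexposed = p_unexposed / (1.0 - p_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical risks, for illustration only (not taken from the study data).
for label, p1, p0 in [("rare outcome", 0.02, 0.01),
                      ("common outcome", 0.90, 0.80)]:
    print(label, "RR =", round(risk_ratio(p1, p0), 3),
          "OR =", round(odds_ratio(p1, p0), 3))
```

For the rare outcome the two measures nearly coincide, whereas for the common outcome the OR lies much further from 1 than the RR, which is why the OR should not be read as a risk ratio when the outcome is common.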
The Bayesian modeling framework and current software for Bayesian analysis can meet these complex challenges in a straightforward manner. Thus, we extended the logistic regression model for estimating the parameters to the Bayesian framework. In the Bayesian framework, there are three key components associated with parameter estimation: the prior distribution, the likelihood function, and the posterior distribution. Bayesian inference starts with formulating a prior probability distribution over the unknown parameters $\beta$, which summarizes a set of beliefs or knowledge held before observing the data [22]. The likelihood function is expressed as:
$$L(\beta \mid y) = \prod_{i=1}^{n} P_i^{y_i} (1 - P_i)^{1 - y_i}$$
where $P_i$ is the probability of natural dental caries for subject $i$ with covariate vector $x_i$, and $y_i$ indicates natural dental caries ($y_i = 1$) or non-natural dental caries ($y_i = 0$) for the $i$th subject. Prior distributions play a very important role in Bayesian statistics. We have no prior knowledge available for the parameters of the score vectors. As a result, the choice of the prior distribution becomes a challenge. In this case we can use a non-informative prior on the parameters of the score vectors. Results of the Bayesian non-informative logistic regression approach tend to mimic those of a maximum likelihood approach, but we must observe that a prior that is non-informative on the parameters of the scores is not non-informative on the parameters of the original variables. For this study, we used the most common prior for logistic regression parameters, which has the form $\beta_j \sim N(\mu_j, \sigma_j^2)$, that is, a normal distribution with mean $\mu_j$ and variance $\sigma_j^2$. Its density can be expressed as [4]:
$$f(\beta_j) = \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left\{-\frac{(\beta_j - \mu_j)^2}{2\sigma_j^2}\right\}$$
In the case of no available prior knowledge, we consider a normal distribution with mean $\mu_j = 0$ and a large variance. In this study, we chose $\sigma_j^2 = 1000$.
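To make the likelihood and the vague normal prior concrete, the following sketch evaluates both on the log scale. It is an illustration under stated assumptions rather than the authors' code: the function names and the four-row toy data set are hypothetical, and only the prior settings (mean 0, variance 1000) come from the text.

```python
import math

PRIOR_MEAN = 0.0
PRIOR_VAR = 1000.0  # large variance, as chosen in the text


def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood of a logistic model.

    Uses y_i * eta_i - log(1 + exp(eta_i)), which equals
    log(P_i^y_i * (1 - P_i)^(1 - y_i)) with eta_i = x_i' beta.
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, x_i))
        # Numerically stable form of log(1 + exp(eta)).
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += y_i * eta - log1pexp
    return total


def log_prior(beta_j, mean=PRIOR_MEAN, var=PRIOR_VAR):
    """Log density of the N(mean, var) prior for one coefficient."""
    return -0.5 * math.log(2.0 * math.pi * var) - (beta_j - mean) ** 2 / (2.0 * var)


# Hypothetical toy data: an intercept column and one binary risk factor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
y = [0, 1, 1, 0]

print("log-likelihood at beta = (0, 0):", log_likelihood([0.0, 0.0], X, y))
print("prior penalty for beta_j = 2 versus 0:", log_prior(0.0) - log_prior(2.0))
```

The second printed value is very small because the prior is almost flat over plausible coefficient values, which is consistent with the remark above that results under a non-informative prior tend to mimic maximum likelihood.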
The posterior distribution is obtained by multiplying the prior distribution of the parameters by the likelihood function:
$$f(\beta \mid y) \propto \prod_{i=1}^{n} P_i^{y_i} (1 - P_i)^{1 - y_i} \times \prod_{j=0}^{k} \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left\{-\frac{(\beta_j - \mu_j)^2}{2\sigma_j^2}\right\}$$
This gives a complex posterior distribution that does not reduce to a known distribution. In order to determine the posterior distribution, we use Markov chain Monte Carlo (MCMC) simulation to generate random numbers that follow the posterior distribution. The MCMC method is a general method that generates values of the unknown parameters $\beta$ from an appropriate distribution and then corrects the generated values so that they better approximate the desired posterior distribution [23]. With the parameter vector of interest partitioned into components, the Gibbs sampling algorithm generates a draw from the distribution of each component in turn, conditional on the current values of the other components. It is a special case of the Metropolis-Hastings algorithm in which the proposed value is always accepted. Convergence of an MCMC algorithm refers to whether the algorithm has reached its equilibrium (target) distribution [24]. Several diagnostics have been developed to monitor the convergence of the algorithm, such as time series plots, density plots, autocorrelation plots and the Gelman-Rubin statistic [25].
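The idea of simulating from a posterior that has no closed form can be sketched with a few lines of code. Note that the sketch below uses a random-walk Metropolis sampler, a simpler relative of the Gibbs sampler described above, and not the authors' implementation. The toy data, the step size, the seed and all function names are assumptions made for illustration; only the N(0, 1000) prior follows the text.

```python
import math
import random

# Hypothetical toy data (not the study data): an intercept plus one binary
# risk factor; 15 of 20 exposed and 5 of 20 unexposed subjects have the outcome.
X = [[1.0, 1.0]] * 20 + [[1.0, 0.0]] * 20
y = [1] * 15 + [0] * 5 + [1] * 5 + [0] * 15

PRIOR_VAR = 1000.0  # vague N(0, 1000) prior on each coefficient


def log_posterior(beta):
    """Unnormalized log posterior: log-likelihood plus normal log-prior."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, x_i))
        total += y_i * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    total -= sum(b * b for b in beta) / (2.0 * PRIOR_VAR)
    return total


def metropolis(n_iter, step=0.5, seed=0):
    """Random-walk Metropolis: propose beta + noise, accept or keep current."""
    rng = random.Random(seed)
    beta = [0.0, 0.0]
    current = log_posterior(beta)
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
            accepted += 1
        chain.append(list(beta))
    return chain, accepted / n_iter


chain, acceptance_rate = metropolis(5000)
kept = chain[1000:]  # discard the first 1000 draws as burn-in
slope_mean = sum(draw[1] for draw in kept) / len(kept)
print("acceptance rate:", round(acceptance_rate, 2))
print("posterior mean of the risk-factor coefficient:", round(slope_mean, 2))
```

Running several such chains from different starting values, rather than one, is what makes the convergence checks described in the text (overlapping time series and the Gelman-Rubin statistic) possible.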

Results of analysis
Table 1 shows that females had more natural dental caries than males. The 18-25 age group had a higher prevalence of dental caries than the other age groups. Patients living in urban areas had a higher risk of natural dental caries than those living in rural areas. For patients coming from the Southern Nations, Nationalities and Peoples' Region (SNNPR), the proportion with natural dental caries was 80.4%, and for those coming from other regions it was 88.3%. About 87.4% of the patients who did not clean their teeth had natural dental caries and the remaining 12.6% had non-natural dental caries. The prevalence of the outcome of interest, natural dental caries, was 87%, and the remaining 13% of the patients had non-natural dental caries.

Time series plot
The time series plot is one of the tools used to diagnose the convergence of a Bayesian analysis. The time series plots indicate good convergence: the three independently generated chains mixed together or overlapped (Fig. 1 and Appendix: Fig. 2).

Density plot
The density plots for all risk factors indicate that the coefficients have bimodal densities, and hence that the simulated parameter values have converged (Fig. 1 and Appendix: Fig. 2).

Autocorrelation plot
From Fig. 1 and Appendix: Fig. 2, we observed that the autocorrelation for all parameters becomes low at a lag of 50. Thus, an approximately independent sample can be obtained by rerunning the algorithm with the thinning interval set to 50. The plots show that the independent chains mixed or overlapped with each other, which confirms convergence.
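The autocorrelation check and the thinning step can be written out as follows. This is a minimal sketch, not the software used in the study: the function names are hypothetical, and the chain is an artificial autocorrelated series generated only to have something to diagnose.

```python
import random


def autocorrelation(chain, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    variance_sum = sum((v - mean) ** 2 for v in chain)
    covariance_sum = sum((chain[t] - mean) * (chain[t + lag] - mean)
                         for t in range(n - lag))
    return covariance_sum / variance_sum


def thin(chain, interval):
    """Keep every `interval`-th draw, starting with the first."""
    return chain[::interval]


# Artificial chain with strong dependence between successive draws
# (an assumption for illustration; each value is 0.9 times the previous plus noise).
rng = random.Random(1)
chain = [0.0]
for _ in range(19999):
    chain.append(0.9 * chain[-1] + rng.gauss(0.0, 1.0))

print("lag 1:", round(autocorrelation(chain, 1), 2))
print("lag 50:", round(autocorrelation(chain, 50), 2))

thinned = thin(chain, 50)
print("draws kept after thinning:", len(thinned))
print("lag 1 after thinning:", round(autocorrelation(thinned, 1), 2))
```

Thinning trades sample size for independence: with an interval of 50, only one draw in fifty is retained, so the original run must be long enough for the thinned sample to remain useful.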

Gelman-Rubin statistics
The Gelman-Rubin statistic is one way of checking convergence in Bayesian analysis. It can be applied only when multiple chains are used. In the Gelman-Rubin convergence plot, the width of the pooled interval is shown in green and the average width of the intervals within the individual runs in blue, together with their ratio; for plotting purposes, the pooled and within-run interval widths are normalized to have an overall maximum of one (Appendix: Fig. 2).
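A common way to compute a Gelman-Rubin type diagnostic from several chains is sketched below. It uses the variance-based form of the statistic (the potential scale reduction factor), which differs from the interval-width version shown in the plot described above; the function name and the small example chains are hypothetical.

```python
import math


def gelman_rubin(chains):
    """Variance-based potential scale reduction factor for equal-length chains."""
    m = len(chains)       # number of chains
    n = len(chains[0])    # draws per chain
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # Between-chain variance B and average within-chain variance W.
    between = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    within = sum(sum((v - mu) ** 2 for v in c) / (n - 1)
                 for c, mu in zip(chains, means)) / m
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Hypothetical chains: the first set agrees, the second set sits at
# three different levels and therefore has not converged.
agreeing = [[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]]
disagreeing = [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0], [20.0, 21.0, 22.0, 23.0]]

print("agreeing chains:", round(gelman_rubin(agreeing), 3))
print("disagreeing chains:", round(gelman_rubin(disagreeing), 3))
```

Values close to 1 suggest that the chains are sampling the same distribution, while values well above 1 indicate that the between-chain spread is still large relative to the within-chain spread.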

Results of classical approach
While the odds ratio (OR) is one of the most frequently used measures of association between a risk factor and an outcome in epidemiology, the risk ratio (RR) is an important index for quantifying the strength of association between natural dental caries and a suspected risk factor. The main reason for the popularity of the OR is that it is the measure of association usually provided by logistic regression models. There is a large body of literature discussing the relationship between the OR and the RR, and there is still an ongoing debate on the appropriateness of odds ratios versus prevalence ratios as measures of effect in retrospective cohort studies. It is known that the OR overestimates the RR when the prevalence of the outcome of interest is larger than 10%. The logistic model provided a better fit to the data than the log-binomial and Poisson models, each of which can be problematic. Using a Poisson model with a robust standard error generally makes an adequate correction for the standard error. The log-binomial model may fail to converge, which is not uncommon.
Fig. 1 Convergence of time series, density and autocorrelation plots for the coefficients
Table 2 shows an odds ratio of 0.617 for the 18-25 age group, meaning that the odds of natural dental caries are 38.3% lower for patients in this age group than for patients in the age group ≤18, controlling for the other variables in the model. The odds of natural dental caries are 27.4% lower for patients in the 25-35 age group than for the reference group. The odds ratio of 0.749 shows that the odds of natural dental caries are 25.1% lower for patients in the age group > 35 than for the reference category. The odds ratio of 0.691 indicates that the odds of natural dental caries are 30.9% lower for rural patients than for urban patients, controlling for the other variables in the model.
The odds ratio of 1.639 means that the odds of natural dental caries are 63.9% higher for patients from SNNPR than for patients from other regions, controlling for the other variables in the model.
The OR of 1.475 indicates that the odds of natural dental caries are 47.5% higher for patients who clean their teeth than for patients who do not, controlling for the other variables in the model. The OR of 0.8157 for sex indicates that the odds of natural dental caries are 18.4% lower for males than for females.
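The percentage interpretations of the odds ratios follow from the relation percent change in odds = (OR - 1) × 100. The short sketch below applies it to the odds ratios quoted in this section; the function name and the labels are hypothetical, and the odds ratios are those reported in the text.

```python
def percent_change_in_odds(odds_ratio):
    """Percent change in the odds relative to the reference group."""
    return (odds_ratio - 1.0) * 100.0


# Odds ratios as quoted in the text for the classical logistic regression.
reported = [
    ("age 18-25 vs <=18", 0.617),
    ("age > 35 vs <=18", 0.749),
    ("rural vs urban", 0.691),
    ("SNNPR vs other regions", 1.639),
    ("cleans teeth vs does not", 1.475),
    ("male vs female", 0.8157),
]

for label, value in reported:
    change = percent_change_in_odds(value)
    direction = "higher" if change > 0 else "lower"
    print(f"{label}: odds {abs(change):.1f}% {direction}")
```

Because these are changes in odds, not in probabilities, they should not be read as "times more likely" when the outcome is as common as it is in this study.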

Results of Bayesian approach
The finding in Table 3 show that, regarding the effects of gender on the dental caries, we found out

Model comparison
In Table 4, we compared the Bayesian and classical approaches and identified differences in the number of significant risk factors and in the numerical values of the standard errors. The main comparison criterion used is the standard error of the estimated parameters under both approaches. In the Bayesian logistic regression approach, all significant factors have smaller standard errors than in the classical logistic regression approach. From the results in Table 4, we also found that the Bayesian credible interval is shorter than the confidence interval from classical logistic regression for all covariates. Therefore, judging by the confidence and credible intervals and the standard errors of the estimated parameters, the Bayesian approach provides better results.

Discussion
The prevalence of non-natural dental caries found in the present study was 13%. We found that the odds of non-natural dental caries were higher for males than for females. Similarly, a study of the prevalence of dental caries in North West Ethiopia found that the prevalence differed between males and females [26]. The highest proportion of dental caries was observed in the age group 18-25 and the lowest in the age group < 18, which is supported by the study in [9]. Urban patients were more likely to have dental caries than rural patients. The reason could be that patients who live in urban areas tend to consume more sweets than rural patients. The study in [27], which shows that there are differences in oral health related behavior between urban and rural residents, supports our finding. In that study, the prevalence of daily use of toothpicks was consistently and significantly higher among urban than rural residents.

Conclusions
In this study we tried to show the performance of Bayesian logistic regression relative to classical logistic regression. Age, gender, region, place of residence and habit of cleaning teeth were risk factors associated with dental caries. A comparison of the classical and Bayesian logistic regression approaches reveals lower standard errors of the estimated coefficients under the Bayesian approach. In addition, when the Bayesian approach was compared with the method of maximum likelihood, the Bayesian credible interval was found to be shorter than the confidence interval for all factors.