Novel caries loci in children and adults implicated by genome-wide analysis of families

Background Dental caries is a common chronic disease among children and adults alike, posing a substantial health burden. Caries is affected by multiple genetic and environmental factors, and prior studies have found that a substantial proportion of caries susceptibility is genetically inherited. Methods To identify such genetic factors, we conducted a genome-wide linkage scan in 464 extended families with 2616 individuals from Iowa, Pennsylvania and West Virginia for three dental caries phenotypes: (1) PRIM: dichotomized as zero versus one or more affected primary teeth, (2) QTOT1: age-adjusted quantitative caries measure for both primary and permanent dentitions including pre-cavitated lesions, and (3) QTOT2: age-adjusted quantitative caries measure excluding pre-cavitated lesions. Genotyping was conducted for approximately 600,000 SNPs on an Illumina platform, pruned to 127,511 uncorrelated SNPs for the analyses reported here. Results Multipoint non-parametric linkage analyses generated peak LOD scores exceeding 2.0 for eight genomic regions, but no LOD scores above 3.0 were observed. The maximum LOD score for each of the three traits was 2.90 at 1q25.3 for PRIM, 2.38 at 6q25.3 for QTOT1, and 2.76 at 5q23.3 for QTOT2. Some overlap in linkage regions was observed among the phenotypes. Genes with a potential role in dental caries in the eight chromosomal regions include CACNA1E, LAMC2, ALMS1, STAMBP, GXYLT2, SLC12A2, MEGF10, TMEM181, and ARID1B, as well as genes in several immune gene families. Our results are also concordant with previous findings from association analyses on chromosomes 11 and 19. Conclusions These multipoint linkage results provide evidence in favor of novel chromosomal regions, while also supporting earlier association findings for these data. Understanding the genetic etiology of dental caries will allow the design of personalized treatment plans based on an individual's genetic risk of disease.
Electronic supplementary material The online version of this article (10.1186/s12903-018-0559-6) contains supplementary material, which is available to authorized users.

values between 20 and 60% [2][3][4][5][6][7]. To date, there have also been numerous studies investigating association of dental caries with candidate genes or with whole-genome Single Nucleotide Polymorphism (SNP) panels [8,9]. The only previous genome-wide linkage study of caries was conducted using a panel of 392 microsatellite markers on 46 extended Filipino families with 642 total individuals [10]. This study found suggestive linkage of low caries experience to chromosome regions 5q13.3, 14q11.2, and 13q27.1, and of high caries experience to 13q31.1 and 14q24.3. However, results of previous studies have, in general, not been extensively replicated, possibly due to relatively small sample sizes [8,9], and the enumeration of genetic factors is far from complete.
Our present study is the first to apply genome-wide multipoint linkage analysis to explore the genetic etiology of caries (whether in childhood or adulthood) using densely spaced SNPs on a population previously analyzed by genome-wide association. Genome-wide linkage analysis is a complementary strategy to genome-wide association analysis for gene discovery. Whereas association identifies specific marker alleles correlated with the caries phenotype, linkage analysis identifies genomic regions shared between related individuals who show similar disease characteristics. The advantages of linkage analysis are that it makes full use of familial inheritance, is less sensitive to allelic heterogeneity, and, unlike association, can be used to detect rare disease-causing mutations. Furthermore, multipoint linkage utilizes genotypes from SNPs neighboring the test location, while association conducts tests at each location independently.
In this study, multipoint non-parametric linkage analysis was conducted, i.e., no assumptions were made with respect to the mode of inheritance of dental caries [11], and the analysis was, therefore, robust to uncertainty about the underlying genetic model. Empirical significance of the linkage signals was assessed across the genome by simulating multiple sets of genome-wide data such that the SNP genotypes were unlinked to caries status.

Study subjects and genotype data
The families and individuals included in this study are from western Pennsylvania, West Virginia, and Iowa. Subjects from Pennsylvania and West Virginia were ascertained through the Center for Oral Health Research in Appalachia (COHRA; [12]). Additional subjects from Pennsylvania were recruited under the University of Pittsburgh Dental Registry and DNA Repository (DRDR; [13]). Subjects from Iowa were recruited under two University of Iowa projects, the Iowa Fluoride Study (IFS; [14][15][16][17]) and Iowa Head Start (IHS; [18]). All subject recruitment and data collection were approved by site-specific Institutional Review Boards. Genotyping was conducted under the Gene Environment Association Studies Initiative (GENEVA) for approximately 600,000 SNPs on an Illumina platform (Human 610_Quadv1_B; Illumina, Inc., San Diego, CA, USA). All genotype and phenotype data are available on dbGaP (the database of Genotypes and Phenotypes; https://www.ncbi.nlm.nih.gov/gap; accession number phs000095.v3.p1). Details on genotyping and quality control protocols are also presented on dbGaP, or can be found in earlier studies [19,20]. Table 1 summarizes the different subsets of data in terms of the sample available for this study. This study utilizes complete families; in other words, non-genotyped and non-phenotyped individuals also contribute to various aspects of the analysis. Prior studies primarily utilized unrelated individuals for conducting association analysis. The starting study sample comprised a total of 4727 self-reported non-Hispanic white individuals, of whom 437 were unrelated, 1674 were in 558 two-parent and single-offspring (trio) families, and 2616 were in 464 non-trio families. Approximately 76% of individuals were genotyped (Table 1).

Definition of dental caries phenotypes
Caries scores were assessed on the COHRA, IFS, and IHS subjects in accordance with the COHRA study protocol [12]. For subjects in the DRDR study, we used caries scores abstracted from clinical records by dental students trained by Dr. Alexandre R. Vieira, who is a co-author on this manuscript. We defined three dental caries phenotypes, one based on primary dentition (PRIM), and two that combine primary and permanent dentitions (QTOT1, QTOT2). PRIM was coded as a binary primary dentition caries phenotype based on the count of decayed and/or filled primary teeth (dft score). An individual with a dft score of 1 or more was designated as affected. The primary teeth from all subjects with primary or mixed dentition were assessed for PRIM. These individuals included adults with over-retained primary teeth. QTOT1, an age-adjusted quantitative caries phenotype, is based on the sum of the dft score (primary teeth) and the D1MFT score (count of decayed, missing, and filled permanent teeth including white spot lesions). QTOT1 scores were generated by adjusting this raw sum for age and age-squared effects using locally fitted splines. Scores for 113 individuals below 2 years of age and 5 individuals above 60 years were excluded from linkage analysis because of the very low caries experience in the 0-2 years age group and the presence of very few subjects above 60 years of age. Age-adjusted QTOT2, the second quantitative caries phenotype, is based on the sum of the dft score and the D2MFT score (count of decayed, missing, and filled permanent teeth excluding white spot lesions). Age adjustment was performed as for the QTOT1 phenotype, and QTOT2 scores for 115 individuals between 0 and 2 years of age and 44 individuals above 60 years were set to missing.
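The phenotype construction described above can be illustrated with a minimal sketch. It assumes a plain quadratic regression on age and age-squared as a stand-in for the locally fitted splines used in the study; the function names `prim_status` and `age_adjust` and the default age bounds as arguments are hypothetical and are not taken from the study's code.

```python
import numpy as np

def prim_status(dft):
    """Binary PRIM coding: 1 (affected) when the dft score is 1 or more, else 0."""
    return (np.asarray(dft) >= 1).astype(int)

def age_adjust(raw_score, age, min_age=2.0, max_age=60.0):
    """Residualize a raw caries sum on age and age^2.

    ASSUMPTION: ordinary least squares replaces the study's locally fitted
    splines. Scores outside [min_age, max_age] are set to missing (NaN),
    mirroring the exclusion of the youngest and oldest subjects.
    """
    raw_score = np.asarray(raw_score, dtype=float)
    age = np.asarray(age, dtype=float)
    design = np.column_stack([np.ones_like(age), age, age ** 2])
    coef, *_ = np.linalg.lstsq(design, raw_score, rcond=None)
    adjusted = raw_score - design @ coef          # age-adjusted residuals
    adjusted[(age < min_age) | (age > max_age)] = np.nan
    return adjusted
```

Because the residuals come from a model with an intercept, the adjusted scores are centered near zero, which is consistent with the approximately normal, mean-zero distributions reported for QTOT1 and QTOT2.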

Data cleaning and preparation
Genetic map positions were generated for all SNPs. These genetic markers were filtered based on genotyping rates and Hardy-Weinberg proportions. The SNPs were then pruned for linkage disequilibrium (LD). SNPs with residual LD were clustered into super-markers. The procedures used for filtering SNPs, map creation, and LD-based SNP pruning and clustering are described below.

Genetic map creation
The Genetic Map Interpolator (GMI) program [21] was used to calculate genetic map positions for all SNPs. Sex-averaged map positions were created for SNPs on chromosomes 1-22, and female map positions were created for SNPs on the X chromosome. In the GMI program, the physical base-pair (bp) position of each SNP per the March 2006 build (NCBI36/hg18) was transformed to the corresponding centiMorgan (cM) genetic map position by interpolation into the Rutgers Combined Linkage-Physical Map [22].
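The idea of interpolating a physical position into a reference map can be sketched as follows. This assumes simple piecewise-linear interpolation between flanking reference markers; the GMI program may differ in its details, and the reference positions below are made-up illustrative values, not entries from the Rutgers map.

```python
import numpy as np

def interpolate_cm(snp_bp, ref_bp, ref_cm):
    """Linearly interpolate cM positions for SNPs from a reference map.

    ref_bp must be sorted in ascending order. Positions outside the
    reference range are clamped to the end values by np.interp.
    """
    return np.interp(snp_bp, ref_bp, ref_cm)

# Hypothetical reference markers on one chromosome (bp, cM)
ref_bp = np.array([1_000_000, 2_000_000, 4_000_000])
ref_cm = np.array([0.0, 1.5, 3.5])

# Two SNPs, each lying halfway between two reference markers
snp_cm = interpolate_cm(np.array([1_500_000, 3_000_000]), ref_bp, ref_cm)
print(snp_cm)  # [0.75 2.5 ]
```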

SNP filtering
In addition to the quality control and cleaning steps detailed on dbGaP, we filtered SNPs on the basis of low genotyping success rate and deviation from Hardy-Weinberg equilibrium (HWE) proportions using the software PLINK [23]. SNPs with genotyping success rates below 95%, calculated using genotype data for all individuals, were excluded from analysis. Known genotypes of founders (i.e., those individuals in a family whose parents are not included in the study) and unrelated individuals were used to test SNPs for HWE proportions. The significance threshold for rejecting the null hypothesis of no deviation from HWE proportions was set at 10⁻⁵.
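The two filters above can be sketched in a few lines. This is an illustration only: it uses a 1-df chi-square approximation for the HWE test, whereas PLINK applies an exact test, and the function names and the use of `None` for missing calls are assumptions made for the example.

```python
import math

def call_rate(genotypes):
    """Fraction of non-missing genotype calls; None marks a missing call."""
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """1-df chi-square test of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))

def keep_snp(genotypes, founder_counts, min_call=0.95, hwe_alpha=1e-5):
    """Keep a SNP if its call rate is >= 95% and HWE is not rejected at 1e-5.

    genotypes: calls from all individuals; founder_counts: (AA, AB, BB)
    counts among founders and unrelated individuals only.
    """
    return (call_rate(genotypes) >= min_call
            and hwe_chi2_pvalue(*founder_counts) >= hwe_alpha)
```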

Linkage disequilibrium-based SNP pruning and clustering
The genotyping panel available to this study was designed for genome-wide association analysis. When conducting linkage analysis on densely spaced SNP marker loci, the presence of substantial marker-to-marker LD is known to inflate linkage signals, especially if parental genotypes are missing [24]. In this study, LD was removed in two stages. First, the set of quality-filtered SNPs was pruned using PLINK such that the LD r² value (a measure of LD based on the square of the correlation coefficient between loci) among the remaining SNPs fell below 20%. In PLINK, LD pruning consists of creating blocks of 50 consecutive SNPs followed by recursive removal of SNPs within blocks, until the r² value among the remaining SNPs is below the desired threshold (20% in our case). Only the unrelated genotyped individuals in our data (pedigree founders and unrelated cases/controls) were used to calculate LD in this step. Any remaining LD was then accounted for using LD-based clustering in Merlin [25]. In clustering, each block of consecutive SNPs that shows an r² value greater than a specified threshold (in our case 10%) is analyzed collectively as a super-marker. Table 2 summarizes the data processing steps undertaken to select SNPs for linkage analysis, and the samples that contributed information to specific parts of this data cleaning. After HWE filtering and LD-based pruning, 127,511 SNPs in low LD (pairwise r² ≤ 20%) were retained. LD-based SNP clustering combined 92,495 SNPs into 20,634 super-markers, leaving 35,016 SNPs to be analyzed individually. The average genetic map distance between the final set of markers (super-marker index SNPs and singleton SNPs) is approximately 0.07 cM on the autosomes and 0.13 cM on the X chromosome. Super-marker and singleton SNP allele frequencies were generated as maximum likelihood estimates using Merlin. The SNP clustering and allele frequency estimation steps utilized 1022 informative families.
Table 2 notes: (a) Founders: individuals in a pedigree or trio whose parents are not included in the study. For example, both parents in a trio are founders. Some of the larger multigenerational pedigrees may have more than two founders. (b) The counts of individuals differ from the totals provided in Table 1, since not all founders and unrelated individuals were genotyped for this study. (c) Trios: family structure of two parents and one child; non-trio pedigrees: families with four or more members.
Table 3 summarizes the sample of individuals used in the linkage analysis for the three traits. The table presents the numbers of informative pedigrees, individuals, and phenotyped relative pairs, by relationship type, that were included in NPL and QT linkage. A total of 160 relative pairs were informative for PRIM NPL. For QT linkage, the corresponding informative relative pair counts were 1026 and 1038 for QTOT1 and QTOT2, respectively.
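The first pruning stage can be illustrated with a simplified greedy sketch. It is not PLINK's algorithm: it keeps a SNP only if its r² with every previously kept SNP inside a trailing window is at or below the threshold. The function names and the genotype coding (0/1/2 allele counts in an individuals-by-SNPs array) are assumptions made for the example.

```python
import numpy as np

def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 counts)."""
    r = np.corrcoef(x, y)[0, 1]
    return r * r

def greedy_prune(genotypes, window=50, threshold=0.2):
    """Simplified LD pruning (ASSUMPTION: not PLINK's exact procedure).

    genotypes: individuals x SNPs array from unrelated individuals, with
    SNPs ordered along the chromosome and no monomorphic columns.
    Returns the column indices of the SNPs that are kept.
    """
    kept = []
    for j in range(genotypes.shape[1]):
        recent = [k for k in kept if j - k < window]
        if all(r_squared(genotypes[:, k], genotypes[:, j]) <= threshold
               for k in recent):
            kept.append(j)
    return kept
```

As a consistency check on the counts reported above, the clustered and singleton SNPs add up to the pruned set: 92,495 + 35,016 = 127,511.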

Genome-wide linkage of PRIM
In non-parametric linkage (NPL) analysis, affected individuals within each pedigree are examined to detect whether affected relatives share genomic regions identical-by-descent (IBD) more often than expected due to their relatedness alone. This IBD sharing is tested at locations along each chromosome. Genome-wide NPL analysis was carried out for the PRIM phenotype using the S_all statistic [26] as implemented in Merlin [25].

Genome-wide linkage of QTOT1 and QTOT2
The quantitative trait (QT) regression-based linkage method, Merlin-regress [27], was used to analyze the two quantitative phenotypes across the autosomes. The QT linkage method is based on regressing estimated IBD sharing between relative pairs on the squared sums and squared differences of their phenotypes. It does not handle X-linked SNPs; hence, the X chromosome was not analyzed for the two quantitative traits. Merlin-regress analyses required specification of a heritability parameter (set at 50% based on published estimates for DMFT) and sample-based means and variances for QTOT1 and QTOT2. All results, NPL and QT linkage, are reported as LOD (logarithm of the odds of linkage) scores.
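The regression idea behind the QT method can be sketched for independent relative pairs. This is a deliberately simplified illustration using ordinary least squares; Merlin-regress itself works on whole pedigrees with a weighted multivariate regression, so none of the function names or modeling shortcuts below should be read as its implementation. The conversion from a 1-df chi-square statistic to a LOD score (division by 2 ln 10) is the standard one.

```python
import math
import numpy as np

def pair_features(trait_a, trait_b, mean, sd):
    """Squared sums and squared differences of standardized pair phenotypes."""
    a = (np.asarray(trait_a, dtype=float) - mean) / sd
    b = (np.asarray(trait_b, dtype=float) - mean) / sd
    return (a + b) ** 2, (a - b) ** 2

def regress_ibd(ibd, sq_sum, sq_diff):
    """OLS of mean-centered IBD sharing on mean-centered squared sums and
    squared differences (ASSUMPTION: independent pairs, equal weights)."""
    X = np.column_stack([sq_sum - sq_sum.mean(), sq_diff - sq_diff.mean()])
    y = ibd - ibd.mean()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [slope on squared sum, slope on squared difference]

def chi2_to_lod(chi2):
    """Convert a 1-df chi-square statistic to a LOD score."""
    return chi2 / (2 * math.log(10))
```

Under linkage, one would expect pairs with large squared sums to share more IBD and pairs with large squared differences to share less, giving a positive slope on the first feature and a negative slope on the second.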

Empirical significance of observed linkage signals
The most commonly used LOD score threshold of 3.0 for declaring significant linkage (Morton) was derived for parametric linkage analysis of a single locus and a binary trait. Subsequent research (e.g., as reviewed in [28]) addressing newer linkage methods, such as whole-genome analysis, affected-relative-pair designs, and multipoint calculations, also relies on assumptions about the study data that rarely hold in practice. Therefore, to correct for multiple testing, we carried out a simulation study to assess the genome-wide significance thresholds for the NPL and QT regression LOD scores. In general, for a null simulation, hundreds of simulated genetic data sets are generated and analyzed to produce an empirical distribution of LOD scores. Since this process would be prohibitively time-consuming given the study data, we used an adaptive approach to generate null distributions. The replicate pool method, Pseudo [29], was used to derive the empirical null distribution of NPL scores for PRIM. An initial pool of 100 simulated genotype data sets was generated for this study using Merlin (simulate option), followed by the pseudo-simulation of 100,000 NPL genome scans to create the empirical distribution of unlinked NPL LOD scores. Pseudo was not used for the quantitative data simulations, since QT regression does not produce pedigree-specific LOD scores. For QTOT1 and QTOT2, 5000 data sets each were simulated and analyzed using Merlin.
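Once null genome scans are available, an empirical genome-wide threshold can be read off the distribution of per-scan maximum LOD scores. The sketch below shows that step only, under the assumption that the null scans are stored as a replicates-by-positions array; it does not reproduce the replicate-pool resampling performed by Pseudo, and the function names are hypothetical.

```python
import numpy as np

def genomewide_threshold(null_lods, alpha=0.05):
    """(1 - alpha) quantile of the genome-wide maximum LOD across null scans.

    null_lods: replicates x positions array of LOD scores simulated with
    genotypes unlinked to the trait.
    """
    max_per_scan = np.asarray(null_lods, dtype=float).max(axis=1)
    return float(np.quantile(max_per_scan, 1 - alpha))

def empirical_pvalue(observed_lod, null_lods):
    """Genome-wide empirical p-value with the usual +1 correction."""
    max_per_scan = np.asarray(null_lods, dtype=float).max(axis=1)
    exceed = int(np.sum(max_per_scan >= observed_lod))
    return (exceed + 1) / (len(max_per_scan) + 1)
```

Taking the maximum over each whole scan is what makes the threshold genome-wide: an observed peak is compared with the best score a null scan achieves anywhere, rather than with the null distribution at a single position.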

Selection of linkage peak regions and etiologic genes
For super-markers, the NPL and QT LOD scores correspond to the index (first) SNP of each cluster. In the linkage scan for each phenotype, maximizing markers in regions with LOD score ≥ 2.0 were identified as linkage peaks. A support interval of one LOD drop was used for exploring genes under selected linkage peaks. The one-LOD-drop support interval is the interval within which the LOD score is within one unit of its maximum. Regions with LOD scores ≥ 1.0 were also identified for each trait. Overlap of linkage signal among the three traits was determined based on overlap of peak support intervals, or on secondary peak(s) of at least 1.0 LOD lying within the primary peak support interval. When peaks for multiple phenotypes overlapped, the resulting support intervals were reduced to the region of overlap.
Genes within these support intervals were examined for a potential etiologic role in dental caries incidence. Genes considered potentially causal would include, for example, genes related to blood glucose levels, secretory function of the salivary glands, and host immune response. Proximity of genes to SNPs corresponding to linkage peaks was determined by physical map positions obtained from the UCSC Genome Browser for the March 2006 (NCBI36/hg18) assembly [30]. When no genes were identified as potentially contributing to caries risk, we instead listed the gene closest to the SNP with the maximum observed LOD.
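The one-LOD-drop support interval defined above can be computed directly from a scan. The following is a minimal sketch assuming the positions and LOD scores for one chromosome are given as parallel lists ordered along the chromosome; the function name is hypothetical, and it returns the contiguous region around the maximum rather than any disjoint secondary regions.

```python
def one_lod_support_interval(positions, lods, drop=1.0):
    """Return (start, end) of the contiguous region around the peak where
    the LOD score stays within `drop` units of its maximum."""
    peak = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[right]

# Toy scan: peak LOD of 2.5 at position 30, cutoff at 1.5
positions = [10, 20, 30, 40, 50, 60]
lods = [0.5, 1.6, 2.5, 2.0, 1.2, 1.9]
print(one_lod_support_interval(positions, lods))  # (20, 40)
```

Note that position 60 has a LOD above the cutoff but is excluded, because the score falls below the cutoff at position 50 before rising again.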

Comparison with prior published findings
A systematic search of literature was conducted to compile caries risk-conferring genes and genomic regions from previous studies utilizing some portion of our data, as well as from studies of other populations. Physical positions for these genes and genomic regions were then mapped to our linkage scans. Linkage regions with a LOD score of 1.0 or greater have been reported as indicative of concordance or replication, as appropriate.

Sensitivity analysis: Effect of variation in parameter values on NPL statistics
For multifactorial diseases such as caries, the true underlying genetic model for disease is difficult to ascertain. In this study, model-free linkage methods were used to detect linkage. The QT methods can nevertheless be sensitive to misspecification of the required input parameter values. We conducted a sensitivity analysis for the heritability parameter (HP), since the published literature provides a wide range of heritability values (40-60%), and our work used HP = 50%. In the sensitivity analysis, HP values were set at 40% and 60% for QTOT1 and QTOT2.
Mega2 [31] was used to reformat and create input files for all the software used in data cleaning, LD-based pruning and clustering, genetic map creation, linkage analysis, and data simulation.

There were 287 individuals with known PRIM phenotypes (Fig. 1, panel A), of whom 243 were affected for PRIM. Of these 243 subjects, 242 were 18 years of age or younger (panel B). Subjects with primary or mixed dentition included in the PRIM NPL analysis ranged from 15 months to 22.5 years of age, with a mean of 7.4 years. These subjects with primary dentition caries constitute mainly the youngest generation. The distribution of the raw caries index by decade, the age-adjusted index by decade, and the age-adjusted caries index for all phenotyped individuals compared to those between 2 and 60 years of age are shown for QTOT1 (panels C, D, and E) and QTOT2 (panels F, G, and H). The number of phenotyped individuals, range, mean, and standard deviation are presented in panel I for both quantitative traits. A larger number of individuals were phenotyped for D2MFT than for D1MFT in this study. For QTOT1 and QTOT2, there were 2484 and 2868 phenotyped individuals, respectively, in the 2-60 age range. Both of the age-adjusted phenotypes follow an approximately normal distribution with a mean of zero. The QTOT1 and QTOT2 means and standard deviations for the 2-60 age group were included as distribution parameters in the quantitative trait linkage analysis.
Figure 2 shows genome-wide LOD scores by SNP (or super-marker index SNP) for PRIM, QTOT1, and QTOT2. The empirical 5% genome-wide significance level, indicated as a solid horizontal line, was 3.48, 3.61, and 3.76 for PRIM, QTOT1, and QTOT2, respectively. Overlapping LOD score peaks for multiple phenotypes were observed in a few regions.

Linkage analysis
Highest LOD Score Regions
Table 4 presents peak LOD scores and 1-LOD support intervals. The SNP (or index SNP) with the maximum LOD value in each peak is identified along with its genomic location. Regions with maximum LOD ≥ 2.0 are shown ordered by chromosome, along with secondary peak(s) of at least 1.0 LOD, if observed for other traits. The highest LOD scores by trait were 2.90 for PRIM, 2.38 for QTOT1, and 2.76 for QTOT2. Detailed results for all SNPs that lie within the support region for peaks reported in Table 4 with a LOD score of 2.0 or more are provided in supplementary material [see Additional File 1].
For each linkage peak, the table also reports the closest gene, if found, with a potential role in caries incidence. For two of these peaks, one on chromosome 2 (QTOT2; LOD 2.30) and the other on chromosome 3 (PRIM; LOD 2.50), no such etiologic genes were identified within the support intervals. In these intervals, genes BCL11A (60.538-60.634 Mb) and KAT2B (20.056-20.171 Mb) were found to be closest to the respective LOD score peak SNPs. The genes within linkage peak regions that may play an etiologic role in dental caries are described in the sections below.

Chromosome 1
The maximum PRIM LOD score of 2.90 was observed on chromosome 1 at 1q25.3 (Table 4). Under this peak, the CACNA1E (179.719-180.037 Mb) gene has been shown to be involved in glucose-evoked insulin secretion in mice [32]. Poor glycemic control has potential implications for increased caries risk in humans. Mutations in the LAMC2 (181.422-181.481 Mb) laminin gene are known to cause the non-Herlitz form of junctional epidermolysis bullosa, which includes hypodontia and dental caries among its phenotypes [33].

Fig. 1 Distribution of (a) PRIM by binary affection status, (b) age at exam of individuals categorized as PRIM affected, (c) raw dft + D1MFT, (d) age-adjusted QTOT1, (e) age-adjusted QTOT1 for the full sample compared to the distribution for the 2-60 age group, (f) raw dft + D2MFT, (g) age-adjusted QTOT2, (h) age-adjusted QTOT2 for the full sample compared to the distribution for the 2-60 age group, and (i) mean, standard deviation, range and sample size for QTOT1 and QTOT2; shaded areas in panels D and G indicate individuals below the age of 2 and above 60 years with phenotypes excluded from quantitative trait linkage analysis

Chromosome 3
The GXYLT2 (73.020-73.107 Mb) gene is located within the chromosome 3 QTOT1 peak. GXYLT2 acts on epidermal growth factor, which is expressed in human submandibular and parotid glands and is important for the maintenance of oroesophageal and gastric tissue.

Chromosome 5
The highest genome-wide quantitative trait LOD (2.76, at 5q23.3) was observed for QTOT2. This QTOT2 peak contains the SLC12A2 (127.447-127.553 Mb) gene, whose protein product aids the movement of chloride ions in saliva, thereby assisting in salivary function.

Comparison with previous relevant signals
Table 5 shows regions reported by previous studies where our current LOD score is 1.0 or greater. Two regions were found to contain genes reported in prior studies.

Chromosome 11
A PRIM LOD of 1.23 was observed 8500 bp from the MPPED2 (30.338-30.558 Mb) gene. A suggestive association of this gene with primary teeth caries was reported by a previous study of 1305 children aged 3-12, some of whom are also part of this analysis (Shaffer et al., 2011). The phenotype was defined similarly to our PRIM phenotype.

Chromosome 19
A gene-set enrichment analysis study [38] reported an association of primary teeth caries with NLRP12 (58.989-59.019 Mb) in 1142 children aged 3-13, a subset of whom are also included in our study. QTOT1 and QTOT2 LOD scores ≥ 1.0 were observed 0.4-1.2 Mb from this gene.

Sensitivity analysis
For each of the two quantitative traits, QTOT1 and QTOT2, Fig. 3, panels A, B, C, and D show the percentage deviation of LODs obtained using HPs of 40% or 60% from baseline LODs produced with an HP of 50%. These deviations are plotted on the y-axis against the corresponding baseline LOD (x-axis). The red points indicate SNPs for which baseline LODs of 2.0 or greater dropped below 2.0 when the HP value was changed. Conversely, the green points show SNP positions with baseline LODs below 2.0 that switched to a score of 2.0 or more with a change in HP. Percentage deviations where the baseline LODs were between 0 and 0.05 are not presented in panels A through D. Within this range, the change in LOD combined across HP = 40% and HP = 60% ranges from −0.04 to 0.11 LOD for QTOT1. For QTOT2, the corresponding range is −0.05 to 0.1 LOD. Although in percentage terms these represent very large changes relative to the baseline, none of the deviations in the 0 to 0.5 baseline LOD score range result in the LOD score approaching significance. Panels E and F break down, for each trait, the percentage of all SNPs that drop below, or rise to or above, the 2.0 LOD score threshold with a change in HP. For both traits, a decrease in HP to 40% results in a minimal percentage of SNPs changing status (be it an increase or decrease in LOD score). In contrast, SNPs with LOD scores of 2.0 or greater at HP 50% are more likely to drop below the 2.0 LOD threshold when the HP is increased to 60%. Table 6 presents the change in QTOT1 and QTOT2 LOD scores due to a change in HP for only the linkage peaks reported in Table 4. All LOD score peaks, except for one, remain above 2.0 despite changes in HP.
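The two quantities summarized in this sensitivity analysis, the percentage deviation from baseline and the change in status relative to the 2.0 LOD threshold, can be computed as in the sketch below. The function names and labels are hypothetical; baseline LODs must be nonzero for the percentage to be defined, which is one reason very small baseline values are handled separately in the text.

```python
import numpy as np

def percent_deviation(baseline, alternative):
    """Percentage change of LODs under an alternative HP relative to the
    baseline (HP = 50%). Baseline values must be nonzero."""
    baseline = np.asarray(baseline, dtype=float)
    alternative = np.asarray(alternative, dtype=float)
    return 100.0 * (alternative - baseline) / baseline

def status_change(baseline, alternative, threshold=2.0):
    """Label each SNP by how it moves relative to the LOD threshold."""
    labels = []
    for b, a in zip(baseline, alternative):
        if b >= threshold and a < threshold:
            labels.append("dropped")      # red points in the figure
        elif b < threshold and a >= threshold:
            labels.append("gained")       # green points in the figure
        else:
            labels.append("unchanged")
    return labels
```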

Discussion
To our knowledge, this study was the first to apply genome-wide multipoint linkage analysis to explore the genetic etiology of caries using densely spaced SNPs.
We defined two new quantitative phenotypes which combine childhood and adulthood caries indices while also accounting for variability by age. The linkage findings in this study nominated genes on six chromosomes (1, 2, 3, 5, 6, and 19) with potential involvement in caries [...] COHRA, IHS, DRDR, IFS). As mentioned previously, linkage and association are complementary strategies for gene discovery. In linkage, similarities and differences in pairs of phenotypes are modeled in terms of genetic similarity over related pairs from families. In association, this modeling is performed at the level of individuals. Our linkage analysis is multipoint, i.e., the LOD score at any specific location is influenced by linkage at neighboring loci. Association generally uses a set of independent one-locus tests. Finally, as described in the methods, this study differs from prior published work both in the number and in the type (in terms of family composition) of individuals included in the analysis. Linkage utilizes family data and all related pairs (affected or phenotyped) within a pedigree, whereas association is generally conducted on unrelated cases and controls, or at most parent-offspring trios. The genotyping panel was designed for association analyses and is therefore far denser than a linkage SNP panel. Although dense bi-allelic SNP panels may allow extraction of more information, a concern for this study was existing linkage disequilibrium between SNPs. We pruned SNPs based on marker-to-marker LD, and then exploited any remaining LD among the pruned set to create clusters which served as polymorphic markers. Despite the pruning and clustering, our analysis was conducted on a much denser set of markers (35,016 SNPs and 20,634 super-markers) than a typical linkage panel with 6000 SNPs.
The use of multi-allelic super-markers also had the potential to increase the power of linkage studies in such a setting.
Table 6 note: (a) These values are the peak LOD score results reported in Table 4 for QTOT1 and QTOT2.
Genome-wide significance for each phenotype was empirically assessed through a series of simulations, which provides an approximation of the true underlying distribution of a statistic since not all features of the data can be completely replicated. In an exploratory study, adhering strictly to genome-wide significance thresholds may be overly conservative. Furthermore, of the 4727 subjects, only 2616 contributed to the linkage analysis, providing a comparatively small number of relative pairs given the large sample size.
The sensitivity analysis conducted for the parameter HP explores the impact of parameter value selection on a model-free QT method. The results from this analysis indicated that the non-parametric quantitative trait linkage method, as implemented in Merlin, was robust to variation in HP, and that changing HP had a minimal impact on LOD scores. Even more importantly, the linkage peaks were largely insensitive to parameter misspecification: all but one remained above a LOD of 2.0.
Environmental factors are not accounted for in this study due to unavailability of such data on many of our subjects, which would have drastically reduced the cohort size. We also did not attempt to analyze gene-by-gene interaction. The available methods for detection of gene-gene interaction that are applicable to our study design are computationally complex, thus making whole-genome interaction analysis beyond the scope of the current work (e.g. see the review of the various classes of interaction detection methods by Li [39]).

Conclusions
This study presents two new quantitative measures for dental caries which combine both the primary and permanent dentition, while adjusting for age effects. Genes identified in peak linkage regions underline the importance of exploring potential relationships between caries and other traits. We did not include environmental factors in this study. The interactions between putative caries risk-conferring genes and factors including fluoride exposure, dietary habits, and the microbiome need to be investigated, as do interactions between the genes themselves. From a clinical perspective, individuals with increased genetic susceptibility would be at an elevated lifetime risk of developing caries in both primary and permanent dentition. Understanding the genetic etiology of dental caries will allow health providers to design personalized treatment plans based on an individual's genetic risk of disease.

Additional file
Additional file 1: SNPs within support regions of reported peaks. Detailed results for all SNPs that lie within the support region for peaks with a LOD score of 2.0 or more, as summarized in Table 4.

Mb: 10⁶ (or 1 million) bp; NPL: Non-Parametric Linkage; PRIM: New binary dental caries phenotype based on dft; QT: Quantitative Trait; QTOT1: New age-adjusted quantitative caries phenotype based on the sum of the dft score and D1MFT score; QTOT2: New age-adjusted quantitative caries phenotype based on the sum of the dft score and D2MFT score; r²: A measure of LD based on the square of the correlation coefficient between loci; SNP: Single Nucleotide Polymorphism

[...] and genotype cleaning, as well as with general study coordination, was provided by the GENEVA Coordinating Center (U01-HG004446) and by NCBI. Data and samples were provided by (1) [...]

Neither NIDCR nor the GENEVA Coordinating Center played a role in the design of the study; the collection, analysis, and interpretation of data; or the writing of the manuscript.

Availability of data and materials
All genotype and phenotype data for the datasets analyzed during the current study are available in the dbGaP repository (https://www.ncbi.nlm.nih.gov/gap; accession number phs000095.v3.p1).