Outcome measures for oral health based on clinical assessments and claims data: feasibility evaluation in practice

Background It is well known that treatment variation exists in oral healthcare, but the consequences for oral health are unknown as the development of outcome measures is still in its infancy. The aim of this study was to identify and develop outcome measures for oral health and explore their performance using health insurance claims records and clinical data from general dental practices. Methods The Dutch healthcare insurance company Achmea collaborated with researchers, oral health experts, and general dental practitioners (GDPs) in a proof of practice study to test the feasibility of measures in general dental practices. A literature search identified previously described outcome measures for oral healthcare. Using a structured approach, identified measures were (i) prioritized, adjusted and added to after discussion and then (ii) tested for feasibility of data collection, their face validity and discriminative validity. Data sources were claims records from Achmea, clinical records from dental practices, and prospective, pre-determined clinical assessment data obtained during routine consultations. Results In total eight measures (four on dental caries, one on tooth wear, two on periodontal health, one on retreatment) were identified, prioritized and tested. The retreatment measure and three measures for dental caries were found promising as data collection was feasible, they had face validity and discriminative validity. Deployment of these measures demonstrated variation in clinical practices of GDPs. Feedback of this data to GDPs led to vivid discussions on best practices and quality of care. The measure ‘tooth wear’ was not considered sufficiently responsive; ‘changes in periodontal health score’ was considered a controversial measure. The available data for the measures ‘percentage of 18-year-olds with no tooth decay’ and ‘improvement in gingival bleeding index at reassessment’ was too limited to provide accurate estimates per dental practice. Conclusions The evaluated measures ‘time to first restoration’, ‘distribution of risk categories for dental caries’, ‘filled-and-missing score’ and ‘retreatment after restoration’, were considered valid and relevant measures and a proxy for oral health status. As such, they improve the transparency of oral health services delivery that can be related to oral health outcomes, and with time may serve to improve these oral health outcomes. Electronic supplementary material The online version of this article (doi:10.1186/s12903-017-0410-5) contains supplementary material, which is available to authorized users.


Background
The aim of oral healthcare is to maintain or improve oral health for individuals and populations [1]. The degree to which the likelihood of this goal is increased by the delivered oral healthcare services is regarded as quality of care [2]. Quality of care can therefore be explored using data which describes links between the healthcare services provided and improved oral health outcomes. These may comprise both clinical and non-clinical, patient-derived outcomes. Transparency on health services delivered may be a helpful tool in reducing unwarranted treatment variation [3] and is therefore of increasing interest in today's society [1].
Information on delivered health services may provide patients with expectations about their care, and can be used by insurance companies and other policy makers in assessing the balance between the outcomes and costs of care. In addition, it may provide general dental practitioners (GDPs) with insight into their clinical activities compared to their peers, which they could potentially use for reflection and, if required, for care quality improvement.
Use of clinical activity data with the aim of improving health outcomes is not the norm in oral healthcare [4]. Development of measures is limited by the few evidence based guidelines available, their variable implementation, and the lack of consensus on what is the best oral care for most conditions and situations [1,4,5]. It is therefore not surprising that variation in clinical practice has been described [1,4,6,7].
A number of initiatives on quality measures and indicators for oral health and healthcare service delivery are reported, mainly in Anglo-Saxon and Scandinavian countries [8][9][10][11][12][13][14][15][16]. They have compiled lists of performance measures. These lists show considerable overlap, while the topics included are often poorly defined and their development has often not been completed [14,17]. Only a few articles described the development of oral healthcare delivery topics into measures and reported on their performance [6,17,18]. Apart from Herndon et al. [6], the authors did not fully specify the measures, nor did they report data per dental practice, which hampers reproducibility and limits their usability. Worldwide, the development of quality measures for oral healthcare is still in its infancy [1,17].
Achmea, one of the major health insurance companies in the Netherlands with more than four million insured persons, therefore has explored the measurement of oral health outcomes with a group of researchers, advising experts and five general dental practices. The aim was to select and develop clinical outcome measures for oral health and provided oral health services that are considered relevant, valid and important in oral health, in the context of using clinical data from general dental practices and claims records of an insurance company. We used a structured approach to select and derive these measures and to evaluate them in practice by assessing the feasibility of data collection, face validity and discriminative validity.

Methods
Selection and development of measures MEDLINE-PubMed was searched (on Dec 3, 2012) to identify outcome measures for oral healthcare published in the previous 5 years. Titles and abstracts of papers identified in the search were scanned to select papers that potentially included outcome measures for oral health. In addition, we searched the databases of relevant Dutch oral healthcare organizations and international organizations involved in the production of quality measures for healthcare. The MEDLINE-PubMed search strategy and organizations are listed in Table 1.
Four oral healthcare experts were invited to participate in the project advisory board. These experts were identified by their work on quality of care topics in Dutch dental schools and professional associations for dentists. They assisted in the selection and definition of measures by prioritizing the identified measures and judging whether the list was comprehensive, and suggesting additional measures. Subsequently they selected measures on the following criteria: relevance for quality improvement for professionals in daily oral healthcare practice; reliability; validity; and limited administrative burden. The experts discussed their findings with Achmea project staff.
This process resulted in a set of four measures related to dental caries and its treatment, one measure on tooth wear, two on periodontal health and one measure on retreatment after restoration. The selected measures were then precisely defined, and methods of measurement and data collection were determined. The specifications are listed under the heading 'specifications of measure' in Tables 2, 3, 4, 5, 6, 7 and Table 8.

Dental caries
Time to first restoration The time to the first restoration is a proxy for the time of good oral health until the first, irreversible, invasive dental intervention. The longer this time interval, the more likely it is that effective preventive healthcare and preventive self-care is provided.
Distribution of risk categories for dental caries based on clinical assessment The clinically assessed risk categories were based on new carious lesions and progressed existing carious lesions. This included carious lesions with an indication for restorative treatment as well as active initial lesions without an indication for restorative treatment. The definition of the four risk categories (low, decreased, increased and high) are described in Table 3.
Distribution of risk categories for dental caries based on claims records Assessment of the dental caries risk categories using claims records was based on claimed restorations. The definition of the categories (low, decreased, increased, high) are described in Table 4.
Filled-and-missing (FM) score The FM-score is the average number of restorations plus extractions per child or adolescent per dental practice per year, a proxy for the increase in decayed, missing and filled permanent or primary teeth or both (ΔDMFT/Δdmft). Preservation of tooth material is an aspect of oral health and every restoration, re-restoration and extraction means loss of tooth material.
Percentage of 18-year-olds with no tooth decay This measure reflects the outcomes of preventive efforts from both patients and the oral healthcare team until adulthood.

Tooth wear
Percentage of patients per category for vertical (occlusal/ incisal) and horizontal (not occlusal/incisal) tooth wear. Tooth wear is irreversible, but the process can be stopped or slowed. Assessment, recording and monitoring is necessary to be able to intervene in an early stage before extensive restorative treatment is required.

Periodontal health
Changes in periodontal-health score Periodontal health is evaluated during the oral health examination using the Dutch Periodontal Screening Index (DPSI). DPSI is the Community Periodontal Index of Treatment Needs (CPITN), modified for the Netherlands. This measure shows the percentage of patients moving from DPSI category A (minor periodontal disease) or B (moderate periodontitis) to DPSI category C (severe periodontitis) and vice versa [19]. Prevention and treatment should aim on keeping patients out of the irreversible danger zone (DPSI category C). Improvement in gingival bleeding index at reassessment Percentage DPSI category B and C patients whose gingival bleeding index is lower at reassessment after initial periodontal treatment. Improvement of the DPSI-index is not always achievable in patients with periodontitis. DPSI-B or DPSI-C at reassessment indicates that there is at least one remaining site with a pocket, while most of the other pockets and inflammations may have been eliminated. The bleeding index is an additional measure to assess improvement in periodontal health.

Retreatment
Retreatment is the percentage of teeth with a rerestoration within 6, 12 and 18 months after restoration. It also measures the percentage of endodontically treated and extracted teeth within 6 months after restoration.
The measure provides information about the success rate and complications after restorative treatment in comparison with other dental practices. It can reflect incorrect indications, unforeseen complications, or failures of the restorations. Variation can also result from differences in clinical management of the same condition by different GDPs.

Participants
Five GDPs who expressed an interest in the quality of care improvement project in conversations with Achmea were invited to participate in the project. They all worked in group practices with several GDPs per practice with a range of two to seven full-time or part-time working GDPs. From the five participating GDPs, one did not participate in the data collection and another did not There was a difference in restoration-free seven-year-olds between the dental practices (p < 0.01, Chi-square test). Overall, there was a difference in restoration-free seven-year-olds between the socioeconomic groups in all dental practices (p < 0.01, Chi-square test); in the highest socioeconomic group there were more children with a restoration-free dentition and less in the lowest socioeconomic group.  For children and adolescents the distribution of risk categories for dental caries varied between the dental practices (p < 0.01, Kruskal-Wallis test). The caries risk was highest in dental practice #2 and lowest in dental practice #3. Overall, there was a difference in the distribution of risk categories in the different age groups (p < 0.01, Kruskal-Wallis test). Children in age group 0-6 years were in lower risk categories than children in age group 7-12 and adolescents in age group 13-17 years. There was a difference in distribution of risk categories in the different socioeconomic groups (p < 0.01, Kruskal-Wallis test). Children and adolescents in the lowest socioeconomic group were in higher risk categories than children and adolescents in the middle and low group. For adults the distribution of risk categories for dental caries also varied between the dental practices (p < 0.01, Kruskal-Wallis test). The caries risk was highest in dental practice #1 and lowest in dental practice #3. Overall, there was a difference in the distribution of risk categories in both age groups (p < 0.05 Mann-Whitney U-test). Adults in age group 40+ were in higher risk categories than adults in age group 18-39 years. There was a difference in distribution of risk categories in the different socioeconomic groups (p < 0.01, Kruskal-Wallis test). Adults in the highest socioeconomic group were in lower risk categories than adults in the middle and low group.

Specifications of measure
Data source Clinical assessment by the GDP during consultation participate in the discussion and interpretation of data. So, three GDPs participated throughout all project phases. The socioeconomic status was generally high in three of the four dental practices that participated in the data collection. The socioeconomic status of the patients of the other dental practice was mainly low. Patient characteristics from the participating GDPs are described per measure in Tables 2, 3, 4, 5, 6, 7 and 8.

Data sources
Data sources were claims data for provided care (fee per item), clinical records for information on oral health status or more detailed information on provided care, and assessment during consultations for not regularly recorded information on oral health. Since Achmea data only covers part of the patient population of GDPs, clinical records were also used in measures with limited numbers of patients. The clinically assessed measures only used data Table 3 'Distribution of risk categories for dental caries based on clinical assessment': patient characteristics, results, specifications (Continued)

Assessment of risk categories
Low -No restorations and no active caries lesions in the past Decreased -Restorations or lesions in the past, but not in the last year Increased -Active lesions and 1 new or progressed lesion in the last year High -Active lesions and 2 or more new and/or progressed lesions in the last year [26] Numerator/denominator Numerator: number of patients per risk category Denominator: total number of patients with an assessment of a risk category Inclusion criteria • at least one oral health examination a year • at least 2 years registered in the dental practice For children and adolescents as well as adults the distribution of risk categories for dental caries for all dental practices is not the same when based on clinical assessments and claims records (p < 0.01, respectively p < 0.05, Mann-Whitney U-test). Based on clinical assessments the caries risk is higher than based on claims records. However when for adults both lowest and both highest risk categories were merged the distribution is the same between both data sources (p > 0.05, Chi-square test).

Specifications of measure
Data source Claims records insurance company Assessment of risk categories Low -No known restorations in claims records insurance company in the last 3 years Decreased -No restorations in the last year, but 1 or more in the 2 years before last year Increased -1 restoration in the last year High -2 or more restorations in the last year (in different teeth) Modified from 'distribution of risk categories for dental caries based on clinical assessment' [26] Assessment based on • Restorations • Slicing and/or treatment of primary teeth • Stainless steel crowns primary teeth from patients examined by the participating GDPs; the measures based on patients or claims data could also comprise care provided by other GDPs working in the dental practices. Data on health insurance claims for selected measures were extracted from the Achmea records for the period January 2011 through December 2013. Data from the clinical records were extracted from the patient files of GDPs for the period January 2009 through December 2014. Clinical assessment data were collected by the GDPs from May through October 2014. Further data extraction or follow-up evaluation was neither planned nor conducted.
We did not attempt to match patients in the data sources; we were interested in the naturalistic patterns of delivered care as demonstrated by each of the data sources. However, we did assess case mix; data sources for case mix factors were the registered date of birth for age, the registered postal code for socioeconomic status, and data gathered during consultations about smoking status and diabetes.

Extracted data
Measures extracted from the claims records were 'distribution of risk categories for dental caries,  *) National is based on all the claims records from Achmea for dental practices with more than 50 young patients that are insured by Achmea.
The differences in FM-score between the dental practices were statistically significant (p < 0.01, Kruskal-Wallis test). Overall, differences in FM-score between the different socioeconomic groups were statistically significant (p < 0.01, Kruskal-Wallis test). The FM-score is lower for children and adolescents in the highest socioeconomic group compared to the middle and low groups. Differences in FM-score between age groups were not considered clinically relevant. Inclusion criteria • children and adolescents <18 years old • at least one oral health examination a year • at least 2 years registered in the dental practice • exclusion of extractions on behalf of orthodontic treatment (multiple extractions of premolars) and wisdom teeth categories for dental caries, based on clinical assessment' , 'tooth wear' , and information on smoking (yes/no) and diabetes (yes/no) on behalf of the measures 'changes in periodontal-health score' and 'improvement in gingival bleeding index at reassessment' were registered by the GDP during consultations. The specifications of the extracted data are listed in Tables 2, 3, 4, 5, 6, 7 and Table 8.
According to the Dutch law on (bio)medical research with humans an exempt from individual written or oral participant consent applied. The project was considered to be a healthcare improvement project in which patients were included in a manner that precludes reidentification. In accordance with the rules and regulations on privacy protection and data security in the collection and exchange of data, the data was collected in the participating dental practices by an independent and trusted third party and anonymized before being analyzed. According to the same rules and regulations the identity of participating GDPs was protected in the same manner.
Birth dates were used to categorize patients in age groups for children (0-6, 7-12 and 13-17 years of age) and for adults (18-   Differences between dental practices were visible, but they were mainly explained by differences in the way tooth wear was assessed. The GDPs experienced difficulties in the use of the assessment instrument and used it differently because the definition of the instrument was not clear enough. Therefore the results of the statistical tests are not reported.

Data analysis
The results for the measures were calculated per dental practice. The differences between dental practices were statistically tested except for the measure 'distribution of risk categories for dental caries based on claims records insurance company'. This measure was only compared to the distribution based on clinical assessments using the Mann-Whitney U-test and Chi-square test. Case mix factors were tested on aggregated results of all dental practices. 'Time to first restoration' was measured as the cumulative incidence per year of age and differences between the percentage restoration-free seven-year olds between the dental practices and case mix groups were tested with Chi-square tests. 'Distribution of risk categories for dental caries' , 'tooth wear' and 'changes in periodontal-health score' were measured in categories and calculated by dividing the number of patients per category by the total number of patients assessed for the measures concerned and expressed as percentages.
Comparisons between dental practices and case mix groups were made using Kruskal-Wallis tests or Mann-Whitney U-tests depending on the number of groups. The 'FM-score' was calculated by the formula [(number of children and adolescents with restorations/total number of children and adolescents with at least one oral health examination) x (number of restorations/number of children and adolescents with restorations)] + [(number of children and adolescents with extractions/number of children and adolescents with at least one oral health examination) x (number of extractions/number of children with extractions)]. The 'FM-score' was expressed as the average number of filled and missing teeth per patient per dental practice per year, and differences between dental practices and case mix groups were analyzed with the Kruskal-Wallis test. 'Retreatment after restoration' was calculated as the percentage of teeth with a re-restoration within six, 12 and 18 months, and the percentage of teeth with respectively an endodontic treatment and extraction within 6 months after restoration. The denominator was the number of teeth that were restored in 2012. Comparisons were only made between dental practices using Chi-square tests. The number of patients per case mix factor were too low for statistical tests per case mix group per dental practice; therefore we only tested case mix factors on overall results in order to detect differences in results between single case mix factors to study the influence of these factors in themselves. For the 'FM-score' we used the results of all patients insured by Achmea and for the other measures the results for all dental practices together.
We set the minimum number of patients to be analyzed per measure at 60; measures with less than 60 patients per dental practice were excluded. Missing data were not included in the analysis. SAS software (version 9.4) was used to calculate the results of the measures and the Kruskal-Wallis test for comparison of means of the FM-score, all other statistical tests were performed in SPSS Statistics (version 23).

Feasibility evaluation in practice
Variations in oral health in the participating dental practices as described by the measures were discussed with GDPs during individual feedback sessions with Achmea project staff, and in a plenary meeting with the participating GDPs and Achmea project staff. In two final meetings, Achmea project staff and experts evaluated and discussed the data, results, and feasibility.

Feasibility testing data collection
Feasibility is dependent on the availability of data, the time and work needed to retrieve the data, and the number of patients that can be included to make results comparable [17,21]. Retrieving data from the claims records of the insurance company required the least time and effort. Clinical assessment based on Vertical tooth wear: score per sextant the amount of lost tooth material on the occlusal/incisal surfaces [27] 0no (visible) tooth wear 1visible tooth wear only in enamel 2exposed dentin and loss of clinical crown height < 1/3 3loss of clinical crown height > 1/3, but <2/3 4loss of clinical crown height > 2/3 Horizontal tooth wear: score per sextant the amount of lost tooth material on the not occlusal/not incisal surfaces [27] 0no (visible) tooth wear 1visible tooth wear only in enamel 2exposed dentin Numerator/denominator To be defined. Due to the limited time of the study only baseline measures were conducted.
Inclusion criteria • at least one oral health examination a year • at least 2 years registered in the dental practice • permanent teeth However, complete datasets were only available for children up to 18 years old because oral healthcare for them is covered by legally obliged standard health insurance. Dental costs for adults are covered by voluntary supplementary insurance. The voluntary dental insurance scheme has a maximum reimbursement ceiling. Dental treatments above these maxima and dental treatments for uninsured adults are not recorded in insurance records and therefore supplementary data was assembled from the administrative databases in the dental practices. Additionally, some of the selected measures could only be assessed clinically. We evaluated the process of data collection in a meeting with the participating GDPs with use of a questionnaire. This questionnaire was developed for this study and added as a Additional file 1. Validity and reliability of the data were assessed during a meeting with the participating GDPs by showing the data and asking them whether the data seemed appropriate for their practice.

Face validity testing
Face validity refers to consensus between the experts that the measure represents the quality of the care being assessed [22]. We assessed face validity during the evaluation meetings with the participating GDPs and experts by answering the questions: 'Is the measure measuring an aspect of the delivered care?' , and 'Does the measure stimulate quality improvement?'.

Discriminative validity
Quality measures need to detect relevant differences between caregivers to discriminate between them [21]. This was tested by comparing the results of the participating dental practices with each other. We also tested overall results with case mix factors to detect disparities for subpopulations expected to have differences in oral health. People with a low socioeconomic status generally have poorer oral health, and age is also a critical factor [23]. Ageing is associated with increased incidence and severity of periodontal disease; smoking and diabetes are risk factors for periodontal disease [24].

Responsiveness
Quality measures need to detect changes over time within dental practices [21]. Due to the limited period of our study we were only able to perform baseline measurements. Responsiveness was therefore estimated and is not described in the feasibility-evaluation-in-practice section in the results.

Results
The literature search revealed 115 hits (Dec 3rd 2012). After scanning titles and abstracts eight potentially relevant articles remained. Two articles were excluded because the full text was not available. From the articles, databases and websites from organizations a total of 40 outcome measures were identified. Thirty-six relevant outcome measures on oral health and retreatment were supplemented with eight measures suggested by the experts and Achmea project staff. During prioritization and discussion several measures were excluded because they were duplications, considered less relevant, or occurred in rare cases only. Measures that relied on patient perceptions were excluded because this was beyond the scope of this project. Finally, eight measures were tested in this study. Figure 1 shows the flowchart for the selection of the outcome measures.
Measures for oral health status were divided into those concerning the teeth, notably dental caries (four) and tooth wear (one), and those concerning the periodontium, notably periodontal health (one) and gingival bleeding (one). The measures on oral health were supplemented with one measure for retreatment. Of the selected measures, 'tooth wear' and 'changes in periodontal-health score' did not meet the criteria for feasibility of data collection, face validity, discriminative validity and responsiveness; main reasons were that 'tooth wear' was not considered sufficiently responsive and 'changes in periodontal-health score' was considered a controversial measure. See Tables 7 and 8. During the data collection, we found that the available data for the selected measures 'percentage of 18-yearolds with no tooth decay' and 'improvement in gingival bleeding index at reassessment' was for less than 60 patients per dental practice and therefore were too limited to provide accurate estimates. These measures were excluded. Tables 2, 3, 4, 5 and 6 provide the patient characteristics and results of the statistical tests for the measures describing dental caries and retreatment.

Dental caries Time to first restoration
Using survival curves, we looked at the restoration-free cumulative incidence of seven-year-olds. There were differences between the dental practices (p < 0.01). On average 59% of the included children had a restorationfree primary dentition at the age of seven. The range Table 8 'Changes in periodontal-health score': patient characteristics, results, specifications (Continued) Denominator: total number of patients with a repeated assessment of the DPSI-score

Inclusion criteria
• at least one oral health examination a year • at least 2 years registered in the dental practice • ≥18 years old was 42% in dental practice #1 to 76% in dental practice #3. There was a difference (p < 0.01) in the proportion of restoration-free seven-year-olds between the different socioeconomic groups in all dental practices. In the highest socioeconomic group 67% of the seven-year-olds had a restoration-free dentition versus 45% in the lowest socioeconomic group.

Distribution of risk categories for dental caries based on clinical assessment
The distribution differed between dental practices (p < 0.01) for children and adolescents as well as for adults. For all dental practices, 45% of the children and adolescents were classified as low risk. However, in dental practice #2, 62% of the children and adolescents were classified as increased or high risk. For the adults in all dental practices 44% were classified as decreased risk, but in dental practice #3 this was 74%. Overall, younger age and a high socioeconomic status were associated with a lower risk for caries both in adults and in children and adolescents.

Distribution of risk categories dental caries based on claims records
For children and adolescents the patterns in distribution of risk categories was similar for both data sources in dental practice #1 and #3. The patterns in dental practice #2 differed substantially; based on clinical assessment 62% of children and adolescents were classified as high or increased risk, but based on claims records only 20% were at high or increased risk.

Filled-and-missing (FM) score
The FM-scores showed differences (p < 0.01) between the participating dental practices; the scores varied between 0.3 for dental practice #3 and 1.5 for dental practice #1. Overall, the score was 0.5 for age category 0-6 years, and 0.6 for age categories 7-12 years and 13-17 years old. The FM-score for children and adolescents with a high socioeconomic status was lower than for the other socioeconomic groups. In general, the socioeconomic status of the patients in dental practice #1 was lower than in the other dental practices, but when comparing the scores per socioeconomic group, the FMscore for dental practice #1 was found to be higher in all groups.

Retreatment
The percentage of re-restorations differed between the dental practices for children and adolescents (p < 0.01) as well as for adults (p < 0.01). The number of rerestorations in dental practice #3 was considerably higher than in the other dental practices; within 18 months this was 15% for children and adolescents in dental practice #3 (13 out of 88 restorations) versus 0% and 1% in the other dental practices. For adults rerestoration within 18 months was 7% in dental practice #3 (140 out of 2007 restorations), whilst in the other dental practices this was lower than 2%. None of the restored teeth were endodontically treated or extracted within 6 months after restoration in children and adolescents, and there were very few in adults. Table 9 summarizes the findings for feasibility of data collection, face validity, discriminative validity and estimated responsiveness for all tested measures.

Feasibility of data collection
Dental caries Due to the small numbers per birth cohort for the measure 'time to first restoration' the data for several birth cohorts were combined. Data collection for the 'FM-score' and 'distribution of risk categories for dental caries based on claims records was feasible for children and adolescents because full oral healthcare services are covered by mandatory insurance. For adults the data for 'distribution of risk categories for dental caries' was incomplete because of the limited reimbursements in the additional, optional health insurance. For 'Distribution of risk categories for dental caries based on clinical assessment' the number of children and adolescents per dental practice were low within the measurement period. The assessment instrument was used in different ways by the GDPs; GDP #2 counted every initial lesion, while the other dental practices only included lesions that needed restoration. There were also differences in the use of diagnostic methods as loupes, professional cleaning before examination and bitewing radiographs, which leads to higher detection rates of caries lesions. GDP #2 did not agree that the pattern of his data were representative for this measure, while the other GDPs did agree.
Retreatment An advantage of the extraction of data on retreatment from the patient files is the addition of surfaces to tooth numbers. However, the surfaces were not recorded in a standard way and sometimes more indepth codes were added. Therefore comparisons of the surfaces retreated were not feasible, and retreatment was calculated per tooth and not per restoration. GDP#3 did not think the patterns of his data were representative and found numerous reasons after checking them. Some restorations failed in teeth with an indication for extraction. Most of the retreatments in children and adolescents were caused by caries in a different surface, by rebonding orthodontic retainers and retreatment of restorations placed in a bloody setting due to trauma. Most retreatments in adults were expected, reasons were: repairs in tooth wear patients with extensive dental rehabilitations, restoration before and after endodontic treatment and stepwise excavation and restoration procedures.

Face validity
Dental caries The findings for 'FM-score' , 'time to first restoration' and 'distribution of risk categories for dental caries based on claims records insurance company' were influenced by the treatment indications for restorative interventions from the GDP and the used diagnostic methods. Feedback information on these measures enabled GDPs to mutually discuss their treatment approach and best practices for preventive actions, to potentially stimulate quality improvement. Risk assessments were considered helpful in the development of a structured treatment approach per risk category. The measures were viewed as providing useful feedback on the degree of success in preventing and treating caries.
Retreatment There was a lot of discussion on the interpretation of this measure. Scores are influenced by patient characteristics such as bruxism or self-care, and by the treatment approach of the GDP. Some GDPs excavated stepwise to prevent endodontic treatment or monitored a carious lesion in a different surface of the tooth, whereas others chose to treat the whole tooth at once. Comparisons of the outcomes over time could potentially provide a better basis to evaluate treatment approaches, restoration materials and restoration techniques. Unexpected outcomes stimulated GDP#3 to search for the cause of these outcomes and to compare treatment approaches to other GDPs.

Discriminative validity
Dental caries The 'FM-score' showed considerable treatment variation between the dental practices. Due to the high caries risk of their population, dental practice #1 filled carious lesions at an earlier stage of the caries process than the other dental practices in the expectation of preventing endodontic treatments. The differences in 'distribution of risk categories for dental caries based on clinical assessment' were at least partially caused by different use of diagnostic methods and assessment criteria. The dental practices in our study had apparent differences in their case mix, but it was not possible to correct for case mix factors given the low numbers per case mix factor subgroup.
Retreatment The percentage of re-restorations within 18 months after restoration showed discriminative validity. But, there was no discriminative validity for endodontic treatments and extractions within 6 months after restoration; these retreatments were rare.

Discussion
We selected eight measures on oral health for testing in practice. Four measures were evaluated as feasible and considered to be relevant measures as proxy for oral health status. These were 'time to first restoration' , 'distribution of risk categories for dental caries' , 'filled-andmissing score' and 'retreatment after restoration'. The measure 'tooth wear' was not considered sufficiently responsive and 'changes in periodontal health score' was considered a controversial measure. The measures 'percentage of 18-year olds with no tooth decay' and 'improvement in gingival bleeding index at reassessment' were not feasible due to the limited numbers of patients for these measures per dental practice. The feasible measures on dental caries and retreatment were judged as having face validity and discriminative validity. Information on these measures provided relevant, valid and important feedback for GDPs with a potential to improve quality in these aspects of oral healthcare. Fine tuning of the measures, testing over a longer period in a larger number of dental practices, psychometric reproducibility, multivariate testing and further research on the responsiveness are necessary. Further research is also required on the relationship between outcomes, patient populations and treatment approaches of GDPs before they can be considered as measures that truly describe improved health outcomes as a direct result of the oral healthcare provided.

Considerations for feasible measures
The measures on dental caries and retreatment need some fine tuning.

Time to first restoration
Outcomes on dental caries may be biased by variation in the treatment approach of the primary dentition. In the Netherlands substantial differences in treatment approaches for caries in the primary dentition exist [1]; these vary from restoring every cavity to non-operative methods like removing undermined enamel combined with instructions to improve dental hygiene. Separate measures are needed for the primary and permanent dentition for a better understanding of the variation in the provision of oral healthcare.

Distribution of risk categories
Originally this measure was 'changes in distribution of risk categories for dental caries'; therefore this required a second measurement after, say, 1 year which was not feasible as part of this study. The optimal period to reassess the caries risk was the subject of discussion. In the study of Harris et al. a period of 1 year was responsive, but GDPs suggested to take a longer time than a year, because transition to a lower risk category may take more time [18]. Clinical assessment of risk categories requires clear instructions for use of the assessment instrument. The use of a caries classification system is necessary to monitor development and progression of caries lesions and detect changes over time. Feasibility would potentially improve if automatic risk categorization could be built into the software systems for dental practices. Harris et al. based the assessment of risk categories on the social history, medical history and clinical examination on dental caries and periodontal health. Despite the broad measure they concluded that the distribution of patients in risk categories per practice was one of the most useful piloted measures [18].

FM-score
Separate scores are needed for the primary and permanent dentition. Discussions with practitioners are required in addition to the data; insight on GDPs' restoration thresholds for carious lesions and the treatment approaches for the primary dentition are needed to be able to interpret the results.

Retreatment
If retreatment is only measured per tooth the administrative burden can be lowered by using claims records from the insurance company. Since only 'retreatment after restoration' showed discriminative validity, the measure 'retreatment' can be changed to 're-restoration of teeth'. The Australian Council on Healthcare Standards [8] used the measure 'restorative treatmentteeth retreated within 6 months'; based on our results for adults, this period may be too short.
'Time to first restoration' is a proxy for the period of good oral health. The 'FM-score' shows a yearly cross section of the average number of restorations and extractions, and 'changes in distribution of risk categories for dental caries' provides results for a cohort. The differences between clinical assessed distributions of risk categories for dental caries and the other measures for dental caries can indicate the effects of prevention. The higher caries risk based on claims records in dental practice #3 can be explained by the high number of rerestorations, and should be taken into account in the interpretation of measures based on claims records.

Other measures
'Tooth wear' , and 'changes in periodontal-health score' did not pass the feasibility evaluation in their current form. It is not common practice to measure and record 'tooth wear' in daily practice and it appeared that this measure had very limited effects on the actions of the GDP. Although measuring and recording of the Dutch Periodontal Screening Index (DPSI-index) is a mandatory part of a routine oral examination in the Netherlands [19], data available from the clinical records was limited. In particular, data were not recorded when there were no changes, or when patients were treated for periodontal diseases. This is consistent with the findings of an earlier study in the Netherlands in 2012 where only 49% of GDPs measured and recorded the DPSI [12]. There was considerable discussion about the meaning of this measure and the value of the DPSI-score for assessment of the severity of periodontal disease. However, the DPSI is the only accepted measuring instrument in the Netherlands to assess periodontal health and treatment need. Further clarification and agreement of the professional approach is required to determine the meaning and eligibility of the DPSI-score as a measure for oral health status.

Quality of care
This study is a starting point for the creation of transparency in the provision of oral healthcare services related to oral health outcomes. We found treatment variation between dental practices for some relevant measures. This probably means that not every patient receives optimal oral healthcare [1]. More clinical guidelines, widely accepted by the profession, are needed; these may then provide a basis for valid and reliable process measures. When apparently unwarranted variation is found in the presence of endorsed professional clinical guidelines or when evidence-based treatment choices are not clear, peer group discussions may be helpful in promoting healthcare quality improvement [3]. Moreover, measures on oral health outcomes as we have described can provide a focus for such discussions. We found the clinical assessments, feedback and discussions on the presented measures in itself were seen as beneficial by the GDPs. In addition, the presented measures stimulated the GDPs to evaluate their treatment approaches and reflect on their outcomes, especially when the results were not what they expected. This could potentially lead to changes in clinical practice, preventing or slowing the need for future restorations and extractions.

Barriers and recommendations
Our approach is potentially generalizable to all countries where patients regularly visit the same dental practice. Clinical assessments during consultations are possible everywhere and clinical records, after adjusting the recording system if necessary, can be used for data extraction in situations with funding systems other than fee per item.
For discussions on quality of care it is necessary that data for measures are available. It appears there is scope for improvement in the clinical records of the dental practices [25]. Record keeping sometimes serves more for financial administration than a full clinical record to support ongoing clinical care. Unambiguous definitions and assessment methods for measures also need to be addressed [12].
There is an opportunity for software suppliers to build standard formats for simple recording of data in the IT systems used by dental practices. Data extraction from clinical records is very time consuming [12] and, depending on the health care system, claims records of an insurance company may only provide information for part of the patient population of the dental practice. A national system which automatically records a standardized dataset including standardized diagnostic codes from dental practices, would potentially provide for more accurate information on oral health treatments and outcomes [4,6,17]. Such a system could form the basis of a valuable feedback system for GDPs, a basis for scientific data for further research on quality of care, information for patients, and information for policy makers on oral health outcomes.
For discussions on quality of care, an open and safe environment created by trained educational facilitators, and an intention to learn are important conditions [7]. If GDPs are asked to provide data for external evaluation of the quality of their care, there might be a risk of patient selection and bias. As such, we did not intend to use measures for normative purposes; our intention was to inform and help GDPs in reflections and discussions about their oral healthcare service deliveries.

Conclusions
The evaluated measures 'time to first restoration' , 'distribution of risk categories for dental caries' , 'filled-andmissing score' and 'retreatment after restoration' , were considered valid and relevant measures and a proxy for oral health status. As such, they improve the transparency of oral health services delivery that can be related to oral health outcomes, and may serve to improve these oral health outcomes after further development. As yet, these measures may inform discussions on quality of oral healthcare.